Training a causal language model from scratch (TensorFlow)
Install the Transformers, Datasets, and Evaluate libraries to run this notebook.
[ ]
You will need to set up git; adapt your email and name in the following cell.
[ ]
You will also need to be logged in to the Hugging Face Hub. Execute the following and enter your credentials.
[ ]
[ ]
[ ]
False True
[ ]
[ ]
3.26% of data after filtering.
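A filtering step that keeps only a small share of the data can be sketched as a plain keyword match over each file's content. This is a minimal sketch, not the notebook's original cell: the keyword list, the helper names, and the toy samples below are assumptions made for illustration.

```python
def any_keyword_in_string(string, keywords):
    """Return True if at least one keyword occurs in the string."""
    return any(keyword in string for keyword in keywords)


def filter_samples(samples, keywords):
    """Keep samples whose "content" field mentions any keyword.

    Returns the kept samples and the kept fraction of the input.
    """
    kept = [s for s in samples if any_keyword_in_string(s["content"], keywords)]
    fraction = len(kept) / len(samples) if samples else 0.0
    return kept, fraction


# Hypothetical keyword list and toy samples (not taken from the notebook).
keywords = ["pandas", "sklearn", "matplotlib", "seaborn"]
samples = [
    {"content": "import numpy as np"},
    {"content": "import pandas as pd"},
    {"content": "print('hello')"},
    {"content": "from sklearn.utils import shuffle"},
]

kept, fraction = filter_samples(samples, keywords)
print(f"{fraction:.2%} of data after filtering.")
```

On a real corpus the same predicate would be applied while streaming the dataset, so the full data never has to fit in memory.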
[ ]
DatasetDict({
    train: Dataset({
        features: ['repo_name', 'path', 'copies', 'size', 'content', 'license'],
        num_rows: 606720
    })
    valid: Dataset({
        features: ['repo_name', 'path', 'copies', 'size', 'content', 'license'],
        num_rows: 3322
    })
})
[ ]
'REPO_NAME: kmike/scikit-learn'
'PATH: sklearn/utils/__init__.py'
'COPIES: 3'
'SIZE: 10094'
'''CONTENT: """
The :mod:`sklearn.utils` module includes various utilites.
"""

from collections import Sequence

import numpy as np
from scipy.sparse import issparse
import warnings

from .murmurhash import murm
LICENSE: bsd-3-clause'''
[ ]
Input IDs length: 34
Input chunk lengths: [128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 117, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 41]
Chunk mapping: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
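The chunk lengths and chunk mapping printed above follow the pattern of splitting each tokenized document into windows of at most 128 tokens and recording which document each window came from. The pure-Python sketch below imitates that behaviour without a tokenizer. The function name, the dummy token IDs, and the two document lengths (2549 and 1705 tokens, chosen to reproduce the printed pattern) are assumptions, and dropping the short final chunks is one common choice rather than something this export confirms.

```python
def chunk_token_ids(documents, context_length=128):
    """Split each document's token IDs into chunks of at most context_length.

    Returns the chunks, their lengths, and, for every chunk, the index
    of the document it was cut from.
    """
    chunks, lengths, mapping = [], [], []
    for doc_index, token_ids in enumerate(documents):
        for start in range(0, len(token_ids), context_length):
            chunk = token_ids[start:start + context_length]
            chunks.append(chunk)
            lengths.append(len(chunk))
            mapping.append(doc_index)
    return chunks, lengths, mapping


# Two stand-in documents of dummy token IDs (hypothetical lengths).
documents = [list(range(2549)), list(range(1705))]
chunks, lengths, mapping = chunk_token_ids(documents)

print(f"Input IDs length: {len(chunks)}")
print(f"Input chunk lengths: {lengths}")
print(f"Chunk mapping: {mapping}")

# Assumption: keep only full-length chunks so every training row has 128 tokens.
full_chunks = [c for c in chunks if len(c) == 128]
```

Discarding the short tail of each document wastes a little data but avoids padding, which keeps every batch rectangular.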
[ ]
DatasetDict({
    train: Dataset({
        features: ['input_ids'],
        num_rows: 16702061
    })
    valid: Dataset({
        features: ['input_ids'],
        num_rows: 93164
    })
})
[ ]
[ ]
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
transformer (TFGPT2MainLayer multiple                  124242432
=================================================================
Total params: 124,242,432
Trainable params: 124,242,432
Non-trainable params: 0
_________________________________________________________________
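The total of 124,242,432 parameters can be checked by hand. The sketch below counts the weights of a GPT-2 style decoder whose output layer shares the token embedding matrix. The configuration values (a 50,000-token vocabulary, 1,024 positions, hidden size 768, 12 layers) are assumptions, since none of them appear in this export; they were picked because they reproduce the reported total exactly.

```python
def gpt2_parameter_count(vocab_size, n_positions, n_embd, n_layer):
    """Count parameters of a GPT-2 style decoder with tied output embeddings."""
    token_embeddings = vocab_size * n_embd
    position_embeddings = n_positions * n_embd
    layer_norm = 2 * n_embd  # scale and bias

    # Attention: fused query/key/value projection plus output projection.
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Feed-forward: expand to 4 * n_embd, then project back down.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Each block has two layer norms, one before attention and one before the MLP.
    block = layer_norm + attention + layer_norm + mlp

    # A final layer norm follows the last block.
    return token_embeddings + position_embeddings + n_layer * block + layer_norm


# Assumed configuration (not stated in this notebook export).
total = gpt2_parameter_count(vocab_size=50_000, n_positions=1024, n_embd=768, n_layer=12)
print(f"Total params: {total:,}")
```

Under these assumptions the token embeddings alone account for 38,400,000 parameters, close to a third of the model.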
[ ]
[ ]
input_ids shape: (5, 128)
attention_mask shape: (5, 128)
labels shape: (5, 128)
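All three tensors share the shape (5, 128) because, for causal language modeling, the labels are typically a copy of the input IDs. The sketch below shows that idea with plain Python lists. It is an illustration rather than the collator used in the notebook: the function names are hypothetical, every sequence is assumed to have the same length so no padding is needed, and the one-position shift between inputs and targets is assumed to happen inside the model's loss.

```python
def collate_for_causal_lm(examples):
    """Batch equal-length token sequences for causal language modeling.

    Labels are a copy of the inputs; shifting them by one position is
    left to the model's loss computation (an assumption of this sketch).
    """
    lengths = {len(e["input_ids"]) for e in examples}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all sequences have the same length")
    input_ids = [list(e["input_ids"]) for e in examples]
    return {
        "input_ids": input_ids,
        "attention_mask": [[1] * len(ids) for ids in input_ids],
        "labels": [list(ids) for ids in input_ids],
    }


def shape(rows):
    """Return (number of rows, row length) for a rectangular list of lists."""
    return (len(rows), len(rows[0]))


# Five dummy sequences of 128 token IDs each.
examples = [{"input_ids": list(range(i, i + 128))} for i in range(5)]
batch = collate_for_causal_lm(examples)
for key, value in batch.items():
    print(f"{key} shape: {shape(value)}")
```

With variable-length inputs, a real collator would also pad the batch and mask the padded label positions so they do not contribute to the loss.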
[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
# create some data
x = np.random.randn(100)
y = np.random.randn(100)

# create scatter plot with x, y
plt.scatter(x, y)

# create scatter
[ ]
# create some data
x = np.random.randn(100)
y = np.random.randn(100)

# create dataframe from x and y
df = pd.DataFrame({'x': x, 'y': y})
df.insert(0,'x', x)
for
[ ]
# dataframe with profession, income and name
df = pd.DataFrame({'profession': x, 'income':y, 'name': z})

# calculate the mean income per profession
profession = df.groupby(['profession']).mean()

# compute the
[ ]
# import random forest regressor from scikit-learn
from sklearn.ensemble import RandomForestRegressor

# fit random forest model with 300 estimators on X, y:
rf = RandomForestRegressor(n_estimators=300, random_state=random_state, max_depth=3)
rf.fit(X, y)
rf