- Similar to scikit-learn?
- Supports GPUs

Reference links

Related books

This page shows the steps to run a tutorial on BART.
1. Install transformers.

```sh
pip install transformers
```

2. Run the summary.

```py
from transformers import BartTokenizer, BartForConditionalGeneration

# Load the pretrained BART model and its tokenizer
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')

ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
# Tokenize the article, truncating inputs longer than 1024 tokens
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, truncation=True, return_tensors='pt')

# Generate Summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=5, early_stopping=True)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
```
On 2021/01/18, the output was `MyMy friends.`
Interesting.
## Where I got stuck.
An error occurs when the installed version of PyTorch differs from the one transformers expects.
Upgrading PyTorch resolved it: `pip install -U torch`
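To see which versions are actually installed before upgrading, a quick check like the following can help. It is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name, not part of transformers or PyTorch.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions of the two packages involved in the mismatch.
for name in ("torch", "transformers"):
    print(name, installed_version(name))
```

A `None` in the output means the package is not installed in the current environment at all, which is a different problem from a version mismatch.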
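
I tried MLflow because it looks handy for comparing experiment results.
It can automate the recording of parameters and results to some extent.
Practical machine learning often turns into a kind of black magic, so effort spent on ensuring reproducibility pays off later.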
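As a rough illustration of what that recording can look like, here is a minimal sketch built on MLflow's tracking functions `mlflow.start_run`, `mlflow.log_params`, and `mlflow.log_metrics`. The helpers `flatten_params` and `log_experiment`, the parameter names, and the metric value are all made up for this example. MLflow is imported inside the function so that the flattening helper can be tried without MLflow installed.

```python
def flatten_params(params, prefix=""):
    """Flatten a nested dict into dot-separated keys, e.g. {"model": {"lr": 0.1}} -> {"model.lr": 0.1}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def log_experiment(run_name, params, metrics):
    """Record one experiment's parameters and metrics as a single MLflow run."""
    import mlflow  # requires `pip install mlflow`

    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(flatten_params(params))
        mlflow.log_metrics(metrics)


# Hypothetical settings and a made-up score for one experiment.
example_params = {"model": {"name": "bart-large", "num_beams": 4}, "max_length": 5}
example_metrics = {"score": 0.5}
print(flatten_params(example_params))
```

Calling `log_experiment("baseline", example_params, example_metrics)` once per experiment keeps each run's settings next to its results, and the recorded runs can then be compared in the browser with `mlflow ui`.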
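
Humans still have to do the thinking behind problem solving.
Machine learning, AI, and deep learning are genuinely impressive, but that does not mean they will replace every simple method such as linear programming or statistics.
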
Universal Sentence Encoder turns a sentence or document into a vector.
The multilingual model supports several languages, including Japanese, so Japanese sentences can be handled as vectors.
The vectors can be used for clustering, similarity calculation, and feature extraction.
Run the following command to prepare.

```sh
pip install tensorflow tensorflow_hub tensorflow_text numpy
```

Pretrained models are available.
The Python code below shows how to use one.

```py
import tensorflow_hub as hub
import tensorflow_text  # not referenced directly, but required at runtime
import numpy as np

# Work around SSL certificate errors when downloading the model
# (note: this disables certificate verification)
import ssl
ssl._create_default_https_context = ssl._create_unverified_context


def cos_sim(v1, v2):
    """Cosine similarity between two vectors."""
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))


embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")
texts = ["I saw a comedy show yesterday.", "There was a comedy show on TV last night.", "I went to the park yesterday.", "I saw a comedy show last night.", "Yesterday, I went to the park."]
vectors = embed(texts)
```
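The `cos_sim` helper above is defined but never applied to the vectors. A minimal sketch of how the sentence vectors could be compared is shown below. To keep it runnable without downloading the model, small made-up vectors stand in for the output of `embed(texts)`; with the real model, `np.asarray(vectors)` would take their place. `similarity_matrix` is a hypothetical helper name.

```python
import numpy as np


def similarity_matrix(vectors):
    """Return the matrix of pairwise cosine similarities between row vectors."""
    vectors = np.asarray(vectors, dtype=float)
    # Normalize each row to unit length so a dot product equals the cosine.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


# Made-up 3-dimensional vectors standing in for sentence embeddings.
toy_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
])
sims = similarity_matrix(toy_vectors)

# For each sentence, find the index of the most similar *other* sentence.
np.fill_diagonal(sims, -np.inf)
print(sims.argmax(axis=1))
```

Computing the whole matrix at once avoids calling `cos_sim` in a double loop, which matters once the number of sentences grows.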
See the following link for more details
[Try Universal Sentence Encoder in Japanese](https://qiita.com/kenta1984/items/9613da23766a2578a27a)
### Postscript
```py
import tensorflow_text
```

Without this line, you will get an error like `Sentencepiece not found!`. The module is not used explicitly in the sample code, but importing it is required at runtime.

This article is written for readers who are reasonably comfortable with the command line.
If you do not know what the command line is but still want to use Python, consider using a service called Google Colaboratory.

You have studied machine learning, but you do not know how to put it to use. Sound familiar?
Applications are easy to overlook while you are studying; unless you keep your antennae up, you will not notice where it can be applied.
A tool is only a tool if it is used, so make a note of how you use each newly acquired tool.

Related books

Hugging Face has released a Japanese model for BERT, and it is included in transformers.
However, I stumbled over a few things before I got it working on a Mac, so I am leaving a note.
BERT's Japanese model requires the morphological analysis engine MeCab, which the tokenizer will most likely ask for.
This time, we will use Homebrew to install MeCab and ipadic.
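Before loading the Japanese tokenizer, it can help to confirm that the `mecab` command is actually on the PATH. The sketch below uses only the standard library; `find_command` is a hypothetical helper name, and the Homebrew formula names `mecab` and `mecab-ipadic` in the message are my assumption about what the install step refers to.

```python
import shutil


def find_command(name):
    """Return the full path of an executable on PATH, or None if it is not found."""
    return shutil.which(name)


mecab_path = find_command("mecab")
if mecab_path is None:
    # Assumed Homebrew formula names; adjust to your environment.
    print("mecab not found; try: brew install mecab mecab-ipadic")
else:
    print("mecab found at", mecab_path)
```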