Posted on Thu Feb 4 2021
| 2 minutes
| 410 words
|
After the distributed representation is obtained, machine learning can be used to classify it.
Models that can be used include:
- Decision trees
- SVM (support vector machines)
- NN (neural networks)

and others. Broadly speaking, SVM can be seen as a special case of NN.
In this section, we will use the decision tree method.
## Bagging

Bagging can be pictured as majority voting with multiple decision trees.
- The underlying theory is simple.
- Decision trees are a classic machine learning model with high explainability.
- The computational load is light compared to deep learning (though this depends on the size of the model).
- The ensemble itself is not very explainable, however: would we really analyze each of the many decision trees one by one?
```py
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
```
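As a minimal sketch of how these two classes fit together (the iris dataset and the hyperparameters here are placeholders for illustration, not from the original post), a bagging ensemble of decision trees can be trained like this:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data; in the actual task this would be the distributed representations.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 10 decision trees, each fit on a bootstrap sample of the training data;
# the final prediction is a majority vote over the trees.
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Passing the base estimator positionally keeps the snippet compatible across scikit-learn versions (the keyword was renamed from `base_estimator` to `estimator` in 1.2).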
Posted on Wed Feb 3 2021
| 3 minutes
| 508 words
|
BERT is a model that can be applied powerfully to natural language processing tasks.
However, it is not good at capturing sentence-level features.
Some claim that sentence features appear in the `[CLS]` token, but
[this paper](https://arxiv.org/abs/1908.10084) argues that it does not contain much information useful for the task.
Sentence BERT is a model that extends BERT to be able to obtain features per sentence.
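One of the key pieces Sentence-BERT adds on top of BERT is a pooling layer, commonly mean pooling over the token embeddings with padding masked out. A minimal numpy sketch of that pooling step with toy tensors (the shapes and values are illustrative, not real BERT outputs):

```python
import numpy as np

# Toy "token embeddings" for one sentence: 4 tokens, hidden size 3,
# where the last position is padding (attention mask = 0).
token_embeddings = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],  # padding; excluded by the mask
])
attention_mask = np.array([1, 1, 1, 0])

# Mean pooling: sum the embeddings of real tokens, divide by their count.
mask = attention_mask[:, None]
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()
print(sentence_embedding)  # [2. 2. 2.]
```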
The following are the steps to create Sentence BERT in Japanese.
Posted on Tue Jan 19 2021
| 2 minutes
| 345 words
|
BART is a model for document summarization.
Like BERT, it is derived from the Transformer,
but unlike BERT, it has an encoder-decoder structure,
because it is intended for sentence generation.
This page shows the steps to run a BART tutorial.
## Procedure

1. Install transformers

```sh
pip install transformers
```

2. Run the summary
```py
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')

ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')

# Generate Summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=5, early_stopping=True)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
```
On 2021/01/18, the output was "MyMy friends".
Interesting.
## Where I got stuck

An error occurs when the version of pytorch differs from the one specified by transformers.
See the python code below for details on how to use it.

```py
import tensorflow_hub as hub
import tensorflow_text
import numpy as np

# for avoiding error
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

def cos_sim(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

texts = ["I saw a comedy show yesterday.",
         "There was a comedy show on TV last night.",
         "I went to the park yesterday.",
         "I saw a comedy show last night.",
         "Yesterday, I went to the park."]
vectors = embed(texts)
```

See the following link for more details: [Try Universal Sentence Encoder in Japanese](https://qiita.com/kenta1984/items/9613da23766a2578a27a)

### Postscript

```py
import tensorflow_text
```
Without this line, you will get an error like `Sentencepiece not found!`.
This line is not explicitly used in the sample source, but it is required at actual runtime.
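As a quick sanity check of the `cos_sim` helper defined above, it can be exercised on toy vectors without downloading the TF Hub model (the vectors here are made up; only numpy is needed):

```python
import numpy as np

def cos_sim(v1, v2):
    # Cosine similarity: dot product normalized by the vector lengths.
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
c = np.array([2.0, 0.0])

print(cos_sim(a, b))  # orthogonal vectors -> 0.0
print(cos_sim(a, c))  # parallel vectors -> 1.0
```

The same function applies unchanged to the 512-dimensional vectors returned by the Universal Sentence Encoder.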