Posted on Thu Feb 4 2021 | 2 minutes | 410 words
After the distributed representation is obtained, machine learning can be used to classify it.
Models that can be used include the following, among others:
- Decision Tree
- SVM (Support Vector Machine)
- NN (Neural Networks)
(Broadly speaking, SVM can even be regarded as a kind of NN.)
In this section, we will use the decision tree method.
## Bagging
(Image: majority voting with multiple decision trees)

Advantages:
- The theory is simple.
- Decision trees are a classic machine learning model and highly explainable.
- The computational load is light compared to deep learning (though it depends on the size of the model).

Disadvantages:
- Explainability suffers with bagging: would we really analyze each of the multiple decision trees?
```py
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
```
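As a rough sketch of how these pieces fit together (not from the original post; synthetic data stands in for the distributed representations):

```py
# Minimal sketch: bagging decision trees with majority voting.
# X stands in for the distributed representations from the previous step.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample; prediction is a majority vote.
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```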
Posted on Wed Feb 3 2021 | 3 minutes | 508 words
BERT is a model that can be powerfully applied to natural language processing tasks.
However, it is not good at capturing sentence-level features.
Some claim that sentence features appear in the `[CLS]` token, but [this paper](https://arxiv.org/abs/1908.10084) argues that it does not contain much information that is useful for the task.
Sentence BERT is a model that extends BERT to be able to obtain features per sentence.
The following are the steps to create Sentence BERT in Japanese.
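Before the full recipe, here is a minimal sketch of the core idea the paper builds on: mean pooling over BERT's token embeddings to get one vector per sentence. The Japanese checkpoint name below is an assumption, not the post's choice, and the Japanese tokenizer additionally needs the fugashi and ipadic packages.

```py
# Sketch only: sentence embedding by mean pooling over BERT token embeddings.
# The checkpoint name is an assumption, not necessarily the post's choice.
import torch
from transformers import BertJapaneseTokenizer, BertModel

MODEL_NAME = "cl-tohoku/bert-base-japanese-whole-word-masking"
tokenizer = BertJapaneseTokenizer.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)

def sentence_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings, masking out padding positions
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)

print(sentence_embedding("今日は良い天気です。").shape)  # torch.Size([1, 768])
```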
Posted on Tue Jan 19 2021 | 2 minutes | 345 words
BART is a model for document summarization.
- Like BERT, it is derived from the Transformer.
- Unlike BERT, it has an encoder-decoder structure, because it is intended for text generation.
This page shows the steps to run a tutorial on BART.
## Procedure
1. Install transformers.
```sh
pip install transformers
```
2. Run the summary.
```py
from transformers import BartTokenizer, BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')

ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')

# Generate summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=5, early_stopping=True)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
```
On 2021/01/18, the output was `MyMy friends`. Since `max_length=5` caps the summary at a few tokens, a truncated fragment like this is expected.
Interesting.
## Where I got stuck
I got an error when the installed version of PyTorch differed from the one specified by transformers.
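A quick way to see which versions are actually installed (illustrative; the post does not name the compatible versions):

```py
# Print the installed versions to compare against the transformers requirements
import torch
import transformers

print(torch.__version__, transformers.__version__)
```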
See the Python code below for details on how to use it.
```py
import tensorflow_hub as hub
import tensorflow_text
import numpy as np

# for avoiding error
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

def cos_sim(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

texts = [
    "I saw a comedy show yesterday.",
    "There was a comedy show on TV last night.",
    "I went to the park yesterday.",
    "I saw a comedy show last night.",
    "Yesterday, I went to the park.",
]
vectors = embed(texts)
```

See the following link for more details: [Try Universal Sentence Encoder in Japanese](https://qiita.com/kenta1984/items/9613da23766a2578a27a)

### Postscript
```py
import tensorflow_text
```
Without this line, you will get an error like `Sentencepiece not found!`.
This line is not explicitly used in the sample source, but it is required at runtime.
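As a small usage sketch (not in the original post), the `cos_sim` helper above can score the remaining sentences against the first one:

```py
# Illustrative only: cosine similarity of each sentence to the first one
import numpy as np

vecs = np.array(vectors)  # convert the TensorFlow tensor to a NumPy array
for text, v in zip(texts[1:], vecs[1:]):
    print(f"{cos_sim(vecs[0], v):.3f}  {text}")
```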
Posted on Thu Jun 18 2020 | 3 minutes | 550 words
Applying the Document Classification Problem
You've learned about machine learning, but you don't know how to apply it. Sound familiar?
This is easy to overlook while studying: unless you keep your antennas up, you never learn how a technique is actually used. Since a tool is only a tool when it is used, it is worth making a note of how you use each newly acquired tool.
[NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit](https://github.com/Tencent/NeuralNLP-NeuralClassifier) is a Python library, published by Tencent, for multi-label document classification problems.
For more details, see the repository, which describes the toolkit as follows:
> NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification task, which is more challenging and common in real-world scenarios.
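For a sense of the workflow, this is roughly how the toolkit is driven according to its README at the time; treat the exact file names as assumptions and check the repository for current usage:

```sh
# Assumed from the repository README; verify against the repo before running.
git clone https://github.com/Tencent/NeuralNLP-NeuralClassifier.git
cd NeuralNLP-NeuralClassifier
python train.py conf/train.json    # train with the sample JSON config
python eval.py conf/train.json     # evaluate the trained model
```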
Posted on Sat Jun 13 2020 | 6 minutes | 1078 words
We have successfully trained a model to automatically generate titles from news texts using a machine translation model based on deep learning.
## Preliminaries
In the past, I was involved in a project to automatically generate titles from manuscripts for online news.
In order to tackle this project, I was looking into existing methods.