Posted on Thu Feb 4 2021 | 2 minutes | 410 words
After the distributed representation is obtained, machine learning can be used to classify it.
Models that can be used include the following, among others:
- Decision Tree
- SVM (Support Vector Machine)
- NN (Neural Networks)
(Broadly speaking, SVM can even be regarded as a kind of NN.)
In this section, we will use the decision tree method.
## Bagging
(Image: majority voting with multiple decision trees)

Advantages:
- The theory is simple.
- Decision trees are a classic machine learning model and highly explainable.
- The computational load is light compared to deep learning (though it depends on the size of the model).

Disadvantages:
- Explainability suffers with bagging: would we really analyze each of the multiple decision trees?
```py
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
```
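As a rough sketch of how these pieces fit together (not from the original post; synthetic data stands in for the distributed representations):

```py
# Minimal sketch: bagging decision trees with majority voting.
# X stands in for the distributed representations from the previous step.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample; prediction is a majority vote.
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```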
Posted on Wed Feb 3 2021 | 3 minutes | 508 words
BERT is a model that can be powerfully applied to natural language processing tasks.
However, it is not good at capturing sentence-level features.
Some claim that sentence features appear in the `[CLS]` token, but [this paper](https://arxiv.org/abs/1908.10084) argues that it does not contain much information that is useful for the task.
Sentence BERT is a model that extends BERT to be able to obtain features per sentence.
The following are the steps to create Sentence BERT in Japanese.
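Before the full recipe, here is a minimal sketch of the core idea the paper builds on: mean pooling over BERT's token embeddings to get one vector per sentence. The Japanese checkpoint name below is an assumption, not the post's choice, and the Japanese tokenizer additionally needs the fugashi and ipadic packages.

```py
# Sketch only: sentence embedding by mean pooling over BERT token embeddings.
# The checkpoint name is an assumption, not necessarily the post's choice.
import torch
from transformers import BertJapaneseTokenizer, BertModel

MODEL_NAME = "cl-tohoku/bert-base-japanese-whole-word-masking"
tokenizer = BertJapaneseTokenizer.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)

def sentence_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings, masking out padding positions
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)

print(sentence_embedding("今日は良い天気です。").shape)  # torch.Size([1, 768])
```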
Posted on Tue Jan 19 2021 | 2 minutes | 345 words
BART is a model for document summarization.
- Like BERT, it is derived from the Transformer.
- Unlike BERT, it has an encoder-decoder structure, because it is intended for text generation.
This page shows the steps to run a tutorial on BART.
## Procedure
1. Install transformers.
```sh
pip install transformers
```
2. Run the summary.
```py
from transformers import BartTokenizer, BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')

ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')

# Generate summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=5, early_stopping=True)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
```
On 2021/01/18, the output was `MyMy friends`. Since `max_length=5` caps the summary at a few tokens, a truncated fragment like this is expected.
Interesting.
## Where I got stuck
I got an error when the installed version of PyTorch differed from the one specified by transformers.
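A quick way to see which versions are actually installed (illustrative; the post does not name the compatible versions):

```py
# Print the installed versions to compare against the transformers requirements
import torch
import transformers

print(torch.__version__, transformers.__version__)
```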
See the Python code below for details on how to use it.
```py
import tensorflow_hub as hub
import tensorflow_text
import numpy as np

# for avoiding error
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

def cos_sim(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

texts = [
    "I saw a comedy show yesterday.",
    "There was a comedy show on TV last night.",
    "I went to the park yesterday.",
    "I saw a comedy show last night.",
    "Yesterday, I went to the park.",
]
vectors = embed(texts)
```

See the following link for more details: [Try Universal Sentence Encoder in Japanese](https://qiita.com/kenta1984/items/9613da23766a2578a27a)

### Postscript
```py
import tensorflow_text
```
Without this line, you will get an error like `Sentencepiece not found!`.
This line is not explicitly used in the sample source, but it is required at runtime.
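As a small usage sketch (not in the original post), the `cos_sim` helper above can score the remaining sentences against the first one:

```py
# Illustrative only: cosine similarity of each sentence to the first one
import numpy as np

vecs = np.array(vectors)  # convert the TensorFlow tensor to a NumPy array
for text, v in zip(texts[1:], vecs[1:]):
    print(f"{cos_sim(vecs[0], v):.3f}  {text}")
```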
Posted on Thu Jun 18 2020 | 3 minutes | 550 words
Applying the Document Classification Problem
You've learned about machine learning, but you don't know how to apply it. Sound familiar?
This is easy to overlook while studying: unless you keep your antennas up, you never learn how a technique is actually used. Since a tool is only a tool when it is used, it is worth making a note of how you use each newly acquired tool.
[NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit](https://github.com/Tencent/NeuralNLP-NeuralClassifier) is a Python library, published by Tencent, for multi-label document classification problems.
For more details, see the repository, which describes the toolkit as follows:
> NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification task, which is more challenging and common in real-world scenarios.
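For a sense of the workflow, this is roughly how the toolkit is driven according to its README at the time; treat the exact file names as assumptions and check the repository for current usage:

```sh
# Assumed from the repository README; verify against the repo before running.
git clone https://github.com/Tencent/NeuralNLP-NeuralClassifier.git
cd NeuralNLP-NeuralClassifier
python train.py conf/train.json    # train with the sample JSON config
python eval.py conf/train.json     # evaluate the trained model
```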
Posted on Sat Jun 13 2020 | 6 minutes | 1078 words
We have successfully trained a model to automatically generate titles from news texts using a machine translation model based on deep learning.
## Preliminaries
In the past, I was involved in a project to automatically generate titles from manuscripts for online news.
In order to tackle this project, I was looking into existing methods.