Sentence Transformers

A summary of ways to compute distributed representations for Japanese

Posted on Wed Mar 2 2022 | 2 min | 647 words |

Word-level distributed representations

Word2vec
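- One of the original distributed representations in natural language processing
- It is probably worth knowing at least the basic principles
- gensim is the library most often used for it
Going all the way to document classification with fastText
- As the "fast" in the name suggests, the model published by Facebook runs quickly
- It supports both distributed representations and classification, which makes it convenient
- It is said to handle unknown words well, in particular because of the way this model splits words into pieces
Usage notes for the publicly available BERT pretrained on Japanese Wikipedia
- Reportedly also adopted in Google's search engine
- A model that greatly changed natural language processing research
- The related Transformer technology has been carried over beyond natural language processing into image processing as well
- Various Japanese BERT models are also available on huggingface
T5 with Japanese support
- The samples published by the author of this Japanese model are easy to follow
- The same author also publishes an SBERT model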
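
The fastText item above credits the way words are split for its handling of unknown words. Assuming this refers to fastText's character n-gram (subword) features, the following is a minimal sketch of how such n-grams can be extracted and why an unseen word still overlaps with a known one. The function names are hypothetical, and the real library additionally keeps the whole word as its own unit and hashes n-grams into buckets, both of which are omitted here.

```python
from typing import List, Set


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return the character n-grams of `word`, wrapped in < and > boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def shared_ngrams(a: str, b: str) -> Set[str]:
    """Return the n-grams that two words have in common."""
    return set(char_ngrams(a)) & set(char_ngrams(b))


# Illustrative words: one assumed to be in the vocabulary, one assumed unseen.
known = "機械学習"
unseen = "機械学習器"

# The unseen word shares several n-grams with the known word, so a model
# that sums n-gram vectors can still build a vector for it.
print(sorted(shared_ngrams(known, unseen)))
```

Because a word vector in this scheme is composed from its n-gram vectors, any word that shares n-grams with the training vocabulary gets a usable representation instead of a single "unknown" token.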

For those who want to learn systematically, from the basics of machine learning to how the models work

[Read More]

natural language processing T5 BERT Sentence Transformers SBERT Word2Vec fasttext machine learning Python technology

Creating data in Natural Language Inference (NLI) format for Sentence transformer

Posted on Wed Feb 17 2021 | 2 min | 376 words |

I'm trying to use Sentence Transformer to infer causal relationships between documents.

If we can do this, we can extract the causes and symptoms of an incident from its report.

So I wondered whether NLI could be used for feature learning to extract causal information.

What is NLI?

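NLI is the task of inferring the relationship between two sentences. There are three possible relations:

Forward
Inverse
Unrelated

These appear to correspond roughly to the labels usually called entailment, contradiction, and neutral in NLI datasets.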
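
As a rough illustration of what data in NLI format can look like, here is a minimal sketch that stores sentence pairs with one of the three relations and writes them out as TSV. The class name, column names, integer label ids, and example sentences are all assumptions made for this sketch, not the format used in the post. The example pairs also assume that "forward" means the second sentence follows from the first.

```python
import csv
import io
from dataclasses import dataclass

# Label names follow the three relations named in the text; the integer ids
# are an arbitrary choice for this sketch.
LABEL_TO_ID = {"forward": 0, "inverse": 1, "unrelated": 2}


@dataclass
class NliExample:
    sentence_a: str
    sentence_b: str
    label: str

    def __post_init__(self) -> None:
        # Reject labels outside the three relations early.
        if self.label not in LABEL_TO_ID:
            raise ValueError(f"unknown label: {self.label!r}")


def to_tsv(examples) -> str:
    """Serialize examples as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sentence_a", "sentence_b", "label_id"])
    for ex in examples:
        writer.writerow([ex.sentence_a, ex.sentence_b, LABEL_TO_ID[ex.label]])
    return buffer.getvalue()


# Made-up sentence pairs, only to show the shape of the data.
examples = [
    NliExample("A man is playing a guitar.", "A person is making music.", "forward"),
    NliExample("A man is playing a guitar.", "Nobody is playing an instrument.", "inverse"),
    NliExample("A man is playing a guitar.", "The train was delayed.", "unrelated"),
]

tsv_text = to_tsv(examples)
print(tsv_text)
```

Keeping the label vocabulary in one dictionary makes it easy to check that every pair uses one of the three relations before any training code sees the data.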

Apply to causal relationships

If we apply the three relationships of NLI to causality, the following patterns are possible.

[Read More]

NLI Sentence Transformers technology natural language processing document classification machine learning Python

How to train a Japanese model with Sentence transformer to get a distributed representation of a sentence

Posted on Wed Feb 3 2021 | 3 min | 508 words |

BERT is a model that can be applied powerfully to natural language processing tasks.

However, it does not do a good job of capturing sentence-level features.

Some claim that sentence features appear in the [CLS] token, but [this paper](https://arxiv.org/abs/1908.10084) claims that it does not contain much useful information for the task.

Sentence BERT is a model that extends BERT to be able to obtain features per sentence.
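
One common way to turn per-token outputs into a single sentence vector is mean pooling over the non-padding tokens, as an alternative to taking only the first ([CLS]) vector. The following is a minimal pure-Python sketch of both, using toy 2-dimensional vectors in place of real BERT outputs. The function names and the numbers are assumptions for illustration, and this is not the sentence-transformers implementation.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cls_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Use only the first token's vector as the sentence vector."""
    return list(token_vectors[0])


# Toy "token embeddings": a first token, two word pieces, and one padding slot.
token_vectors = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 1, 0]

sentence_vector = mean_pool(token_vectors, attention_mask)
first_token_vector = cls_pool(token_vectors)
print(sentence_vector, first_token_vector)
```

The mask matters because batched inputs are padded to a common length, and averaging over padding slots would distort the sentence vector.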

The following are the steps for creating a Japanese Sentence BERT model.

[Read More]

technology natural language processing BERT distributed representation Sentence Transformers machine learning Python