How to train a Japanese Sentence Transformer model to obtain distributed sentence representations

BERT is a model that can be applied powerfully to natural language processing tasks. However, it does not do a good job of capturing sentence-level features. Some claim that sentence features appear in the [CLS] token, but [this paper](https://arxiv.org/abs/1908.10084) argues that it does not contain much information useful for sentence-level tasks. Sentence BERT is a model that extends BERT so that features can be obtained per sentence. The following are the steps to create a Japanese Sentence BERT. [Read More]
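As a rough illustration of the end result, here is a minimal sketch of building a per-sentence encoder with the sentence-transformers library. The Japanese BERT checkpoint name (`cl-tohoku/bert-base-japanese-whole-word-masking`) is an assumption for illustration, not necessarily the one used in the full post:

```python
# Minimal sketch: wrap a Japanese BERT in sentence-transformers to get one
# fixed-size vector per sentence via mean pooling over token embeddings.
# The checkpoint name is an assumption; any Japanese BERT on the
# Hugging Face hub should work the same way.
from sentence_transformers import SentenceTransformer, models

word_embedding = models.Transformer("cl-tohoku/bert-base-japanese-whole-word-masking")
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,  # average token vectors into a sentence vector
)
model = SentenceTransformer(modules=[word_embedding, pooling])

sentences = ["今日は良い天気です。", "明日は雨が降るでしょう。"]
embeddings = model.encode(sentences)  # shape: (2, hidden_size)
print(embeddings.shape)
```

From here, the model can be fine-tuned on sentence-pair data (e.g. with a cosine-similarity loss) to sharpen the embeddings for a downstream task.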

A note on using the now-available BERT model pre-trained on Japanese Wikipedia

Hugging Face has released a Japanese BERT model, and it is included in transformers. However, I stumbled over a few things before I could get it to actually work in a Mac environment, so I'll leave a note here. Preliminaries: installing MeCab. The morphological analysis engine MeCab is required to use BERT's Japanese model; the tokenizer will probably ask for it. This time, we will use Homebrew to install MeCab. [Read More]
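As a hedged sketch of the setup being described, the following shows one way to load the Japanese tokenizer and model once MeCab is in place. The checkpoint name is an assumption, and the exact Python binding required (`mecab-python3` vs. `fugashi` + `ipadic`) depends on your transformers version, so treat the install comments as assumptions as well:

```python
# Minimal sketch, assuming MeCab is already installed on macOS
# (e.g. `brew install mecab mecab-ipadic`) and a Python binding such as
# mecab-python3 or fugashi+ipadic is available; which one is needed
# depends on the transformers version.
from transformers import BertJapaneseTokenizer, BertModel

tokenizer = BertJapaneseTokenizer.from_pretrained(
    "cl-tohoku/bert-base-japanese-whole-word-masking"
)
model = BertModel.from_pretrained("cl-tohoku/bert-base-japanese-whole-word-masking")

inputs = tokenizer("今日は良い天気です。", return_tensors="pt")
outputs = model(**inputs)
# last_hidden_state access assumes a recent transformers version (>= 4.x)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```

If the tokenizer raises an error about MeCab at this point, it usually means the binding package is missing rather than MeCab itself.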