How to train a Japanese model with Sentence Transformers to get distributed representations of sentences

BERT is a model that can be applied powerfully to natural language processing tasks. However, it does not do a good job of capturing sentence-level features. Some claim that sentence features appear in the [CLS] token, but [this paper](https://arxiv.org/abs/1908.10084) claims that it does not contain much information that is useful for the task. Sentence BERT is a model that extends BERT so that it can produce features per sentence. The following are the steps to create a Japanese Sentence BERT. [Read More]
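
Sentence-BERT, as described in the linked paper, builds a fixed-size sentence vector by pooling BERT's token outputs (mean pooling is its default strategy) and then compares sentences with cosine similarity. The following is a minimal sketch of that pooling step in plain Python. The token vectors are made-up toy numbers rather than real BERT output, and the helper names `mean_pool` and `cosine_similarity` are hypothetical, chosen only for illustration.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, so padding is ignored."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "token embeddings" (assumed values, not model output).
# Row 0 stands in for [CLS]; the last row of sentence_a is padding.
sentence_a = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 4.0, 3.0], [9.0, 9.0, 9.0]]
mask_a = [1, 1, 1, 0]
sentence_b = [[1.0, 0.0, 0.0], [3.0, 2.0, 1.0], [2.0, 4.0, 2.0]]
mask_b = [1, 1, 1]

embedding_a = mean_pool(sentence_a, mask_a)  # uses every real token
embedding_b = mean_pool(sentence_b, mask_b)
cls_a = sentence_a[0]                        # the [CLS]-only alternative

similarity = cosine_similarity(embedding_a, embedding_b)
print(embedding_a, embedding_b, similarity)
```

In this toy data the two [CLS] rows are identical, so a [CLS]-only comparison could not tell the sentences apart, while the pooled vectors differ. Real models behave less starkly, but this is the intuition behind pooling over all tokens.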

A List of Applications of Document Classification Problems

Applying the Document Classification Problem: You’ve learned about machine learning, but you don’t know how to use it, do you? This is easy to overlook if you aren’t paying attention while you study, and if you don’t keep your antennae up, you won’t know how to use what you have learned. Since a tool is only a tool if it is used, you should make a note of how you use your newly acquired tool. [Read More]

A note on how to use the newly available BERT trained on Japanese Wikipedia

Hugging Face has released a Japanese model for BERT, and it is included in transformers. However, I stumbled over a few things before I could get it to actually work in a Mac environment, so I’ll leave a note. Preliminaries: Installing MeCab. The morphological analysis engine MeCab is required to use BERT’s Japanese model, since the tokenizer will probably ask for it. This time, we will use Homebrew to install MeCab [Read More]

How to use NeuralClassifier, a library that provides a crazy number of models for document classification problems

[![](https://1.bp.blogspot.com/-YlMb8v77MN4/XurdQSzS1yI/AAAAAAAAg6Y/oSZrJ0c9yxYbzQnNNTynRvZnEp-xGE7NwCK4BGAsYHg/s320/AFE90C8A-A49C-4475-9F05-50E2D56D5B63.jpeg)](https://1.bp.blogspot.com/-YlMb8v77MN4/XurdQSzS1yI/AAAAAAAAg6Y/oSZrJ0c9yxYbzQnNNTynRvZnEp-xGE7NwCK4BGAsYHg/s1920/AFE90C8A-A49C-4475-9F05-50E2D56D5B63.jpeg) NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit is a Python library for multi-label document classification problems published by Tencent. For more information, see [NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit](https://github.com/Tencent/NeuralNLP-NeuralClassifier). NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification task, which is more challenging and common in real-world scenarios. [Read More]

I also tried a document classification problem with fastText

A summary of what I did with fastText on the document classification problem. Facebook Research has published a document classification library using fastText. fastText is easy to install in a Python environment. Run time is fast. Preliminaries: I decided to tackle the task of document classification, and initially thought of NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit. However, it was not [Read More]
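
For supervised classification, fastText reads a plain text file with one document per line, where each label is written as a token carrying the default `__label__` prefix. The sketch below shows one way to prepare such a file. The helper names (`to_fasttext_line`, `write_training_file`, `train`) and the toy examples are hypothetical and not taken from the post; only `fasttext.train_supervised` is the library's own call, and it is imported lazily so that the formatting helpers run without the package installed.

```python
def to_fasttext_line(labels, text, label_prefix="__label__"):
    """Format one document as a single fastText supervised-training line."""
    # fastText expects one document per line, so collapse any newlines
    # and repeated whitespace inside the text into single spaces.
    flat_text = " ".join(text.split())
    tags = " ".join(f"{label_prefix}{label}" for label in labels)
    return f"{tags} {flat_text}"


def write_training_file(examples, path):
    """Write (labels, text) pairs to `path`, one formatted line each."""
    with open(path, "w", encoding="utf-8") as handle:
        for labels, text in examples:
            handle.write(to_fasttext_line(labels, text) + "\n")


def train(path):
    """Train a supervised model on `path` (requires `pip install fasttext`)."""
    import fasttext  # lazy import: only needed when actually training
    return fasttext.train_supervised(input=path)


# Hypothetical toy examples; labels must not contain whitespace.
examples = [
    (["sports"], "the team won\nthe final"),
    (["tech", "business"], "chip maker reports record sales"),
]
lines = [to_fasttext_line(labels, text) for labels, text in examples]
for line in lines:
    print(line)
```

Because fastText splits words on whitespace, Japanese text would need to be segmented into space-separated tokens first, for example with a morphological analyzer such as MeCab, before being passed to `to_fasttext_line`.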