distributed representation

Why is fasttext so fast?

Posted on Wed May 19 2021 | 1 minute | 156 words | 近藤綾乃

Features of fasttext: (1) Improved objective function: consideration of negative samples. This should not affect training time. (2) Change in optimization method: use of stochastic optimization. If anything affects the training time, it should be this one. (3) Implementation in C++: this is the most effective, isn’t it? If we implemented it in PyTorch, it wouldn’t be much different from word2vec. It would depend on the amount of training data. [Read More]

natural language processing distributed representation

On using bagging with distributed representations for classification, and its generalization performance

Posted on Thu Feb 4 2021 | 2 minutes | 410 words | 近藤綾乃

After the distributed representation is obtained, machine learning can be used to classify it. Models that can be used include decision trees, SVMs (support vector machines), NNs (neural networks), and others. SVM is included in NN in a broad sense. In this section, we will use the decision tree method. Bagging: an image of majority voting with multiple decision trees. Simple theory: decision trees are highly explainable and are a classic machine learning model. [Read More]
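As a rough illustration of the "majority voting with multiple decision trees" idea in the excerpt above, here is a minimal sketch in plain Python. It is not the post's own code: the function names (`fit_stump`, `fit_bagging`, `predict_bagging`) are hypothetical, each tree is simplified to a depth-1 decision stump, and the 2-dimensional toy vectors merely stand in for real distributed representations.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a depth-1 tree: choose the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            # Each side predicts its majority label; an empty right side reuses the left label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_bagging(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


# Toy "embeddings" (assumed data, for illustration only).
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25],
     [0.8, 0.9], [0.9, 0.8], [0.85, 0.7], [0.7, 0.95]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]

stumps = fit_bagging(X, y)
print(predict_bagging(stumps, [0.0, 0.0]))
print(predict_bagging(stumps, [1.0, 1.0]))
```

Because each stump sees a different bootstrap sample, individual stumps may pick different thresholds; the vote averages out that variance, which is the usual motivation for bagging.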

distributed representation engineering machine learning generalization performance technology

How to train a Japanese model with Sentence transformer to get a distributed representation of a sentence

Posted on Wed Feb 3 2021 | 3 minutes | 508 words | 近藤綾乃

BERT is a model that can be powerfully applied to natural language processing tasks. However, it does not do a good job of capturing sentence-wise features. Some claim that sentence features appear in [CLS], but [this paper](https://arxiv.org/abs/1908.10084) claims that it does not contain that much useful information for the task. Sentence BERT is a model that extends BERT to be able to obtain features per sentence. The following are the steps to create Sentence BERT in Japanese. [Read More]

technical natural language processing BERT distributed representation technology sentence transformer

Using BART (sentence summarization model) with Hugging Face

Posted on Tue Jan 19 2021 | 2 minutes | 345 words | 近藤綾乃

BART is a model for document summarization. It is derived from the same Transformer as BERT but, unlike BERT, it has an encoder-decoder structure, because it is intended for sentence generation. This page shows the steps to run a tutorial on BART. Procedure: 1. Install transformers: run `pip install transformers`. 2. Run the summary: from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig model = BartForConditionalGeneration.from_pretrained('facebook/bart-large') tokenizer = BartTokenizer.from_pretrained('facebook/bart-large') ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs. [Read More]

engineering natural language processing python technology distributed representation sentence generation

Procedure for obtaining a distributed representation of a Japanese sentence using a trained Universal Sentence Encoder

Posted on Mon Jun 22 2020 | 2 minutes | 405 words | 近藤綾乃

A vector representation of a document can be obtained using Universal Sentence Encoder. Features: it supports multiple languages, including Japanese, so Japanese sentences can be handled as vectors. Applications: clustering, similarity calculation, feature extraction. Usage: execute the following command as preparation: pip install tensorflow tensorflow_hub tensorflow_text numpy. Trained models are available. See the Python code below for details on how to use them. import tensorflow_hub as hub import tensorflow_text import numpy as np # for avoiding error import ssl ssl. [Read More]

technology natural language processing universal sentence encoder python distributed representation

A note on how to use the newly available BERT model trained on Japanese Wikipedia

Posted on Wed Jun 17 2020 | 1 minute | 472 words | 近藤綾乃

Hugging Face has released a Japanese model for BERT. The Japanese model is included in transformers. However, I stumbled over a few things before I could get it to actually work in a Mac environment, so I’ll leave a note. Preliminaries: installing MeCab. The morphological analysis engine MeCab is required to use BERT’s Japanese model. The tokenizer will probably ask for MeCab. This time, we will use Homebrew to install MeCab [Read More]

technical natural language processing bert technology python distributed representation

I also tried a document classification problem with fastText

Posted on Sat Jun 13 2020 | 2 minutes | 801 words | 近藤綾乃

A summary of what I did with fastText on a document classification problem. Facebook Research has published a document classification library using fastText. fastText is easy to install in a Python environment, and its run time is fast. Preliminaries: I decided to tackle the task of document classification, and initially considered NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit. However, it was not [Read More]

technical natural language processing fasttext technology distributed representation document classification