Procedure for obtaining a distributed representation of a Japanese sentence using a trained Universal Sentence Encoder

Vector representations of sentences and documents can be obtained with the Universal Sentence Encoder.

Features

Supports multiple languages, including Japanese.

Japanese sentences can be handled as vectors.

Applications

Clustering, similarity calculation, and feature extraction.
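As a minimal sketch of the clustering use case (toy 2-D vectors standing in for sentence embeddings, and a single nearest-centroid assignment by dot product rather than full k-means):

```python
import numpy as np

# toy stand-ins for sentence vectors; in practice these come from the encoder
vecs = np.array([[1.0, 0.1],
                 [0.9, 0.0],
                 [0.0, 1.0],
                 [0.1, 0.9]])

# hand-picked seed centroids, then one nearest-centroid assignment by dot product
centroids = vecs[[0, 2]]
labels = np.argmax(vecs @ centroids.T, axis=1)
print(labels)  # → [0 0 1 1]
```

The first two vectors group together, as do the last two; with real sentence vectors, the same assignment step groups semantically similar sentences.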

Usage

Run the following command to install the prerequisites:

```sh
pip install tensorflow tensorflow_hub tensorflow_text numpy
```

A trained model is available on TensorFlow Hub.

The Python code below shows how to use it.

```py
import tensorflow_hub as hub
import tensorflow_text  # not used directly, but required at runtime
import numpy as np

# workaround to avoid SSL certificate errors when downloading the model
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

def cos_sim(v1, v2):
    # cosine similarity between two vectors
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

texts = ["I saw a comedy show yesterday.",
         "There was a comedy show on TV last night.",
         "I went to the park yesterday.",
         "I saw a comedy show last night.",
         "Yesterday, I went to the park."]
vectors = embed(texts)
```
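The cos_sim helper above can then be used to compare any two rows of vectors (e.g. cos_sim(vectors[0], vectors[3])). Its behavior can be sanity-checked with plain NumPy vectors standing in for sentence embeddings:

```python
import numpy as np

def cos_sim(v1, v2):
    # same helper as in the sample above
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# toy 3-dimensional stand-ins for two sentence embeddings
v_a = np.array([1.0, 0.0, 1.0])
v_b = np.array([1.0, 1.0, 0.0])

print(cos_sim(v_a, v_a))  # identical vectors → ≈ 1.0
print(cos_sim(v_a, v_b))  # partial overlap → ≈ 0.5
```

With the real embeddings, similar sentences (such as the two comedy-show ones) should score close to 1, while unrelated pairs score lower.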

See the following link for more details:

[Try Universal Sentence Encoder in Japanese](https://qiita.com/kenta1984/items/9613da23766a2578a27a)


### Postscript
```py
import tensorflow_text
```

Without this line, you will get an error like `Sentencepiece not found!`. The import is not used explicitly in the sample code, but it is required at runtime.

[Read More]

Simply Enumerating Applications of the Document Classification Problem

Applying the Document Classification Problem

You’ve learned about machine learning, but you don’t know what to use it for, right?

This is easy to overlook while studying: unless you keep your antennas up, you never find out how a technique is actually used. A tool only matters when it is used, so it is worth noting down the uses of each newly acquired tool.

[Read More]

Impressions from reading "An Intuitive Approach to Mathematics for Physics" (物理数学の直観的方法)

Some people believe that mathematics is all about memorization, don’t they?

Wouldn’t it be easier if you could understand it intuitively?

Mathematics for physics is hard

There are many points at which people stumble in mathematics for physics.

You sit through math lectures at university and barely scrape together the credits; real understanding is far out of reach. Let alone an intuitive method?

[Read More]

Understand how word2vec behaves by running it in a notebook

Let’s understand word2vec!

  • Are you struggling in your attempts to study the word2vec algorithm?
    • The idea behind the algorithm is surprisingly intuitive, but reading that intuition out of a description of the algorithm may take some knack.
    • A good way to get a feel for it is to play with a working model and watch how it responds.
    • Wouldn’t it be easier to understand if you could run a program line by line yourself and check its output?

No environment setup required!

  • So let’s use a service called Google Colaboratory to run word2vec with minimal effort and understand how the algorithm works!
    • Google Colaboratory is a service provided by Google.
    • With a Gmail account, you can skip setting up an environment and use Google’s compute resources.
  • I have prepared a program that runs word2vec.
  • This program was distributed at the 技術書典 technical book fair and has been used by more than 50 people.

You can purchase it from the link below

A note on how to use BERT trained on Japanese Wikipedia, now available

Hugging Face has released a Japanese model for BERT.

The Japanese model is included in transformers.

However, I stumbled over a few things before I could get it to actually work in a Mac environment, so I’ll leave a note.

Preliminaries: Installing MeCab

The morphological analysis engine MeCab is required to use BERT’s Japanese model.

The tokenizer will probably ask for MeCab.

This time, we will use Homebrew to install MeCab and ipadic.
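A sketch of the installation commands under that setup (the Python binding packages on the pip line, such as fugashi and ipadic, vary with the transformers version, so treat them as illustrative):

```shell
# install the MeCab morphological analyzer and the IPA dictionary via Homebrew
brew install mecab mecab-ipadic

# Python bindings the transformers tokenizer typically needs (names may vary by version)
pip install mecab-python3 fugashi ipadic
```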

[Read More]

Using Google Colaboratory to learn how Word2Vec works and its models

**Let’s start with Word2Vec.**

word2vec is a model that can learn semantic vectors of words from unlabeled text.

Handling word vectors enables applications such as word similarity calculation and clustering. BERT, an extension of this technology, is also used in Google’s search service.
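As a rough illustration of why distributional statistics yield semantic vectors (this is a simple co-occurrence-count sketch, a swapped-in technique, not word2vec’s actual neural training), words that share contexts end up with similar vectors:

```python
import numpy as np

# toy corpus; the corpus, window size, and similarity check are illustrative
corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# symmetric window-1 co-occurrence matrix: each row is a crude word vector
C = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    ws = s.split()
    for i, w in enumerate(ws):
        for j in (i - 1, i + 1):
            if 0 <= j < len(ws):
                C[idx[w], idx[ws[j]]] += 1

def cos(a, b):
    # cosine similarity between two word vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# "like" and "enjoy" share the context word "i", so their rows are similar;
# "like" and "flying" share no context, so their similarity is zero
print(cos(C[idx["like"]], C[idx["enjoy"]]) > cos(C[idx["like"]], C[idx["flying"]]))  # → True
```

word2vec learns dense low-dimensional vectors with the same qualitative behavior, rather than these sparse count rows.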

Are you struggling to learn about word2vec? The concept is hard to grasp because it is not something you are familiar with.

[Read More]

How to use NeuralClassifier, a library that provides a crazy number of models for document classification problems

[![](https://1.bp.blogspot.com/-YlMb8v77MN4/XurdQSzS1yI/AAAAAAAAg6Y/oSZrJ0c9yxYbzQnNNTynRvZnEp-xGE7NwCK4BGAsYHg/s320/AFE90C8A-A49C-4475-9F05-50E2D56D5B63.jpeg)](https://1.bp.blogspot.com/-YlMb8v77MN4/XurdQSzS1yI/AAAAAAAAg6Y/oSZrJ0c9yxYbzQnNNTynRvZnEp-xGE7NwCK4BGAsYHg/s1920/AFE90C8A-A49C-4475-9F05-50E2D56D5B63.jpeg)

NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit is a python library for multi-label document classification problems published by Tencent.

For more details, see [NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit](https://github.com/Tencent/NeuralNLP-NeuralClassifier).

NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification task, which is more challenging and common in real-world scenarios.

[Read More]

On the social implementation of mathematical optimization and research

Social implementation and research

Implementing social data is no easy matter!? Maritime security with AI: what is GeoTrackNet, a model for vessel monitoring?

The state of the art is not always the best

It is tempting to assume that a state-of-the-art method will give good results.

In real-world problems, however, the data is rarely well organized.

And even after processing the data you have collected, applying an algorithm to it does not always produce the intended result.

[Read More]