Procedure for obtaining a distributed representation of a Japanese sentence using a trained Universal Sentence Encoder


Universal Sentence Encoder can be used to obtain vector representations (distributed representations) of sentences and documents.

### Features

- Supports multiple languages.
- Japanese is supported.
- Japanese sentences can be handled as vectors.

### Use cases

Clustering, similarity calculation, and feature extraction.

### Usage

As preparation, run the following command to install the required packages:

```sh
pip install tensorflow tensorflow_hub tensorflow_text numpy
```

A trained model is available on TensorFlow Hub, so there is no need to train one yourself.

The Python code below shows how to use it.

```py
import tensorflow_hub as hub
import tensorflow_text  # not referenced directly, but required at runtime (see Postscript)
import numpy as np

# Work around SSL certificate errors when downloading the model
import ssl
ssl._create_default_https_context = ssl._create_unverified_context


def cos_sim(v1, v2):
    """Cosine similarity between two vectors."""
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))


# Load the trained multilingual Universal Sentence Encoder from TensorFlow Hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

texts = [
    "I saw a comedy show yesterday.",
    "There was a comedy show on TV last night.",
    "I went to the park yesterday.",
    "I saw a comedy show last night.",
    "Yesterday, I went to the park.",
]
vectors = embed(texts)  # one 512-dimensional vector per sentence
```
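
The `cos_sim` helper defined above can be applied directly to rows of `vectors`, which covers the similarity-calculation use case. As a minimal sketch continuing from the code above (the exact scores depend on the model version), the following compares every sentence against the first one:

```py
# Compare each sentence with the first one ("I saw a comedy show yesterday.")
base = vectors[0]
for text, vec in zip(texts, vectors):
    print(f"{cos_sim(base, vec):.3f}  {text}")
```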

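As an example of the clustering use case, the same vectors can be fed to an off-the-shelf clustering algorithm. Below is a minimal sketch continuing from the `texts` and `vectors` computed above; it assumes scikit-learn is installed (it is not included in the pip command earlier):

```py
from sklearn.cluster import KMeans
import numpy as np

# Group the five example sentences into two clusters based on their sentence vectors
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(np.asarray(vectors))
for label, text in zip(labels, texts):
    print(label, text)
```
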
See the following link for more details:

[Try Universal Sentence Encoder in Japanese](https://qiita.com/kenta1984/items/9613da23766a2578a27a)


### Postscript
```py
import tensorflow_text
```

Without this line, you will get an error like "Sentencepiece not found!". The module is not referenced explicitly in the sample code, but importing it is required at runtime.
