Document vectors can be obtained with the Universal Sentence Encoder.
### Features
- Supports multiple languages, including Japanese.
- Japanese sentences can be handled as vectors.
### Use cases
Clustering, similarity calculation, and feature extraction of sentences.
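As a sketch of the similarity-based use cases, here is a minimal NumPy-only example; the toy vectors are hypothetical stand-ins for real sentence embeddings:

```py
import numpy as np

# toy "sentence vectors" (stand-ins for Universal Sentence Encoder output)
vectors = np.array([
    [1.0, 0.0, 0.9],   # sentence 0
    [0.9, 0.1, 1.0],   # sentence 1 (similar to sentence 0)
    [0.0, 1.0, 0.1],   # sentence 2
])

# normalize rows so dot products become cosine similarities
normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
sim = normed @ normed.T  # pairwise cosine-similarity matrix

# simple similarity-based grouping: pair each sentence with its nearest neighbour
np.fill_diagonal(sim, -1.0)  # exclude self-similarity
nearest = sim.argmax(axis=1)
print(nearest)  # → [1 0 1]
```

With real embeddings, the same similarity matrix can feed any standard clustering algorithm.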
### Usage
First, install the dependencies:

```sh
pip install tensorflow tensorflow_hub tensorflow_text numpy
```
A trained model is available on TensorFlow Hub. The Python code below shows how to use it.
```py
import tensorflow_hub as hub
import tensorflow_text  # not referenced directly, but required at runtime
import numpy as np

# avoid SSL certificate errors when downloading the model
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

def cos_sim(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

texts = ["I saw a comedy show yesterday.", "There was a comedy show on TV last night.", "I went to the park yesterday.", "I saw a comedy show last night.", "Yesterday, I went to the park."]
vectors = embed(texts)
```
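The `cos_sim` helper can then compare any pair of sentence vectors. A self-contained sketch with toy vectors (hypothetical values, so no model download is needed):

```py
import numpy as np

def cos_sim(v1, v2):
    # dot product divided by the product of the vector norms
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# toy vectors standing in for embed(texts) output
v_comedy_1 = np.array([0.8, 0.1, 0.6])
v_comedy_2 = np.array([0.7, 0.2, 0.6])
v_park = np.array([0.1, 0.9, 0.1])

print(cos_sim(v_comedy_1, v_comedy_2))  # high: similar sentences
print(cos_sim(v_comedy_1, v_park))      # low: unrelated sentences
```

With the real model, comparing `vectors[0]` against `vectors[1]` and `vectors[2]` the same way shows which sentences are semantically close.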
See the following link for more details:
[Try Universal Sentence Encoder in Japanese](https://qiita.com/kenta1984/items/9613da23766a2578a27a)
### Postscript
```py
import tensorflow_text
```

Without this line, you will get an error like `Sentencepiece not found!`.
This import is not referenced explicitly in the sample code, but it is required at runtime.