A List of Applications of the Document Classification Problem

Applying the Document Classification Problem

You’ve learned about machine learning, but you don’t know how to use it. Sound familiar?

It is easy to overlook applications while you are studying a technique. If you don’t keep your antennae up, you won’t know how to use it.

Since a tool is only a tool if it is used, you should make a note of how you can use your newly acquired tool.

Scope of the Document Classification Problem

If you have studied document classification in natural language processing, you should be able to tackle tasks such as the following.

  • Spam email detection
  • News topic classification
  • Extraction of important parts
  • Document summarization
  • Recommendations to users
  • Clustering
  • Sentiment analysis, etc.

It seems to have a surprisingly wide range of applications, doesn’t it?

What is the document classification problem?

It is the process of assigning one or more labels to a single document. In machine learning, we build a model to predict the labels.

Here, a document can be as short as a single word or as long as a news article; its length is not very important.

Labels can be binary (e.g. important or not), or multiclass or multi-label (e.g. topic or sentiment).
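To make the label types concrete, here is a minimal sketch of how each kind of label might be encoded for a model. The topic names and the helper `multi_hot` are hypothetical, chosen only for illustration.

```python
# Hypothetical label vocabulary, used only for this illustration.
TOPICS = ["sports", "politics", "technology"]


def multi_hot(labels, vocabulary):
    """Encode a set of labels as a 0/1 vector aligned with `vocabulary`."""
    return [1 if name in labels else 0 for name in vocabulary]


# Binary: one yes/no decision per document (e.g. important or not).
binary_label = 1

# Multiclass: exactly one label per document, stored as an index.
multiclass_label = TOPICS.index("politics")

# Multi-label: any number of labels per document, stored as a 0/1 vector.
multilabel_vector = multi_hot({"sports", "technology"}, TOPICS)

print(binary_label, multiclass_label, multilabel_vector)
```

A multiclass model predicts one index, while a multi-label model predicts one yes/no value per position in the vector.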

There are supervised and unsupervised methods.

Supervised

In the supervised case, the labels need to be prepared by humans.

This can be done by crowdsourcing annotations, or by collecting tags from social networking sites or reviews from e-commerce sites.

In this case, it is often impossible to prepare a sufficient amount of data on our own. For this reason, machine learning methods may be useful.

Unsupervised

Data for unsupervised methods is easy to prepare because you only need the documents themselves.

For example, text from Wikipedia can be used.

Features

We need to extract features from documents.

TF-IDF and distributed representations are commonly used as features.
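As a rough illustration of TF-IDF features, the sketch below computes one common variant (relative term frequency times the logarithm of inverse document frequency, with no smoothing) using only the standard library. The function name `tf_idf`, the whitespace tokenizer, and the sample sentences are assumptions made for this example; real libraries often use slightly different formulas.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["the cat sat", "the dog barked", "the cat purred"]
vectors = tf_idf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, vec)
```

A word that appears in every document ("the") gets weight zero, while a word unique to one document ("sat") gets the highest weight, which is the behaviour that makes TF-IDF useful as a feature.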

In recent research (as of 2020), deep learning is the main focus.

In the supervised case, the extracted features can then be passed to a machine learning classifier such as a support vector machine (SVM).
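The sketch below shows what this step might look like for spam detection: a tiny linear SVM trained by subgradient descent on the hinge loss, using bag-of-words counts as features. It is a simplified, dependency-free stand-in for a library SVM. It has no bias term, and the training data, function names, and hyperparameters are all assumptions made for this example.

```python
from collections import Counter


def bag_of_words(text):
    """Turn a text into {word: count} features."""
    return Counter(text.lower().split())


def score(weights, features):
    """Linear decision function: dot product of weights and features."""
    return sum(weights.get(term, 0.0) * count for term, count in features.items())


def train_linear_svm(samples, epochs=50, lr=0.1, lam=0.01):
    """Minimise hinge loss with L2 regularisation by subgradient descent.

    `samples` is a list of (features, y) pairs with y in {+1, -1}.
    """
    weights = {}
    for _ in range(epochs):
        for features, y in samples:
            margin = y * score(weights, features)
            # L2 regularisation: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - lr * lam
            # Hinge loss: update only when the margin is violated.
            if margin < 1.0:
                for term, count in features.items():
                    weights[term] = weights.get(term, 0.0) + lr * y * count
    return weights


def predict(weights, text):
    return "spam" if score(weights, bag_of_words(text)) > 0 else "ham"


# Toy training set: +1 = spam, -1 = ham (not spam).
training = [
    (bag_of_words("win free money now"), +1),
    (bag_of_words("free prize claim now"), +1),
    (bag_of_words("meeting schedule for monday"), -1),
    (bag_of_words("project report attached"), -1),
]
weights = train_linear_svm(training)

print(predict(weights, "free money"))
print(predict(weights, "meeting report"))
```

In practice you would more likely use TF-IDF or embedding features and an existing SVM implementation; the point here is only that the classifier learns one weight per feature from labeled examples.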

Actual usage examples

Actual usage examples are described at the following link.

Learn how distributed representations work.

We will learn about distributed representations, which capture the meanings of words. Why don’t you try to understand how distributed representations work by actually running the program?

If you can get an idea of how distributed representations are learned, you will have a better understanding of what BERT-style models learn and how they learn it.
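As a small taste of the idea, the sketch below builds word vectors from raw co-occurrence counts and compares them with cosine similarity. This is a count-based stand-in, not a learned model such as word2vec or BERT, but it rests on the same intuition that words used in similar contexts get similar vectors. The corpus and function names are made up for this example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat eats fish",
    "the dog eats meat",
]
vectors = cooccurrence_vectors(corpus)

sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(sim_cat_dog, sim_cat_milk)
```

Here "cat" ends up closer to "dog" than to "milk", because cats and dogs appear in the same kinds of sentences even though they never appear together.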

For more information, please refer to the following link.

First Introduction to Natural Language Processing with Google Colaboratory and Python

This document will help readers understand how distributed representations work in natural language processing and develop new natural language processing services.
