Day 66 – Multi-Language Identification Using Natural Language Processing

Today, we look at multi-language identification using Natural Language Processing and how it can be used.

Suppose your company hands you a 100-page document and asks you to find out which languages appear in it and how many times each one occurs.

The seqtolang library can solve this problem. It is easy to use, and you can integrate it with a frontend so that users can upload a document or copy and paste its text. I tested it on a single line of words and found a problem with the German and Hindi results. I describe the problem after the output, and I will also notify the developer about it.

I installed the library and ran it on Google Colab; you can also run the same code in your own IDE.

Installation output:
Collecting seqtolang
  Downloading https://files.pythonhosted.org/packages/1b/4c/ae1a25dff2b06476b9c707642adea530ef45994a9986b9035d7418e980b1/seqtolang-0.1.4-py3-none-any.whl (29.0MB)
     |████████████████████████████████| 29.0MB 109kB/s
Requirement already satisfied: torch>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from seqtolang) (1.9.0+cu102)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.1.0->seqtolang) (3.7.4.3)
Installing collected packages: seqtolang
Successfully installed seqtolang-0.1.4

Aggregated result (language code, score):
[('hin', 0.4620698094367981), ('eng', 0.22681103646755219), ('jpn', 0.09090376645326614), ('fra', 0.08465294539928436), ('deu', 0.054195597767829895)]

Per-token result:
['eng', 'eng', 'deu', 'eng', 'fra', 'jpn', 'hin', 'hin', 'hin', 'hin', 'hin']
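
To answer the original question (how many languages, and how often each one appears), the per-token labels can simply be tallied. Here is a minimal sketch using Python's collections.Counter. The token_labels list is copied from the output above, and count_languages is a hypothetical helper name of my own, not part of seqtolang.

```python
from collections import Counter

# Per-token language labels, copied from the output above.
token_labels = ['eng', 'eng', 'deu', 'eng', 'fra', 'jpn',
                'hin', 'hin', 'hin', 'hin', 'hin']


def count_languages(labels):
    """Return (language_code, count) pairs, most frequent first."""
    return Counter(labels).most_common()


language_counts = count_languages(token_labels)

# Report the number of distinct languages and the count for each one.
print(f"{len(language_counts)} languages detected")
for code, count in language_counts:
    print(f"{code}: {count}")
```

For a real document, you would replace token_labels with the labels predicted for the full text. The counts are only as reliable as the per-token predictions they are built from.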

Two issues stand out:

  1. The German word ‘Morgen’ is labeled as English.
  2. The two Hindi words ‘shubh prabhaat’ are counted as five Hindi tokens.

You can download the full code from the developer's GitHub repository.

What is your view on this library?

Please test it on your side and comment below.

Published June 23rd, 2021 | Machine Learning
