Today, we discuss multi-language identification using Natural Language Processing (NLP) and how to use it.
Suppose your company hands you a 100-page document and asks you to find out which languages it contains and how many times each one appears.
If you are looking for a solution, the seqtolang library can help. It is easy to use, and you can integrate it with a frontend where users upload a document or paste in its text. I tested it on a single line of words and found a problem with the results for German and Hindi. I describe the problem after the output, and I will also report the issue to the developer.
I installed the library and ran it on Google Colab; you can also run the same code in your own IDE. The installation log and the two outputs from my run follow: first the detected languages with their scores, then the language label assigned to each token.
Collecting seqtolang
  Downloading https://files.pythonhosted.org/packages/1b/4c/ae1a25dff2b06476b9c707642adea530ef45994a9986b9035d7418e980b1/seqtolang-0.1.4-py3-none-any.whl (29.0MB)
     |████████████████████████████████| 29.0MB 109kB/s
Requirement already satisfied: torch>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from seqtolang) (1.9.0+cu102)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.1.0->seqtolang) (3.7.4.3)
Installing collected packages: seqtolang
Successfully installed seqtolang-0.1.4
[('hin', 0.4620698094367981), ('eng', 0.22681103646755219), ('jpn', 0.09090376645326614), ('fra', 0.08465294539928436), ('deu', 0.054195597767829895)]
['eng', 'eng', 'deu', 'eng', 'fra', 'jpn', 'hin', 'hin', 'hin', 'hin', 'hin']
We can notice two issues:
- The German word ‘Morgen’ is labeled as English.
- The count for the Hindi words ‘shubh prabhaat’ is shown as 5.
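To answer the original question (how many languages appear, and how often), the per-token labels can be tallied with Python's standard library. The snippet below is only a minimal sketch: it does not call seqtolang at all, it reuses the label list printed earlier in this post, and the helper name `count_languages` is my own choice for illustration.

```python
from collections import Counter

# Per-token language labels, copied from the output shown in this post.
token_labels = ['eng', 'eng', 'deu', 'eng', 'fra', 'jpn',
                'hin', 'hin', 'hin', 'hin', 'hin']


def count_languages(labels):
    """Return a Counter mapping each language code to its token count.

    Hypothetical helper for illustration; not part of seqtolang.
    """
    return Counter(labels)


counts = count_languages(token_labels)

# Number of distinct languages, then each language with its token count,
# most frequent first.
print("Languages found:", len(counts))
for lang, n in counts.most_common():
    print(f"{lang}: {n}")
```

For a long document, the same tally could be run page by page and the counters added together, since `Counter` objects support `+`.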
You can download the full code from the developer's GitHub repository.
What is your view on this library?
Please test it on your side and comment below.