Dataconomy

How to use ElasticSearch for Natural Language Processing and Text Mining — Part 1

by Saskia Vola
May 22, 2017
in Contributors, Data Science

ElasticSearch is a search engine and an analytics platform, but it also offers many features that are useful for standard Natural Language Processing and Text Mining tasks.

1. Preprocessing (Normalization)

Have you ever used the _analyze endpoint?

As you may know, ElasticSearch has over 20 language analyzers built in. What does an analyzer do? Tokenization, stemming and stopword removal.

That is very often all the preprocessing you need for higher-level tasks such as Machine Learning, Language Modelling, etc.

You basically just need a running instance of ElasticSearch, without any further configuration or setup. Then you can use the _analyze endpoint as a REST API for NLP preprocessing.

curl -XGET "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer" : "english",
  "text" : "This is a test."
}'

{
  "tokens": [
    {
      "token": "test",
      "start_offset": 10,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
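
The same call can also be made from a script. The following is a minimal Python sketch, not an official client: it assumes a local instance at http://localhost:9200 as in the example above, uses only the standard library, and the helper names build_analyze_request, extract_tokens and analyze are made up for illustration.

```python
import json
import urllib.request

# Assumption: a local ElasticSearch instance, as in the curl example.
ES_URL = "http://localhost:9200"


def build_analyze_request(text, analyzer="english", base_url=ES_URL):
    """Build (but do not send) a request to the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response):
    """Return the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(text, analyzer="english", base_url=ES_URL):
    """Send the text to ElasticSearch and return the normalized tokens."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as reply:
        return extract_tokens(json.load(reply))
```

With an instance running, analyze("This is a test.") should return the token from the response above as a plain Python list, ready to be fed into a downstream model.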

The ElasticSearch documentation has a list of all available built-in language analyzers.

2. Language Detection

Language detection is a so-called “solved” NLP problem. You just need a character n-gram language model derived from a relatively small plain-text corpus for each language you want to distinguish.
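
To make the idea concrete, here is a toy Python sketch of such a character n-gram model. It is only an illustration under stated assumptions, not how any particular plugin is implemented: the two training sentences are made up, the function names are hypothetical, and add-one smoothing is just one simple choice for handling unseen n-grams.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (lowercased, padded)."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(corpora, n=3):
    """Count n-grams per language. `corpora` maps language code -> sample text."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in corpora.items()}


def detect(text, models, n=3):
    """Return the language whose n-gram counts best explain the text."""
    vocabulary = set().union(*models.values())
    scores = {}
    for lang, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing so unseen n-grams do not give a zero probability.
        scores[lang] = sum(
            math.log((counts[gram] + 1) / (total + len(vocabulary) + 1))
            for gram in char_ngrams(text, n)
        )
    return max(scores, key=scores.get)


# Toy training data (made up for this sketch); real models use far more text.
corpora = {
    "en": "this is a short english text and it is only a test of the idea",
    "de": "dies ist ein kurzer deutscher text und er ist nur ein test der idee",
}
models = train(corpora)
print(detect("this is a test", models))
```

Even with one sentence per language, the trigram counts are enough to tell the two toy languages apart, which is why a relatively small corpus per language goes a long way.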

So no need to reinvent the wheel over and over.

If you already have ElasticSearch up and running, you can simply install a language detection plugin that adds a _langdetect endpoint.


curl -XPOST 'localhost:9200/_langdetect?pretty' -d 'This is a test'
{
  "profile" : "/langdetect/",
  "languages" : [ {
    "language" : "en",
    "probability" : 0.9999971603535163
  } ]
}
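
In a pipeline you usually only want the winning language code. A small Python sketch for picking it out of a parsed response could look like this; the function name best_language and the min_probability threshold are assumptions made for this example, not part of the plugin.

```python
def best_language(response, min_probability=0.5):
    """Pick the most probable language from a parsed _langdetect response.

    Returns None if no candidate reaches `min_probability`
    (a hypothetical threshold chosen for this sketch).
    """
    candidates = response.get("languages", [])
    if not candidates:
        return None
    top = max(candidates, key=lambda entry: entry["probability"])
    return top["language"] if top["probability"] >= min_probability else None


# The response shown above, as a Python dict.
response = {
    "profile": "/langdetect/",
    "languages": [{"language": "en", "probability": 0.9999971603535163}],
}
print(best_language(response))  # en
```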

That’s it. It’s open source, free to use and super simple.

How to use ElasticSearch for Text Mining originally appeared on the textminers.io blog.


 


Image: born1945, CC 2.0

Tags: Elasticsearch, How to use Elasticsearch for NLP and Text Mining, NLP, Saskia Vola, Text mining


Comments 3

  1. Stas Mossat says:
    6 years ago

    Is that all? How about significant terms and etc?

    • admin says:
      6 years ago

      Part 2 is coming soon. Keep an eye out for it!



COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
