Dataconomy

What You Don’t Know About Apache Lucene

by Eileen McNulty
July 10, 2014
in Conversations

According to his LinkedIn profile, Robert Muir is a Mongolia-based Ghostbuster for Elasticsearch. Any activities involving the elimination of supernatural entities aside, what we do know is that his work at Elasticsearch involves implementing and improving the reliability of Apache Lucene. He’s also an Apache Lucene committer. Lucene is a high-performance, full-featured text search engine library written entirely in Java, suitable for nearly any application that requires full-text search, especially cross-platform. We caught up with Robert at Berlin Buzzwords to discuss his work, how people are using Lucene and what we can expect from Lucene in the future. Sadly, there was no talk of ghosts.


Tell us a little bit about yourself and your work.

My name is Robert Muir and I’ve been an Apache Lucene committer for five years now. I work for Elasticsearch; I’m a developer there and I mostly work on Lucene.


The talk you gave today was on the new features of Apache Lucene. Do you want to give us a brief overview of that?

Essentially Lucene has grown a lot since Lucene 4. It’s more than just a core indexing library. We have the features that people, driven by Google, expect of search engines, like auto-suggest, highlighting and faceting. So in Lucene 4, we have all of this other stuff you need around search as part of the library. The idea of this talk is that you can go to the store and buy Lucene in Action and get a good description that’s maybe three or four years out of date, but it won’t tell you about all these cool features that you need to deal with. For example, you need auto-suggest. All users expect it. So the talk was to give people an idea of how it works in Lucene 4. A sort of up-to-date overview.

So there have been a lot of talks about search engines this year. Search seems to be the buzzword of this year’s Berlin Buzzwords. What is it about Lucene that you think makes it stand out?

I think the first thing that people are attracted to when they use Lucene is that it’s fast, much faster than you would expect. Maybe because it’s Java code, they expect it to be much slower than it is. It’s usually much faster than a database for a lot of the types of queries that users want to do these days. I think a part of dealing with lots of data is that you can’t deal with it all at once. So search fits more naturally here because you’re just saying, I want to look at the most relevant stuff because I can’t look at all of it.

One of the things that stood out to me during the talk was how customizable Lucene is. How important is the customization when you are developing Lucene? Is that one of the main priorities?

Lucene began as an API, which is different from, say, an Oracle database, where you have a server. Because of that, I think customization has always been a high priority. It’s built for just that. It’s built for when you want to embed search somewhere to do something custom. If you want to have something more out-of-the-box, you can get Solr or Elasticsearch, which are the server versions. We just make the customizable low-level engine and people use it in radically different ways for different purposes. So it’s definitely a huge priority.

Are there any particular use cases of Lucene that you find particularly interesting?

At Elasticsearch, we see a lot of people using it for log analysis. We see a lot of people doing stuff that’s more like analytics. And I think it’s really interesting because I just never thought about using Lucene for that, but it works pretty well and it solves a lot of real-world needs. I mean, I think we could probably make some improvements. We see these use cases and, as developers, we haven’t tuned it for that or thought about it. So it’s cool for that reason.

Can you tell us a little bit more as well about your work with Elasticsearch?

I just started working there about a month or two ago, and basically I work on Lucene. The first thing we did was work on improving the reliability of Lucene. It wasn’t that Lucene had bugs; we just didn’t have features that you would expect a data store to have. These are things like error detection to improve reliability. You’ve got systems like Solr and Elasticsearch taking Lucene indexes and sending them around on the network, so we need to detect when something goes wrong. So we added file checksumming, for example, to Lucene. That’s one of the first things I did. I think we improved the robustness a lot just with that change. It’s changes like that which make working on Lucene exciting.
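To illustrate the general idea behind file checksumming, here is a minimal sketch in Java. It is not Lucene's actual file format or API: the class name `ChecksumFooterSketch`, the methods `appendChecksum` and `verify`, and the 8-byte footer layout are all assumptions made for this example. It uses the standard `java.util.zip.CRC32` class to append a checksum to a block of bytes and to check it again after the bytes have been copied somewhere else.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only: a hypothetical footer layout, not Lucene's real format.
final class ChecksumFooterSketch {

    // Returns the payload followed by an 8-byte footer holding its CRC32.
    static byte[] appendChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    // Recomputes the CRC32 over the payload and compares it with the footer.
    static boolean verify(byte[] fileBytes) {
        if (fileBytes.length < Long.BYTES) {
            return false; // too short to contain a footer at all
        }
        int payloadLength = fileBytes.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(fileBytes, 0, payloadLength);
        long stored = ByteBuffer.wrap(fileBytes, payloadLength, Long.BYTES).getLong();
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] written = appendChecksum("segment data".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact copy verifies: " + verify(written));

        // Simulate a single flipped bit while the file travels over the network.
        byte[] corrupted = Arrays.copyOf(written, written.length);
        corrupted[3] ^= 0x01;
        System.out.println("corrupted copy verifies: " + verify(corrupted));
    }
}
```

In this sketch the receiver only needs the bytes themselves to notice that something went wrong in transit, which is the property described above for indexes that are shipped between machines.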

What are you working on for the future of Lucene?

I can tell you what we’re working on right now, because we don’t really have a good idea of what’s coming; it’s open source, so it’s all up in the air. Currently I’m working on improving the way queries execute and, long-term, hopefully the way they work with positions, to give them more power, more flexibility and greater speed. So hopefully this is something we’ll fix this year.

Big data has gained a huge amount of momentum and hype over the past couple of years. Where do you think this is headed?

There’s more and more information and we’re getting overloaded by it. I think search has an important role here as it allows you to sift through everything and find the needle in a haystack. As we’re drowning in data, I think improving the quality, performance and usability of search is really important.



Elasticsearch is a real-time search server based on Lucene, with high availability and multi-tenancy. Together with Logstash and Kibana, it forms the end-to-end “ELK” stack, which delivers actionable insights in real time from almost any type of structured and unstructured data source.


 

(Image credit: Apache Lucene)

