Lucene UnifiedHighlighter Example

The Lucene UnifiedHighlighter is the highest-performing Lucene highlighter, especially for large documents. In this Lucene tutorial, we learn to highlight the search terms found in indexed documents/files.

1. Prerequisites

We assume that you have already created a Lucene index by reading some text files and writing them to the index location. If not, first follow the Lucene example for indexing text files.
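For readers who do not have an index yet, the following is a minimal sketch of the indexing step, not the exact code of the referenced tutorial. It assumes the two field names used later in this article ("path" and "contents") and writes to an in-memory ByteBuffersDirectory so that it runs without touching the disk. The class name IndexSketch and the method indexTexts are hypothetical. Note that the "contents" field is stored, because UnifiedHighlighter reads the original text from stored fields by default.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, for illustration only
public class IndexSketch {

  // Adds one document per map entry (key = file path, value = file text)
  // and returns the number of documents in the index afterwards.
  public static int indexTexts(Directory dir, Map<String, String> texts) throws IOException {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    try (IndexWriter writer = new IndexWriter(dir, config)) {
      for (Map.Entry<String, String> entry : texts.entrySet()) {
        Document doc = new Document();
        // Path is indexed as a single token and stored for display
        doc.add(new StringField("path", entry.getKey(), Field.Store.YES));
        // Contents are analyzed AND stored; the highlighter needs the stored text
        doc.add(new TextField("contents", entry.getValue(), Field.Store.YES));
        writer.addDocument(doc);
      }
    } // closing the writer commits the changes
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      return reader.numDocs();
    }
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> texts = new LinkedHashMap<>();
    texts.put("a.txt", "Questions explained agreeable preferred strangers.");
    texts.put("b.txt", "Nothing relevant in this file.");
    try (Directory dir = new ByteBuffersDirectory()) {
      System.out.println("Indexed documents: " + indexTexts(dir, texts));
    }
  }
}
```

To write to disk instead, the ByteBuffersDirectory can be swapped for FSDirectory.open(Paths.get(...)) pointing at the index folder.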

2. Maven

Start by adding these Lucene dependencies. We are using Lucene 9.10.0 and Java 21.

<properties> 
  <maven.compiler.source>21</maven.compiler.source>
  <maven.compiler.target>21</maven.compiler.target>
  <lucene.version>9.10.0</lucene.version>
</properties>

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analysis-common</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-highlighter</artifactId>
  <version>${lucene.version}</version>
</dependency>

Note that we will use these two folders for the demo:

'c:/temp/lucene/inputFiles' contains all the text files that we want to index.
'c:/temp/lucene/indexedFiles' contains the Lucene indexed documents. We will search the index inside it.

3. Highlighting Fragments with UnifiedHighlighter

The following Java example uses UnifiedHighlighter to highlight searched phrases or queries in Lucene search results.

In this example:

An IndexSearcher is used to search the index.
A QueryParser is used to parse the search query.
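The highlighter is configured with a DefaultPassageFormatter to wrap the highlighted terms in <b> tags. (SimpleHTMLFormatter belongs to the older Highlighter API and is not accepted by UnifiedHighlighter.)
The search results are retrieved and the highlighted text fragments are printed to the console.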

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneUnifiedHighlighterExample {

  //This contains the lucene indexed documents
  private static final String INDEX_DIR = "c:/temp/lucene/indexedFiles";
  private static final String SEARCH_QUERY = "Questions";

  public static void main(String[] args) throws Exception {
    //Get directory reference
    Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));

    //Index reader - an interface for accessing a point-in-time view of a lucene index
    IndexReader reader = DirectoryReader.open(dir);

    //Create lucene searcher. It searches over a single IndexReader.
    IndexSearcher searcher = new IndexSearcher(reader);

    //Analyzer; it should be the same one that was used to build the index
    Analyzer analyzer = new StandardAnalyzer();

    //Query parser that builds the query against the "contents" field
    QueryParser qp = new QueryParser("contents", analyzer);

    //Create the query
    Query query = qp.parse(SEARCH_QUERY);

    //Search the lucene documents
    TopDocs hits = searcher.search(query, 10, Sort.INDEXORDER);

    //totalHits is a TotalHits object in Lucene 9.x; its value field holds the count
    System.out.println("Search terms found in :: " + hits.totalHits.value + " files");

    //UnifiedHighlighter takes a PassageFormatter; DefaultPassageFormatter wraps matches in the given tags
    UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer);
    highlighter.setFormatter(new DefaultPassageFormatter("<b>", "</b>", "... ", false));

    //One fragment per hit, in the same order as hits.scoreDocs
    String[] fragments = highlighter.highlight("contents", query, hits);

    for (String f : fragments) {
      System.out.println(f);
    }

    //To get which fragment belongs to which doc/file

    /*for (int i = 0; i < hits.scoreDocs.length; i++)
    {
      int docid = hits.scoreDocs[i].doc;
      Document doc = searcher.storedFields().document(docid);

      String filePath = doc.get("path");
      System.out.println(filePath);
      System.out.println(fragments[i]);
    }*/

    //Close the reader before the directory
    reader.close();
    dir.close();
  }
}
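As far as I know, recent Lucene 9.x releases also offer a builder for UnifiedHighlighter and mark the constructor and setter style used above as deprecated. The sketch below shows that variant under this assumption; verify it against the Javadoc of your exact version. It builds a tiny in-memory index so that it runs on its own. The class name BuilderHighlightSketch, the highlight method, and the <mark> tags are illustrative choices, not part of the original example.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class, for illustration only
public class BuilderHighlightSketch {

  // Indexes the given texts in memory, runs the query, and returns one fragment per hit.
  public static String[] highlight(String[] texts, String queryText) throws Exception {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        for (String text : texts) {
          Document doc = new Document();
          // Stored, so the highlighter can load the original text
          doc.add(new TextField("contents", text, Field.Store.YES));
          writer.addDocument(doc);
        }
      }
      try (IndexReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new QueryParser("contents", analyzer).parse(queryText);
        TopDocs hits = searcher.search(query, 10);

        // Builder-style configuration; the last argument asks the formatter
        // to HTML-escape the document text around the matches
        UnifiedHighlighter highlighter = UnifiedHighlighter.builder(searcher, analyzer)
            .withFormatter(new DefaultPassageFormatter("<mark>", "</mark>", "... ", true))
            .build();
        return highlighter.highlight("contents", query, hits);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    String[] texts = {"Questions explained agreeable preferred strangers."};
    for (String fragment : highlight(texts, "Questions")) {
      System.out.println(fragment);
    }
  }
}
```

Escaping is worth enabling whenever the fragments are rendered in a web page, since the indexed text may itself contain markup.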

The program output:

Search terms found in :: 3 files
Questions Girl private rich in do up or both. 
Questions explained agreeable preferred strangers too him her son. 
Questions Or neglected agreeable of discovery concluded oh it sportsman.
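
The commented-out loop in the example pairs each fragment with its file. A runnable sketch of that idea follows, again on a small in-memory index so that it is self-contained. It relies on highlight() returning fragments in the same order as hits.scoreDocs, and on the searcher.storedFields() accessor, which I believe is available from Lucene 9.5 onward. The class name FragmentPerFileSketch, the method highlightByPath, and the passage limit of 2 are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class, for illustration only
public class FragmentPerFileSketch {

  // Returns a map of file path -> highlighted fragment for every matching document.
  public static Map<String, String> highlightByPath(Map<String, String> files, String queryText)
      throws Exception {
    Analyzer analyzer = new StandardAnalyzer();
    Map<String, String> result = new LinkedHashMap<>();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        for (Map.Entry<String, String> file : files.entrySet()) {
          Document doc = new Document();
          doc.add(new StringField("path", file.getKey(), Field.Store.YES));
          doc.add(new TextField("contents", file.getValue(), Field.Store.YES));
          writer.addDocument(doc);
        }
      }
      try (IndexReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new QueryParser("contents", analyzer).parse(queryText);
        TopDocs hits = searcher.search(query, 10);

        // Same constructor style as the main example; up to 2 passages per document
        UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer);
        String[] fragments = highlighter.highlight("contents", query, hits, 2);

        // fragments[i] belongs to hits.scoreDocs[i]
        StoredFields storedFields = searcher.storedFields();
        for (int i = 0; i < hits.scoreDocs.length; i++) {
          Document doc = storedFields.document(hits.scoreDocs[i].doc);
          result.put(doc.get("path"), fragments[i]);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> files = new LinkedHashMap<>();
    files.put("a.txt", "Nothing relevant in this file.");
    files.put("b.txt", "An opening line. Questions explained agreeable preferred strangers.");
    highlightByPath(files, "Questions")
        .forEach((path, fragment) -> System.out.println(path + " -> " + fragment));
  }
}
```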

Happy Learning !!
