Lucene UnifiedHighlighter Example

Lucene's UnifiedHighlighter is the highest-performing highlighter in Lucene, especially for large documents. In this tutorial, we learn to highlight search terms in indexed documents/files.

Table of Contents

Project Structure
Highlight Fragments with UnifiedHighlighter
Write Files to Lucene Index
Sourcecode

Project Structure

I am using a Maven project to run this example, with the following Lucene dependencies added to its pom.xml.

<properties>
	<lucene.version>6.6.0</lucene.version>
</properties>

<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-core</artifactId>
	<version>${lucene.version}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-analyzers-common</artifactId>
	<version>${lucene.version}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-queryparser</artifactId>
	<version>${lucene.version}</version>
</dependency>

<!-- To include highlight support-->
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-highlighter</artifactId>
	<version>${lucene.version}</version>
</dependency>

The project structure now looks like this:

Lucene UnifiedHighlighter – Project Structure

Note that we will use these two folders inside the project:

  • inputFiles – contains all the text files that we want to index.
  • indexedFiles – contains the Lucene index built from those files. This is the index we search.
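
To try the examples, the inputFiles folder needs at least one text file. Below is a minimal sketch, using only the JDK, that creates both folders and writes one sample file. The class name SampleInputSetup, the file name sample.txt, and the sample sentences are hypothetical placeholders and are not part of the original project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SampleInputSetup
{
	//Creates inputFiles/ and indexedFiles/ under baseDir and writes one sample text file.
	//Returns the path of the sample file.
	static Path prepare(Path baseDir) throws IOException
	{
		Path inputDir = Files.createDirectories(baseDir.resolve("inputFiles"));
		Files.createDirectories(baseDir.resolve("indexedFiles"));

		//Hypothetical sample content; replace it with your own text files
		String text = "Questions about Lucene are welcome. This sentence has no match.";
		Path sample = inputDir.resolve("sample.txt");
		Files.write(sample, text.getBytes(StandardCharsets.UTF_8));
		return sample;
	}

	public static void main(String[] args) throws IOException
	{
		//Use the current working directory as the project root
		System.out.println("Wrote " + prepare(Paths.get(".")));
	}
}
```

Run it once from the project root before running the indexing program, or simply copy your own text files into inputFiles.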

Highlight Fragments with UnifiedHighlighter

The following Java example uses UnifiedHighlighter to highlight searched phrases or queries in Lucene search results.

Here, I am searching the Lucene index stored in the indexedFiles folder. The next section shows how this index was written.

package com.howtodoinjava.demo.lucene.highlight;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneUnifiedHighlighterExample
{
	//This contains the lucene indexed documents
	private static final String INDEX_DIR = "indexedFiles";

	public static void main(String[] args) throws Exception 
	{
		//Get directory reference
		Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));
		
		//Index reader - an interface for accessing a point-in-time view of a lucene index
		IndexReader reader = DirectoryReader.open(dir);
		
		//Create lucene searcher. It searches over a single IndexReader.
		IndexSearcher searcher = new IndexSearcher(reader);
		
		//analyzer with the default stop words
		Analyzer analyzer = new StandardAnalyzer();
		
		//Query parser that parses query strings against the "contents" field
		QueryParser qp = new QueryParser("contents", analyzer);
		
		//Create the query
		Query query = qp.parse("Questions");
		
		//Search the lucene documents
		TopDocs hits = searcher.search(query, 10, Sort.INDEXORDER);
		
		System.out.println("Search terms found in :: " + hits.totalHits + " files");
		
		//Highlight the best passage of the "contents" field for each hit;
		//fragments[i] corresponds to hits.scoreDocs[i]
		UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer);
		String[] fragments = highlighter.highlight("contents", query, hits);

		for(String f : fragments)
		{
			System.out.println(f);
		}
		
		//To get which fragment belongs to which doc/file

		/*for (int i = 0; i < hits.scoreDocs.length; i++) 
		{
			int docid = hits.scoreDocs[i].doc;
			Document doc = searcher.doc(docid);
			
			String filePath = doc.get("path");
			System.out.println(filePath);
			System.out.println(fragments[i]);
		}*/

		//Release the reader and the directory
		reader.close();
		dir.close();
	}
}

Output:

Search terms found in :: 3 files
Questions Girl private rich in do up or both. 
Questions explained agreeable preferred strangers too him her son. 
Questions Or neglected agreeable of discovery concluded oh it sportsman.
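
To make the idea behind this output more concrete, here is a simplified, JDK-only illustration of passage-based highlighting: split the text into sentences, keep the sentences that contain the term, and wrap each match in tags. This is not Lucene's implementation (UnifiedHighlighter also scores and ranks passages and works from the index); the class name NaiveHighlighter and the choice of <b> tags are assumptions made only for this sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveHighlighter
{
	//Returns every sentence of text that contains term (case-insensitive, whole word),
	//with each match wrapped in <b>...</b>.
	static List<String> highlight(String text, String term)
	{
		Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);

		//Sentence boundaries define the candidate passages
		BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
		sentences.setText(text);

		List<String> passages = new ArrayList<>();
		int start = sentences.first();
		for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next())
		{
			String sentence = text.substring(start, end).trim();
			Matcher m = p.matcher(sentence);
			if (m.find())
			{
				//$0 is the matched text, so the original casing is preserved
				passages.add(m.replaceAll("<b>$0</b>"));
			}
		}
		return passages;
	}

	public static void main(String[] args)
	{
		String text = "Questions explained agreeable preferred strangers. No match in this sentence.";
		highlight(text, "questions").forEach(System.out::println);
	}
}
```

The sketch scans the raw text on every call, which is fine for a demonstration but is exactly the cost a real highlighter tries to avoid on large documents.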

Write Files to Lucene Index

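I iterate over all files in the inputFiles folder and index them. Each document gets 3 fields:

  1. path : File path [Field.Store.YES]
  2. modified : File last modified timestamp (indexed as a LongPoint, not stored)
  3. contents : File content [Field.Store.YES]; it is stored so that the highlighter can read the original text back from the index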

LuceneWriteIndexFromFileExample.java

package com.howtodoinjava.demo.lucene.file;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneWriteIndexFromFileExample 
{
	public static void main(String[] args)
	{
		//Input folder
		String docsPath = "inputFiles";
		
		//Output folder
		String indexPath = "indexedFiles";

		//Input Path Variable
		final Path docDir = Paths.get(docsPath);

		//analyzer with the default stop words
		Analyzer analyzer = new StandardAnalyzer();
		
		//IndexWriter Configuration
		IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
		iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);

		//try-with-resources closes the writer, then the directory, even if indexing fails
		try (Directory dir = FSDirectory.open( Paths.get(indexPath) );
			 IndexWriter writer = new IndexWriter(dir, iwc)) 
		{
			//Recursively index all files and directories under docDir
			indexDocs(writer, docDir);
		} 
		catch (IOException e) 
		{
			e.printStackTrace();
		}
	}
	
	static void indexDocs(final IndexWriter writer, Path path) throws IOException 
	{
		//Directory?
		if (Files.isDirectory(path)) 
		{
			//Iterate directory
			Files.walkFileTree(path, new SimpleFileVisitor<Path>() 
			{
				@Override
				public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException 
				{
					try 
					{
						//Index this file
						indexDoc(writer, file, attrs.lastModifiedTime().toMillis());
					} 
					catch (IOException ioe) 
					{
						ioe.printStackTrace();
					}
					return FileVisitResult.CONTINUE;
				}
			});
		} 
		else 
		{
			//Index this file
			indexDoc(writer, path, Files.getLastModifiedTime(path).toMillis());
		}
	}

	static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException 
	{
		//Read the whole file as text (assumes the input files are UTF-8 encoded)
		String contents = new String(Files.readAllBytes(file), java.nio.charset.StandardCharsets.UTF_8);

		//Create lucene Document
		Document doc = new Document();
		
		doc.add(new StringField("path", file.toString(), Field.Store.YES));
		doc.add(new LongPoint("modified", lastModified));
		doc.add(new TextField("contents", contents, Store.YES));
		
		//Updates a document by first deleting the document(s) 
		//containing the given term and then adding the new
		//document. The delete and then add are atomic as seen
		//by a reader on the same index
		writer.updateDocument(new Term("path", file.toString()), doc);
	}
}

Sourcecode

Download the source code using the link given below.

Happy Learning !!
