The Lucene UnifiedHighlighter is the highest-performing highlighter, especially for large documents. In this Lucene tutorial, learn to highlight search terms found in the indexed documents/files.
1. Prerequisites
We assume that you have already created the Lucene index by reading some text files and writing them into the index location. If not, follow the Lucene indexing example first to index some text files.
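For reference, the indexing step could look roughly like the sketch below. It is only an illustration, not the linked example itself: the class name `IndexTextFilesSketch` and the method `indexDirectory` are made up here, and it assumes each file becomes one document with a `path` field and a stored `contents` field (the field names the search code in this tutorial reads). The `contents` field is stored because UnifiedHighlighter, by default, loads the original text from stored fields to build its fragments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper class, named for illustration only
public class IndexTextFilesSketch {

  /** Indexes every regular file directly under inputDir (non-recursive); returns the number of documents written. */
  public static int indexDirectory(Path inputDir, Path indexDir) throws IOException {
    List<Path> files;
    try (Stream<Path> stream = Files.list(inputDir)) {
      files = stream.filter(Files::isRegularFile).collect(Collectors.toList());
    }

    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    // Assumption: start from scratch and overwrite any index already at indexDir
    config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

    try (Directory dir = FSDirectory.open(indexDir);
         IndexWriter writer = new IndexWriter(dir, config)) {
      for (Path file : files) {
        Document doc = new Document();
        // Stored but not tokenized: used to report which file a hit came from
        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        // Tokenized and stored: the highlighter needs the original text
        doc.add(new TextField("contents",
            Files.readString(file, StandardCharsets.UTF_8), Field.Store.YES));
        writer.addDocument(doc);
      }
    } // closing the writer commits the index

    return files.size();
  }

  public static void main(String[] args) throws IOException {
    int count = indexDirectory(
        Path.of("c:/temp/lucene/inputFiles"),
        Path.of("c:/temp/lucene/indexedFiles"));
    System.out.println("Indexed " + count + " files");
  }
}
```

The method takes both folders as parameters so it can be pointed at any pair of directories; `main` simply passes the two demo folders used in this tutorial.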
2. Maven
Start by adding these Lucene dependencies. We are using Lucene 9.10.0 and Java 21.
<properties>
  <maven.compiler.source>21</maven.compiler.source>
  <maven.compiler.target>21</maven.compiler.target>
  <lucene.version>9.10.0</lucene.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>${lucene.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analysis-common</artifactId>
    <version>${lucene.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>${lucene.version}</version>
  </dependency>
</dependencies>
Note that we will be using these two folders for the demo:
- 'c:/temp/lucene/inputFiles' contains all the text files that we want to index.
- 'c:/temp/lucene/indexedFiles' contains the Lucene index. We will search the index inside it.
3. Highlighting Fragments with UnifiedHighlighter
The following Java example uses UnifiedHighlighter to highlight searched phrases or queries in Lucene search results.
In this example:
- An IndexSearcher is used to search the index.
- A QueryParser is used to parse the search query.
- The highlighter is configured with a formatter that wraps the highlighted terms in <b> tags.
- The search results are retrieved and the highlighted text fragments are printed to the console.
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneUnifiedHighlighterExample {

  //This contains the Lucene indexed documents
  private static final String INDEX_DIR = "c:/temp/lucene/indexedFiles";
  private static final String SEARCH_QUERY = "Questions";

  public static void main(String[] args) throws Exception {
    //Get directory reference
    Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));

    //Index reader - an interface for accessing a point-in-time view of a Lucene index
    IndexReader reader = DirectoryReader.open(dir);

    //Create Lucene searcher. It searches over a single IndexReader.
    IndexSearcher searcher = new IndexSearcher(reader);

    //Analyzer with the default stop words
    Analyzer analyzer = new StandardAnalyzer();

    //Query parser to be used for creating TermQuery
    QueryParser qp = new QueryParser("contents", analyzer);

    //Create the query
    Query query = qp.parse(SEARCH_QUERY);

    //Search the Lucene documents
    TopDocs hits = searcher.search(query, 10, Sort.INDEXORDER);
    System.out.println("Search terms found in :: " + hits.totalHits.value + " files");

    //UnifiedHighlighter takes a PassageFormatter (not the SimpleHTMLFormatter of the older Highlighter).
    //Arguments: pre-tag, post-tag, ellipsis between passages, escape HTML in the text
    UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer);
    highlighter.setFormatter(new DefaultPassageFormatter("<b>", "</b>", "... ", false));

    //One fragment per hit, in the same order as hits.scoreDocs
    String[] fragments = highlighter.highlight("contents", query, hits);
    for (String f : fragments) {
      System.out.println(f);
    }

    //To get which fragment belongs to which doc/file
    /*for (int i = 0; i < hits.scoreDocs.length; i++)
    {
      int docid = hits.scoreDocs[i].doc;
      Document doc = searcher.doc(docid);
      String filePath = doc.get("path");
      System.out.println(filePath);
      System.out.println(fragments[i]);
    }*/

    //Release the index resources
    reader.close();
    dir.close();
  }
}
The program output:
Search terms found in :: 3 files
Questions Girl private rich in do up or both.
Questions explained agreeable preferred strangers too him her son.
Questions Or neglected agreeable of discovery concluded oh it sportsman.
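To experiment without an on-disk index, the same flow can be sketched against an in-memory directory. This is an illustrative sketch rather than part of the example above: the class name `InMemoryHighlightSketch`, the two sample documents, and the use of `ByteBuffersDirectory` are assumptions made for the demo. It configures the highlighter through `UnifiedHighlighter.builder(...)`, which recent Lucene 9.x releases provide alongside the constructor and setters, asks for up to two passages per document, and pairs each fragment with the `path` of its document.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class, named for illustration only
public class InMemoryHighlightSketch {

  /** Builds one document with a stored path and stored, tokenized contents. */
  private static Document doc(String path, String contents) {
    Document d = new Document();
    d.add(new StringField("path", path, Field.Store.YES));
    d.add(new TextField("contents", contents, Field.Store.YES));
    return d;
  }

  /** Indexes two sample documents in memory and returns one "path -> fragment" line per hit. */
  public static List<String> highlight(String queryText) throws Exception {
    Analyzer analyzer = new StandardAnalyzer();
    List<String> lines = new ArrayList<>();

    try (Directory dir = new ByteBuffersDirectory()) {
      // Write the sample documents; closing the writer commits them
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        writer.addDocument(doc("a.txt", "Questions explained agreeable preferred strangers."));
        writer.addDocument(doc("b.txt", "Nothing relevant here."));
      }

      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new QueryParser("contents", analyzer).parse(queryText);
        TopDocs hits = searcher.search(query, 10);

        UnifiedHighlighter highlighter = UnifiedHighlighter.builder(searcher, analyzer)
            .withFormatter(new DefaultPassageFormatter("<b>", "</b>", "... ", false))
            .build();

        // Up to 2 passages per document; fragments[i] belongs to hits.scoreDocs[i]
        String[] fragments = highlighter.highlight("contents", query, hits, 2);
        for (int i = 0; i < hits.scoreDocs.length; i++) {
          int docId = hits.scoreDocs[i].doc;
          String path = searcher.storedFields().document(docId).get("path");
          lines.add(path + " -> " + fragments[i]);
        }
      }
    }
    return lines;
  }

  public static void main(String[] args) throws Exception {
    highlight("Questions").forEach(System.out::println);
  }
}
```

Raising the last argument of `highlight(...)` returns more passages per document, joined by the ellipsis string given to the formatter, which is useful when a term occurs in several places in a long file.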
Happy Learning !!