Lucene's UnifiedHighlighter is the highest-performing of Lucene's highlighters, especially for large documents. In this tutorial, you will learn to highlight search terms in indexed documents/files.
Table of Contents
Project Structure
Highlight Fragments with UnifiedHighlighter
Write Files to Lucene Index
Sourcecode
Project Structure
I am creating a Maven project to run this example, and I have added these Lucene dependencies.
<properties>
    <lucene.version>6.6.0</lucene.version>
</properties>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>${lucene.version}</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>${lucene.version}</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene.version}</version>
</dependency>

<!-- To include highlight support -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>${lucene.version}</version>
</dependency>
The project structure now looks like this:

Please note that we will be using these two folders inside the project:
- inputFiles : will contain all the text files that we want to index.
- indexedFiles : will contain the Lucene index documents. We will search the index inside it.
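Before indexing, it can help to confirm that both folders exist and that inputFiles actually contains something to index. The following is a minimal sketch using only JDK classes; the class name ProjectFolders and the method prepare are hypothetical helpers made up for this illustration and are not part of the original project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ProjectFolders {

    /**
     * Creates the inputFiles and indexedFiles folders under baseDir if they
     * are missing, and returns the number of regular files in inputFiles.
     * (Hypothetical helper, not part of the tutorial's source code.)
     */
    public static long prepare(String baseDir) {
        Path input = Paths.get(baseDir, "inputFiles");
        Path index = Paths.get(baseDir, "indexedFiles");
        try {
            // createDirectories does nothing if the folder already exists
            Files.createDirectories(input);
            Files.createDirectories(index);

            // Count the files (including those in sub-folders) waiting to be indexed
            try (Stream<Path> files = Files.walk(input)) {
                return files.filter(Files::isRegularFile).count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long count = prepare(".");
        System.out.println("Files waiting to be indexed: " + count);
    }
}
```

If the count is zero, the indexing step will still run, but the search will return no hits.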
Highlight Fragments with UnifiedHighlighter
The following Java example uses UnifiedHighlighter to highlight searched phrases or queries in Lucene search results.
Here, I am searching the Lucene index created in the indexedFiles folder. In the next section, we will see how I wrote this index.
package com.howtodoinjava.demo.lucene.highlight;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneUnifiedHighlighterExample
{
    //This contains the lucene indexed documents
    private static final String INDEX_DIR = "indexedFiles";

    public static void main(String[] args) throws Exception
    {
        //Get directory reference
        Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));

        //Index reader - an interface for accessing a point-in-time view of a lucene index
        IndexReader reader = DirectoryReader.open(dir);

        //Create lucene searcher. It searches over a single IndexReader.
        IndexSearcher searcher = new IndexSearcher(reader);

        //Analyzer with the default stop words
        Analyzer analyzer = new StandardAnalyzer();

        //Query parser to be used for creating TermQuery
        QueryParser qp = new QueryParser("contents", analyzer);

        //Create the query
        Query query = qp.parse("Questions");

        //Search the lucene documents
        TopDocs hits = searcher.search(query, 10, Sort.INDEXORDER);

        System.out.println("Search terms found in :: " + hits.totalHits + " files");

        //Highlight the "contents" field; returns one fragment per hit, in hit order
        UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer);
        String[] fragments = highlighter.highlight("contents", query, hits);

        for(String f : fragments)
        {
            System.out.println(f);
        }

        //To get which fragment belongs to which doc/file
        /*for (int i = 0; i < hits.scoreDocs.length; i++)
        {
            int docid = hits.scoreDocs[i].doc;
            Document doc = searcher.doc(docid);

            String filePath = doc.get("path");
            System.out.println(filePath);
            System.out.println(fragments[i]);
        }*/

        //Release the reader before closing the directory
        reader.close();
        dir.close();
    }
}
Output:
Search terms found in :: 3 files
Questions Girl private rich in do up or both.
Questions explained agreeable preferred strangers too him her son.
Questions Or neglected agreeable of discovery concluded oh it sportsman.
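Each fragment in the output is a sentence-sized passage that contains the matched term. As far as I understand the defaults, UnifiedHighlighter splits the stored text with a sentence BreakIterator, and its default formatter wraps each match in <b> tags, which a browser renders as bold. The sketch below imitates that idea with plain JDK classes only. It is not Lucene's implementation: the regular-expression matching is a stand-in for analyzer-based matching, and the class NaiveSentenceHighlighter and its highlight method are hypothetical names made up for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveSentenceHighlighter {

    /**
     * Returns every sentence of text that contains term as a whole word
     * (case-insensitive), with each match wrapped in <b>...</b>.
     */
    public static List<String> highlight(String text, String term) {
        // Whole-word, case-insensitive match; a simplification of what an analyzer does
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);

        // Split the text into sentences
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);

        List<String> fragments = new ArrayList<>();
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            String sentence = text.substring(start, end).trim();
            Matcher matcher = pattern.matcher(sentence);
            if (matcher.find()) {
                // replaceAll resets the matcher, so every occurrence is wrapped
                fragments.add(matcher.replaceAll("<b>$0</b>"));
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        String text = "Lucene builds an index. Questions are parsed into queries.";
        highlight(text, "questions").forEach(System.out::println);
    }
}
```

The real highlighter additionally scores passages and returns only the best ones per document, which this sketch does not attempt.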
Write Files to Lucene Index
I am iterating over all files in the inputFiles folder and indexing them. I am creating 3 fields:
- path : File path [Field.Store.YES]
- modified : File last modified timestamp
- contents : File content [Field.Store.YES]
LuceneWriteIndexFromFileExample.java
package com.howtodoinjava.demo.lucene.file;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneWriteIndexFromFileExample
{
    public static void main(String[] args)
    {
        //Input folder
        String docsPath = "inputFiles";

        //Output folder
        String indexPath = "indexedFiles";

        //Input Path Variable
        final Path docDir = Paths.get(docsPath);

        try
        {
            //org.apache.lucene.store.Directory instance
            Directory dir = FSDirectory.open( Paths.get(indexPath) );

            //Analyzer with the default stop words
            Analyzer analyzer = new StandardAnalyzer();

            //IndexWriter Configuration
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);

            //IndexWriter writes new index files to the directory
            IndexWriter writer = new IndexWriter(dir, iwc);

            //Recursively iterate all files and directories
            indexDocs(writer, docDir);

            writer.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    static void indexDocs(final IndexWriter writer, Path path) throws IOException
    {
        //Directory?
        if (Files.isDirectory(path))
        {
            //Iterate directory
            Files.walkFileTree(path, new SimpleFileVisitor<Path>()
            {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException
                {
                    try
                    {
                        //Index this file
                        indexDoc(writer, file, attrs.lastModifiedTime().toMillis());
                    }
                    catch (IOException ioe)
                    {
                        ioe.printStackTrace();
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        else
        {
            //Index this file
            indexDoc(writer, path, Files.getLastModifiedTime(path).toMillis());
        }
    }

    static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException
    {
        //Create lucene Document
        Document doc = new Document();

        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        doc.add(new LongPoint("modified", lastModified));

        //Assumes the input files are UTF-8 encoded; the original relied on the platform default charset
        doc.add(new TextField("contents", new String(Files.readAllBytes(file), StandardCharsets.UTF_8), Store.YES));

        //Updates a document by first deleting the document(s)
        //containing the term and then adding the new
        //document. The delete and then add are atomic as seen
        //by a reader on the same index
        writer.updateDocument(new Term("path", file.toString()), doc);
    }
}
Sourcecode
Download the source code using the link given below.
Happy Learning !!