In this Lucene 6 example, we will learn to create an index from text files and then search for tokens within the indexed documents. To learn about installing Lucene, please refer to the Lucene index and search example.
Table of Contents
- Project Structure
- Index Text Files Content
- Search Indexed Files
- Demo
- Sourcecode
Project Structure
I am creating a Maven project to run this example, and I have added these Lucene dependencies.
<properties>
    <lucene.version>6.6.0</lucene.version>
</properties>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>${lucene.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>${lucene.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene.version}</version>
</dependency>
The project structure now looks like this:

Please note that we will be using these two folders inside the project:
- inputFiles : will contain all the text files that we want to index.
- indexedFiles : will contain the Lucene index built from those files. We will search the index inside it.
Index Text Files Content
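I am iterating over all the files in the inputFiles folder and indexing them. For each file, I am creating 3 fields:
- path : File path [Field.Store.YES]
- modified : File last modified timestamp
- contents : File content [Field.Store.YES]
The Field.Store.YES value causes Lucene to store the original field value in the index, so it can be read back from the search results. The modified field is added as a LongPoint, which is indexed for queries but not stored.
LuceneWriteIndexFromFileExample.java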
package com.howtodoinjava.demo.lucene.file;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneWriteIndexFromFileExample {

    public static void main(String[] args) {
        //Input folder
        String docsPath = "inputFiles";

        //Output folder
        String indexPath = "indexedFiles";

        //Input Path Variable
        final Path docDir = Paths.get(docsPath);

        //analyzer with the default stop words
        Analyzer analyzer = new StandardAnalyzer();

        //IndexWriter Configuration
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);

        //try-with-resources closes the writer and the directory even if indexing fails
        try (Directory dir = FSDirectory.open(Paths.get(indexPath));
             IndexWriter writer = new IndexWriter(dir, iwc)) {
            //Recursive method to iterate over all files and directories
            indexDocs(writer, docDir);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    static void indexDocs(final IndexWriter writer, Path path) throws IOException {
        //Directory?
        if (Files.isDirectory(path)) {
            //Iterate directory
            Files.walkFileTree(path, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    try {
                        //Index this file
                        indexDoc(writer, file, attrs.lastModifiedTime().toMillis());
                    } catch (IOException ioe) {
                        //Report the failure and continue with the next file
                        ioe.printStackTrace();
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } else {
            //Index this file
            indexDoc(writer, path, Files.getLastModifiedTime(path).toMillis());
        }
    }

    static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException {
        //Read the whole file as text (assumes the input files are UTF-8 encoded)
        String contents = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);

        //Create lucene Document
        Document doc = new Document();
        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        doc.add(new LongPoint("modified", lastModified));
        doc.add(new TextField("contents", contents, Field.Store.YES));

        //Updates a document by first deleting the document(s)
        //containing term and then adding the new document.
        //The delete and then add are atomic as seen
        //by a reader on the same index
        writer.updateDocument(new Term("path", file.toString()), doc);
    }
}
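To see the traversal step in isolation, here is a minimal sketch in plain Java, without Lucene, that walks a folder and gathers the same three values the indexer turns into fields (path, modified, contents). The names FileFieldsSketch, FileFields, collect() and demo() are hypothetical and are not part of the example project. The sketch uses Files.walk instead of the SimpleFileVisitor shown above, and it assumes the files are UTF-8 text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of the article's project or of Lucene.
class FileFieldsSketch {

    // Plain holder for the three values that the indexer stores as fields.
    static final class FileFields {
        final String path;
        final long modified;
        final String contents;

        FileFields(String path, long modified, String contents) {
            this.path = path;
            this.modified = modified;
            this.contents = contents;
        }
    }

    // Walks root recursively and reads every regular file as UTF-8 (an assumption).
    static List<FileFields> collect(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> files = paths.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
            List<FileFields> result = new ArrayList<>();
            for (Path file : files) {
                String contents = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                long modified = Files.getLastModifiedTime(file).toMillis();
                result.add(new FileFields(file.toString(), modified, contents));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway folder with two small files so the sketch can run anywhere.
    static List<FileFields> demo() {
        try {
            Path root = Files.createTempDirectory("inputFiles");
            Path nested = Files.createDirectories(root.resolve("nested"));
            Files.write(root.resolve("a.txt"), "first file".getBytes(StandardCharsets.UTF_8));
            Files.write(nested.resolve("b.txt"), "second file".getBytes(StandardCharsets.UTF_8));
            return collect(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (FileFields f : demo()) {
            System.out.println(f.path + " | " + f.modified + " | " + f.contents);
        }
    }
}
```

In the indexer, each of these values becomes one field of a Lucene Document. The sketch only shows where the values come from.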
Search Indexed Files
In this section, we will search the index created in the previous step, i.e. we will find the documents that contain our search query terms.
package com.howtodoinjava.demo.lucene.file;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneReadIndexFromFileExample {

    //directory contains the lucene indexes
    private static final String INDEX_DIR = "indexedFiles";

    public static void main(String[] args) throws Exception {
        //Create lucene searcher. It searches over a single IndexReader.
        IndexSearcher searcher = createSearcher();

        //Search indexed contents using search term
        TopDocs foundDocs = searchInContent("frequently", searcher);

        //Total found documents
        System.out.println("Total Results :: " + foundDocs.totalHits);

        //Let's print out the path of files which have searched term
        for (ScoreDoc sd : foundDocs.scoreDocs) {
            Document d = searcher.doc(sd.doc);
            System.out.println("Path : " + d.get("path") + ", Score : " + sd.score);
        }
    }

    private static TopDocs searchInContent(String textToFind, IndexSearcher searcher) throws Exception {
        //Create search query
        QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
        Query query = qp.parse(textToFind);

        //search the index and keep the 10 best hits
        TopDocs hits = searcher.search(query, 10);
        return hits;
    }

    private static IndexSearcher createSearcher() throws IOException {
        Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));

        //It is an interface for accessing a point-in-time view of a lucene index
        IndexReader reader = DirectoryReader.open(dir);

        //Index searcher
        IndexSearcher searcher = new IndexSearcher(reader);
        return searcher;
    }
}
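To build some intuition for what "searching tokens in an index" means, here is a toy in-memory index in plain Java. It is a teaching sketch under loud assumptions, not Lucene's implementation: the class ToyInvertedIndex and its methods are hypothetical, the tokenizer only lower-cases and splits on non-alphanumeric characters, and there is no stop-word removal or scoring.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical teaching aid: a tiny term-to-documents map. Not Lucene code.
class ToyInvertedIndex {

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    // Records every token of the text under the given document name.
    void add(String docName, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, key -> new TreeSet<>()).add(docName);
        }
    }

    // Looks up a single word; the word goes through the same tokenizer as the documents.
    Set<String> search(String word) {
        String[] terms = tokenize(word);
        if (terms.length == 0) {
            return Collections.emptySet();
        }
        Set<String> docs = postings.get(terms[0]);
        return docs == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(docs);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        // Shortened, made-up snippets; any text would do.
        index.add("a.txt", "Society excited by cottage");
        index.add("b.txt", "Cordial cottage, agreeable strangers");

        System.out.println(index.search("Cottage"));   // [a.txt, b.txt]
        System.out.println(index.search("agreeable")); // [b.txt]
        System.out.println(index.search("missing"));   // []
    }
}
```

The point to take away is that the query word is normalized the same way as the indexed text. That is why the search class above passes a StandardAnalyzer to QueryParser, matching the analyzer used at indexing time.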
Demo
- Let’s create 3 files in the inputFiles folder with the following content.
data1.txt
Society excited by cottage private an it esteems. Fully begin on by wound an. Girl rich in do up or both. At declared in as rejoiced of together. He impression collecting delightful unpleasant by prosperous as on. End too talent she object mrs wanted remove giving.
data2.txt
Questions explained agreeable preferred strangers too him her son. Set put shyness offices his females him distant. Improve has message besides shy himself cheered however how son. Quick judge other leave ask first chief her. Indeed or remark always silent seemed narrow be. Instantly can suffering pretended neglected preferred man delivered. Perhaps fertile brandon do imagine to cordial cottage.
data3.txt
Or neglected agreeable of discovery concluded oh it sportsman. Week to time in john. Son elegance use weddings separate. Ask too matter formed county wicket oppose talent. He immediate sometimes or to dependent in. Everything few frequently discretion surrounded did simplicity decisively. Less he year do with no sure loud.
- Execute LuceneWriteIndexFromFileExample.java using its main() method. Verify that the Lucene index files are created in the indexedFiles folder.
- Let’s say I want to search for documents containing the word “agreeable”. Change the search term passed to searchInContent() in the main() method of LuceneReadIndexFromFileExample.java, then execute the class using its main() method. Verify the output:
Total Results :: 2
Path : inputFiles\data3.txt, Score : 0.47632512
Path : inputFiles\data2.txt, Score : 0.38863274
- Search for more terms and verify the results yourself.
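If you would rather not create the demo files by hand, the following sketch writes them into the inputFiles folder and cross-checks which files contain a given word using a plain linear scan. The class DemoFilesSketch and its methods are hypothetical helpers, not part of the downloadable project. The scan does no scoring, so it can only confirm which files match, not the order or the scores that Lucene prints.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for the demo; not part of the article's project.
class DemoFilesSketch {

    // File name -> content, copied from the demo files listed above.
    static Map<String, String> demoFiles() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("data1.txt",
            "Society excited by cottage private an it esteems. Fully begin on by wound an. "
            + "Girl rich in do up or both. At declared in as rejoiced of together. "
            + "He impression collecting delightful unpleasant by prosperous as on. "
            + "End too talent she object mrs wanted remove giving.");
        files.put("data2.txt",
            "Questions explained agreeable preferred strangers too him her son. "
            + "Set put shyness offices his females him distant. "
            + "Improve has message besides shy himself cheered however how son. "
            + "Quick judge other leave ask first chief her. "
            + "Indeed or remark always silent seemed narrow be. "
            + "Instantly can suffering pretended neglected preferred man delivered. "
            + "Perhaps fertile brandon do imagine to cordial cottage.");
        files.put("data3.txt",
            "Or neglected agreeable of discovery concluded oh it sportsman. Week to time in john. "
            + "Son elegance use weddings separate. Ask too matter formed county wicket oppose talent. "
            + "He immediate sometimes or to dependent in. "
            + "Everything few frequently discretion surrounded did simplicity decisively. "
            + "Less he year do with no sure loud.");
        return files;
    }

    // Writes the three demo files into dir, creating the folder if needed.
    static void writeDemoFiles(Path dir) {
        try {
            Files.createDirectories(dir);
            for (Map.Entry<String, String> entry : demoFiles().entrySet()) {
                Files.write(dir.resolve(entry.getKey()),
                        entry.getValue().getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Linear scan, no index: names of the demo files whose text contains the word.
    static List<String> filesContaining(String word) {
        String wanted = word.toLowerCase(Locale.ROOT);
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> entry : demoFiles().entrySet()) {
            String[] tokens = entry.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            if (Arrays.asList(tokens).contains(wanted)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        writeDemoFiles(Paths.get("inputFiles"));
        System.out.println(filesContaining("agreeable")); // [data2.txt, data3.txt]
    }
}
```

The two file names printed for “agreeable” agree with the two hits reported in the demo output. Lucene lists data3.txt first because it ranks hits by score, which this scan does not attempt.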
Sourcecode
Download the source code using the link given below.
Happy Learning !!