In this Lucene tutorial, we will learn to create indexes from unstructured text files and then search for tokens within the indexed documents. To learn about installing Lucene, please refer to the Lucene index and search example.
1. Maven
Start by adding these Lucene dependencies. We are using Lucene 9.10.0 and Java 21.
<properties>
<maven.compiler.source>21</maven.compiler.source>
<maven.compiler.target>21</maven.compiler.target>
<lucene.version>9.10.0</lucene.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>${lucene.version}</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analysis-common</artifactId>
<version>${lucene.version}</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-queryparser</artifactId>
<version>${lucene.version}</version>
</dependency>
</dependencies>
Additionally, we are using these two folders:
//Input folder where text files are present
String docsPath = "c:/temp/lucene/inputFiles";
//Index folder where the indexes will be created
String indexPath = "c:/temp/lucene/indexedFiles";
2. Indexing the Text File Contents
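To index the file contents, we iterate over all the text files in the inputFiles folder and index each one. We create three fields in each Lucene document:
- path: file path, indexed as a single untokenized term (StringField) [Field.Store.YES]
- modified: file last-modified timestamp, indexed as a LongPoint for range queries but not stored
- contents: file content, analyzed into searchable tokens (TextField) [Field.Store.YES]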
If a field is indexed but not stored, you can search on it, but its original value won't be returned with the search results. A Field.Store.YES value causes Lucene to store the original field value in the index.
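To make the indexed-versus-stored distinction concrete, here is a minimal sketch that indexes one document into an in-memory ByteBuffersDirectory and then searches on a field that is indexed but not stored. The class name StoredVsIndexedDemo and the field names "stored" and "notStored" are hypothetical; it assumes Lucene 9.5 or later (the tutorial uses 9.10.0) for IndexReader.storedFields(). The search finds the document, but only the stored field's value comes back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class StoredVsIndexedDemo {

    /** Indexes one document in memory, searches the unstored field, and reports what comes back. */
    public static String run() {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // Indexed AND stored: searchable, and the original text is returned with the hit
                doc.add(new TextField("stored", "agreeable text", Field.Store.YES));
                // Indexed but NOT stored: searchable, but the original text is not kept
                doc.add(new TextField("notStored", "agreeable text", Field.Store.NO));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // Search on the unstored field (StandardAnalyzer lowercased the tokens at index time)
                TopDocs hits = searcher.search(new TermQuery(new Term("notStored", "agreeable")), 10);
                Document found = reader.storedFields().document(hits.scoreDocs[0].doc);
                return "hits=" + hits.totalHits.value
                        + ", stored=" + found.get("stored")
                        + ", notStored=" + found.get("notStored");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // hits=1, stored=agreeable text, notStored=null
    }
}
```

The document matches the query on "notStored", yet `found.get("notStored")` returns null because only stored fields can be read back from a hit.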
The following Java program uses Files.walkFileTree() to recursively iterate over the files in the provided directory, and then uses org.apache.lucene.index.IndexWriter to write each file as a document to the index.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class LuceneWriteIndexFromFileExample {
public static void main(String[] args) {
//Input folder
String docsPath = "c:/temp/lucene/inputFiles";
//Output folder
String indexPath = "c:/temp/lucene/indexedFiles";
//Input Path Variable
final Path docDir = Paths.get(docsPath);
//org.apache.lucene.store.Directory instance, closed automatically at the end
try (Directory dir = FSDirectory.open(Paths.get(indexPath))) {
//StandardAnalyzer splits text into words and lowercases them;
//since Lucene 8.0 its no-arg constructor removes no stop words
Analyzer analyzer = new StandardAnalyzer();
//IndexWriter configuration: create the index if missing, else append to it
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
//IndexWriter writes new index files to the directory; try-with-resources
//closes it (committing pending changes) even if indexing fails
try (IndexWriter writer = new IndexWriter(dir, iwc)) {
//Recursively index every file under docDir
indexDocs(writer, docDir);
}
} catch (IOException e) {
e.printStackTrace();
}
}
static void indexDocs(final IndexWriter writer, Path path) throws IOException {
//Directory?
if (Files.isDirectory(path)) {
//Iterate directory
Files.walkFileTree(path, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
try {
//Index this file
writeToIndex(writer, file, attrs.lastModifiedTime().toMillis());
} catch (IOException ioe) {
ioe.printStackTrace();
}
return FileVisitResult.CONTINUE;
}
});
} else {
//Index this file
writeToIndex(writer, path, Files.getLastModifiedTime(path).toMillis());
}
}
static void writeToIndex(IndexWriter writer, Path file, long lastModified) throws IOException {
try (InputStream stream = Files.newInputStream(file)) {
//Create the Lucene Document with the three fields described above
Document doc = new Document();
doc.add(new StringField("path", file.toString(), Field.Store.YES));
doc.add(new LongPoint("modified", lastModified));
//Read the content from the already opened stream (default charset is UTF-8 on Java 18+)
doc.add(new TextField("contents", new String(stream.readAllBytes()), Store.YES));
System.out.println("Writing file : " + file.toString());
//updateDocument() first deletes any document(s) containing the term
//path=<file> and then adds the new document. The delete and add are
//atomic as seen by a reader on the same index, so re-running the
//indexer replaces existing entries instead of duplicating them.
writer.updateDocument(new Term("path", file.toString()), doc);
}
}
}
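The program above indexes every regular file it finds under inputFiles, whatever its extension. If that folder might also contain non-text files, one option is to collect only the *.txt files first and pass each one to writeToIndex(). The following is a minimal standard-library sketch of that filter using Files.walk; the class TextFileCollector and the method collectTextFiles are hypothetical names, not part of the tutorial's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TextFileCollector {

    /** Recursively collects regular files whose name ends with ".txt" (case-insensitive), sorted by path. */
    public static List<Path> collectTextFiles(Path root) throws IOException {
        // Files.walk returns a lazily populated stream that holds open directory handles,
        // so it must be closed; try-with-resources takes care of that
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Same input folder as in the tutorial
        for (Path p : collectTextFiles(Path.of("c:/temp/lucene/inputFiles"))) {
            System.out.println(p);
        }
    }
}
```

Inside indexDocs(), each path from this list could then be indexed with `writeToIndex(writer, p, Files.getLastModifiedTime(p).toMillis())`.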
3. Searching in Lucene Indexes
To search the Lucene index, we use org.apache.lucene.search.IndexSearcher and its search() method. The classic QueryParser turns the input text into a Query object, analyzing it with the same StandardAnalyzer that was used at indexing time.
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class LuceneReadIndexFromFileExample {
//directory contains the lucene indexes
private static final String INDEX_DIR = "c:/temp/lucene/indexedFiles";
private static String textToSearch = "agreeable";
public static void main(String[] args) throws Exception {
//Create lucene searcher. It searches over a single IndexReader.
IndexSearcher searcher = createSearcher();
//Search indexed contents using search term
TopDocs foundDocs = searchInContent(textToSearch, searcher);
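//Total number of matching documents (totalHits.value is the plain count)
System.out.println("Total Results :: " + foundDocs.totalHits.value);
//Let's print out the path of files which contain the search term
for (ScoreDoc sd : foundDocs.scoreDocs) {
//IndexSearcher.doc(int) is deprecated in recent Lucene 9.x releases; storedFields() replaces it
Document d = searcher.storedFields().document(sd.doc);
System.out.println("Path : " + d.get("path") + ", Score : " + sd.score);
}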
}
}
private static TopDocs searchInContent(String textToFind, IndexSearcher searcher) throws Exception {
//Create search query
QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
Query query = qp.parse(textToFind);
//search the index
TopDocs hits = searcher.search(query, 10);
return hits;
}
private static IndexSearcher createSearcher() throws IOException {
Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));
//IndexReader provides a point-in-time view of a Lucene index
IndexReader reader = DirectoryReader.open(dir);
//Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
return searcher;
}
}
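The search example never closes its IndexReader, and it does not use the "modified" LongPoint field that the indexer adds. The sketch below shows one way to do both: it combines the parsed text query with LongPoint.newRangeQuery() as a non-scoring FILTER clause (here, files modified in the last seven days) and opens the reader in try-with-resources. The class RecentFilesSearchExample, the method buildQuery and the seven-day window are hypothetical choices; it assumes the index at c:/temp/lucene/indexedFiles was built by LuceneWriteIndexFromFileExample and Lucene 9.5 or later for storedFields().

```java
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class RecentFilesSearchExample {

    /** Builds "contents matches text AND modified within [fromMillis, toMillis]" (bounds inclusive). */
    public static Query buildQuery(String text, long fromMillis, long toMillis) throws ParseException {
        Query textQuery = new QueryParser("contents", new StandardAnalyzer()).parse(text);
        // "modified" is indexed as a LongPoint (not stored), which is exactly what range queries need
        Query modifiedRange = LongPoint.newRangeQuery("modified", fromMillis, toMillis);
        return new BooleanQuery.Builder()
                .add(textQuery, Occur.MUST)        // must match and contributes to the score
                .add(modifiedRange, Occur.FILTER)  // must match but does not affect the score
                .build();
    }

    public static void main(String[] args) throws Exception {
        long now = Instant.now().toEpochMilli();
        long weekAgo = now - Duration.ofDays(7).toMillis();
        Query query = buildQuery("agreeable", weekAgo, now);

        // try-with-resources closes the reader and the directory when the search is done
        try (Directory dir = FSDirectory.open(Paths.get("c:/temp/lucene/indexedFiles"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(query, 10);
            StoredFields storedFields = reader.storedFields();
            System.out.println("Total Results :: " + hits.totalHits.value);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println("Path : " + storedFields.document(sd.doc).get("path") + ", Score : " + sd.score);
            }
        }
    }
}
```

Using FILTER rather than MUST for the date range keeps the scores identical to a plain text search; the range only narrows which documents can match.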
4. Demo
Let’s create three files, data1.txt, data2.txt and data3.txt, in the inputFiles folder with the following content:
Society excited by cottage private an it esteems. Fully begin on by wound an. Girl rich in do up or both. At declared in as rejoiced of together. He impression collecting delightful unpleasant by prosperous as on. End too talent she object mrs wanted remove giving.
Questions explained agreeable preferred strangers too him her son. Set put shyness offices his females him distant. Improve has message besides shy himself cheered however how son. Quick judge other leave ask first chief her. Indeed or remark always silent seemed narrow be. Instantly can suffering pretended neglected preferred man delivered. Perhaps fertile brandon do imagine to cordial cottage.
Or neglected agreeable of discovery concluded oh it sportsman. Week to time in john. Son elegance use weddings separate. Ask too matter formed county wicket oppose talent. He immediate sometimes or to dependent in. Everything few frequently discretion surrounded did simplicity decisively. Less he year do with no sure loud.
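If you prefer to create these files from code rather than by hand, a small standard-library program is enough. This is a minimal sketch: the class CreateDemoFiles and the method writeFiles are hypothetical names, and to keep it short each file receives only the first sentence of its paragraph above. With the shortened text the same two files still contain “agreeable”, but the scores will differ from the output shown below.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class CreateDemoFiles {

    /** Writes each (file name -> content) entry into dir as a UTF-8 file, creating dir if needed. */
    public static void writeFiles(Path dir, Map<String, String> files) throws IOException {
        Files.createDirectories(dir);
        for (Map.Entry<String, String> e : files.entrySet()) {
            Files.writeString(dir.resolve(e.getKey()), e.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Shortened content: only the first sentence of each demo paragraph
        Map<String, String> files = new LinkedHashMap<>();
        files.put("data1.txt", "Society excited by cottage private an it esteems.");
        files.put("data2.txt", "Questions explained agreeable preferred strangers too him her son.");
        files.put("data3.txt", "Or neglected agreeable of discovery concluded oh it sportsman.");
        writeFiles(Path.of("c:/temp/lucene/inputFiles"), files);
    }
}
```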
Now, run LuceneWriteIndexFromFileExample using its main() method. Verify that the Lucene index files are created in the indexedFiles folder.

Now, let’s say we want to search for documents containing the word “agreeable”. Set this term as the value of the textToSearch variable in the LuceneReadIndexFromFileExample class, then run the class using its main() method. Verify the output:
Total Results :: 2
Path : inputFiles\data3.txt, Score : 0.47632512
Path : inputFiles\data2.txt, Score : 0.38863274
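The scores come from Lucene's default similarity, BM25 (the default since Lucene 6.0). To see how a particular score was computed, IndexSearcher.explain(Query, int) returns an Explanation tree for one document. The sketch below is self-contained: it indexes the given texts into an in-memory ByteBuffersDirectory rather than reading the tutorial's index, so its numbers will not match the scores above. The class ExplainScoreDemo and the method explainTopHit are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ExplainScoreDemo {

    /** Indexes the texts in memory, searches contents:term, and explains the top hit (term must occur). */
    public static Explanation explainTopHit(String term, String... texts) {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String text : texts) {
                    Document doc = new Document();
                    doc.add(new TextField("contents", text, Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new TermQuery(new Term("contents", term));
                TopDocs hits = searcher.search(query, 1);
                // explain() recomputes one document's score and records each factor that went into it
                return searcher.explain(query, hits.scoreDocs[0].doc);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Explanation e = explainTopHit("agreeable",
                "Questions explained agreeable preferred strangers too him her son.",
                "Or neglected agreeable of discovery concluded oh it sportsman.");
        // Prints a tree: the total score followed by its BM25 factors (idf, term frequency, length norm)
        System.out.println(e);
    }
}
```

In the tutorial's own search class, the same call would be `searcher.explain(query, sd.doc)` inside the loop over `foundDocs.scoreDocs`, with `query` made available from searchInContent().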
Search for more terms and verify the results yourself.
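When trying other terms, it can help to see how QueryParser interprets the input before running the search. The minimal sketch below parses text against the "contents" field with the same StandardAnalyzer as searchInContent() and returns the resulting query as a string: input is lowercased, space-separated words are combined with OR by default, and a leading + marks a required word. The class QueryParserPreview and the method preview are hypothetical names.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class QueryParserPreview {

    /** Parses user input against the "contents" field, the same way searchInContent() does. */
    public static String preview(String input) {
        QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
        try {
            Query query = qp.parse(input);
            return query.toString();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid query syntax: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(preview("Agreeable"));           // contents:agreeable
        System.out.println(preview("agreeable preferred")); // contents:agreeable contents:preferred
        System.out.println(preview("+agreeable +cottage")); // +contents:agreeable +contents:cottage
    }
}
```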
Happy Learning !!