Lucene MMapDirectory and ByteBuffersDirectory Example

Apache Lucene’s ByteBuffersDirectory is an in-memory directory implementation added in Lucene 8.4.0. Internally, it keeps index data in Java NIO ByteBuffer instances for efficient reads and writes in RAM.

  • The ByteBuffersDirectory is useful for demos that need fast, transient indexing and searching without persistent storage.
  • If you need fast, memory-resident indexes in a production application, consider MMapDirectory instead, as it uses the OS caches more effectively (through memory-mapped buffers).

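The previously used RAMDirectory was deprecated in favor of ByteBuffersDirectory and has been removed as of Lucene 9, so it is not available with the Lucene version used in this tutorial.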

1. Maven

Start by adding these Lucene dependencies. This tutorial uses Lucene 9.10.0 and Java 21.

<properties> 
  <maven.compiler.source>21</maven.compiler.source>
  <maven.compiler.target>21</maven.compiler.target>
  <lucene.version>9.10.0</lucene.version>
</properties>

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analysis-common</artifactId>
  <version>${lucene.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>${lucene.version}</version>
</dependency>

2. Lucene ByteBuffersDirectory Example

The ByteBuffersDirectory class is an in-memory directory implementation. It stores the index files on the heap for quick access.
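To get a feel for the primitive that ByteBuffersDirectory builds on, here is a minimal JDK-only sketch of writing to and reading from a heap-backed ByteBuffer. It is not Lucene's implementation; the class HeapBufferSketch and the method roundTrip() are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HeapBufferSketch {

  // Hypothetical helper: writes a string into a heap-backed buffer and reads it back.
  static String roundTrip(String text) {
    byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

    // allocate() creates a buffer backed by a byte[] on the Java heap
    ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + bytes.length);
    buffer.putInt(bytes.length); // length prefix
    buffer.put(bytes);           // payload

    // flip() switches the buffer from writing to reading
    buffer.flip();
    byte[] out = new byte[buffer.getInt()];
    buffer.get(out);
    return new String(out, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello happy world"));
  }
}
```

Everything stays inside the JVM heap, so nothing survives once the process exits. That is the same trade-off ByteBuffersDirectory makes: quick access, no persistence.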

The following example indexes four short documents using the indexDoc() method and then searches them for the term “happy” in the searchIndex() method.

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;


public class ByteBuffersDirectoryExample {
  public static void main(String[] args) throws IOException {

    //Create ByteBuffersDirectory instance
    ByteBuffersDirectory byteBufferDir = new ByteBuffersDirectory();

    //Builds an analyzer with the default stop words
    Analyzer analyzer = new StandardAnalyzer();

    //Write some docs to ByteBuffersDirectory
    writeIndex(byteBufferDir, analyzer);

    //Search indexed docs in ByteBuffersDirectory
    searchIndex(byteBufferDir, analyzer);
  }

  static void writeIndex(ByteBuffersDirectory byteBufferDir, Analyzer analyzer) {
    // IndexWriter Configuration
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setOpenMode(OpenMode.CREATE);

    //IndexWriter writes new index files to the directory;
    //try-with-resources closes the writer even if indexing fails
    try (IndexWriter writer = new IndexWriter(byteBufferDir, iwc)) {

      //Create some docs with name and content
      indexDoc(writer, "document-1", "hello world");
      indexDoc(writer, "document-2", "hello happy world");
      indexDoc(writer, "document-3", "hello happy world");
      indexDoc(writer, "document-4", "hello hello world");
    } catch (IOException e) {
      //Any error goes here
      e.printStackTrace();
    }
  }

  static void indexDoc(IndexWriter writer, String name, String content) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("name", name, Store.YES));
    doc.add(new TextField("content", content, Store.YES));
    writer.addDocument(doc);
  }

  static void searchIndex(ByteBuffersDirectory byteBufferDir, Analyzer analyzer) {

    String searchTerm = "happy";

    //Create Reader; try-with-resources closes it even if the search fails
    try (IndexReader reader = DirectoryReader.open(byteBufferDir)) {

      //Create index searcher
      IndexSearcher searcher = new IndexSearcher(reader);

      //Build query
      QueryParser qp = new QueryParser("content", analyzer);
      Query query = qp.parse(searchTerm);

      //Search the index
      TopDocs foundDocs = searcher.search(query, 10);

      // Total found documents
      System.out.println("Total Results :: " + foundDocs.totalHits);

      //Let's print found doc names and their content along with score
      for (ScoreDoc sd : foundDocs.scoreDocs) {
        Document d = searcher.doc(sd.doc);
        System.out.println("Document Name : " + d.get("name")
            + "  :: Content : " + d.get("content")
            + "  :: Score : " + sd.score);
      }
    } catch (IOException | ParseException e) {
      //Any error goes here
      e.printStackTrace();
    }
  }
}

The program output:

Total Results :: 2 hits
Document Name : document-2  :: Content : hello happy world  :: Score : 0.30376968
Document Name : document-3  :: Content : hello happy world  :: Score : 0.30376968

3. Lucene MMapDirectory Example

Another option for near-memory access in Apache Lucene is MMapDirectory. It is a file-based directory implementation that reads index files through memory mapping. It relies on the operating system’s virtual memory management to map files directly into the process’s address space, thus providing fast access to index data.
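The memory-mapping idea itself can be tried with the JDK alone. The sketch below maps a small temporary file read-only and decodes its contents. It only illustrates the operating system facility that MMapDirectory relies on and is not how Lucene reads index files; MemoryMapSketch, readMapped() and demo() are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MemoryMapSketch {

  // Hypothetical helper: maps a file read-only and decodes its bytes as UTF-8.
  static String readMapped(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      // The mapped region is backed by the file; the OS pages data in on access
      MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      byte[] bytes = new byte[mapped.remaining()];
      mapped.get(bytes);
      return new String(bytes, StandardCharsets.UTF_8);
    }
  }

  // Hypothetical helper: writes text to a temporary file, then reads it through a mapping.
  static String demo(String text) {
    try {
      Path file = Files.createTempFile("mmap-sketch", ".txt");
      file.toFile().deleteOnExit(); // best-effort cleanup of the temporary file
      Files.writeString(file, text);
      return readMapped(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo("hello happy world"));
  }
}
```

Unlike the heap buffer used by an in-memory directory, the data here lives in a file, so it outlives the process while reads still go through memory.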

The MMapDirectory is the best fit for use cases where the index must be persisted on the filesystem but should still be read at close to in-memory speed.

In the code, there is hardly any difference between using a ByteBuffersDirectory and an MMapDirectory. The only change is how we create the instance: MMapDirectory takes the filesystem path where the index files are stored.

MMapDirectory directory = new MMapDirectory(Paths.get("/path/to/index"));
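Since both classes extend Lucene's Directory, indexing and search code can be written once against that base type. The following is a minimal sketch under that assumption, using the dependencies from the Maven section; DirectorySwapSketch, countHappy() and countInBoth() are hypothetical names, and the MMapDirectory index is written to a temporary folder that the sketch does not clean up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

public class DirectorySwapSketch {

  // Hypothetical helper: indexes two documents into any Directory
  // and returns how many of them contain the term "happy".
  static int countHappy(Directory dir) throws IOException {
    try (Analyzer analyzer = new StandardAnalyzer();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      for (String content : new String[] {"hello world", "hello happy world"}) {
        Document doc = new Document();
        doc.add(new TextField("content", content, Store.NO));
        writer.addDocument(doc);
      }
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      return searcher.count(new TermQuery(new Term("content", "happy")));
    }
  }

  // Hypothetical helper: runs the same code against both directory types.
  static int[] countInBoth() {
    try {
      Path tmp = Files.createTempDirectory("lucene-sketch"); // assumed scratch location
      try (Directory heap = new ByteBuffersDirectory();
           Directory mapped = new MMapDirectory(tmp)) {
        return new int[] {countHappy(heap), countHappy(mapped)};
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] counts = countInBoth();
    System.out.println("heap: " + counts[0] + ", mmap: " + counts[1]);
  }
}
```

Accepting Directory in method signatures, rather than a concrete class, makes it easy to use an in-memory index in a demo and a file-based one elsewhere without touching the indexing or search logic.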

The following example uses the MMapDirectory for indexing and searching in the index files.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.MMapDirectory;

import java.io.IOException;
import java.nio.file.Path;


public class MMapDirectoryExample {
  public static void main(String[] args) throws IOException {

    MMapDirectory mmapDir = new MMapDirectory(Path.of("c:/temp", "lucene", "index"));
    Analyzer analyzer = new StandardAnalyzer();
    writeIndex(mmapDir, analyzer);
    searchIndex(mmapDir, analyzer);
  }

  static void writeIndex(MMapDirectory mmapDir, Analyzer analyzer) {
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setOpenMode(OpenMode.CREATE);

    //try-with-resources closes the writer even if indexing fails
    try (IndexWriter writer = new IndexWriter(mmapDir, iwc)) {

      //Create some docs with name and content
      indexDoc(writer, "document-1", "hello world");
      indexDoc(writer, "document-2", "hello happy world");
      indexDoc(writer, "document-3", "hello happy world");
      indexDoc(writer, "document-4", "hello hello world");
    } catch (IOException e) {
      e.printStackTrace();
    }
  }

  static void indexDoc(IndexWriter writer, String name, String content) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("name", name, Store.YES));
    doc.add(new TextField("content", content, Store.YES));
    writer.addDocument(doc);
  }

  static void searchIndex(MMapDirectory mmapDir, Analyzer analyzer) {

    String searchTerm = "happy";

    //try-with-resources closes the reader even if the search fails
    try (IndexReader reader = DirectoryReader.open(mmapDir)) {

      IndexSearcher searcher = new IndexSearcher(reader);
      QueryParser qp = new QueryParser("content", analyzer);
      Query query = qp.parse(searchTerm);

      TopDocs foundDocs = searcher.search(query, 10);
      System.out.println("Total Results :: " + foundDocs.totalHits);

      for (ScoreDoc sd : foundDocs.scoreDocs) {
        Document d = searcher.doc(sd.doc);
        System.out.println("Document Name : " + d.get("name")
            + "  :: Content : " + d.get("content")
            + "  :: Score : " + sd.score);
      }
    } catch (IOException | ParseException e) {
      e.printStackTrace();
    }
  }
}

The program output:

Total Results :: 2 hits
Document Name : document-2  :: Content : hello happy world  :: Score : 0.30376968
Document Name : document-3  :: Content : hello happy world  :: Score : 0.30376968

4. Conclusion

As discussed in this Lucene tutorial, ByteBuffersDirectory and MMapDirectory serve different purposes. The ByteBuffersDirectory is a great fit for fast, in-memory indexing and searching in demo applications that require transient indexes and test data. The MMapDirectory is suitable for production use cases where the index must be persisted on disk but should still be read with close to in-memory speed.

Happy Learning !!
