Guide to Hibernate Search

Learn to configure full-text and index-based searches in Hibernate using backends like Lucene, Elasticsearch or OpenSearch. Combined with a full-text search engine, the Hibernate APIs provide a powerful way to search information in large applications with millions of records in each table.

In addition, Hibernate Search integrates easily with other popular frameworks such as Quarkus and Spring Boot, which helps when exposing search to front-end applications.

1. Dependencies

To use the Hibernate Search module, we need at least two direct dependencies: a mapper and a backend. The mapper extracts data from the domain model and maps it to indexable documents, and the backend indexes and searches these documents.

We are using Lucene as the backend for this tutorial.

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.1.4.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-lucene</artifactId>
   <version>6.1.4.Final</version>
</dependency>

To use Elasticsearch as the backend, replace the Lucene backend dependency with hibernate-search-backend-elasticsearch.

2. Basic Configuration

We can add the search-related configuration to any of Hibernate ORM’s configuration files, i.e. hibernate.properties, hibernate.cfg.xml or persistence.xml.

Though the default search configuration is good enough for most applications, we will customize a few settings. Start with the physical path where the indexes will be written. By default, the backend stores indexes in the current working directory.

<property name="hibernate.search.backend.directory.root">
  c:/temp/lucene/
</property>

A few more configuration properties are worth considering.

# Set false to disable the Search
hibernate.search.enabled = true

# Set false to disable Search annotations on Entity classes
hibernate.search.mapping.process_annotations = true

# Lucene format to store indexes; Default is latest version.
hibernate.search.backend.lucene_version = LUCENE_8_1_1

# Internal thread pool to execute write operations
hibernate.search.backend.thread_pool.size = 4

# local-heap or local-filesystem
hibernate.search.backend.directory.type = local-filesystem

# auto, simple, mmap or nio
hibernate.search.backend.directory.filesystem_access.strategy = auto

# simple-filesystem, native-filesystem, single-instance or none
hibernate.search.backend.directory.locking.strategy = native-filesystem

# Document queues in case of high volume writes
hibernate.search.backend.indexing.queue_count = 4
hibernate.search.backend.indexing.queue_size = 1000

# Commit interval (in milliseconds)
hibernate.search.backend.io.commit_interval = 1000
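
The same properties can also be supplied programmatically. The sketch below only builds a settings map with the JDK. The class name, the method name and the index path are hypothetical, and handing the map to Hibernate ORM through Persistence.createEntityManagerFactory(unitName, settings) is an assumption about how the application is bootstrapped.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects a few Hibernate Search settings in a map.
final class SearchSettings {

    // indexRoot is the directory where the Lucene indexes should be written.
    static Map<String, String> forLucene(Path indexRoot) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hibernate.search.enabled", "true");
        settings.put("hibernate.search.backend.directory.type", "local-filesystem");
        settings.put("hibernate.search.backend.directory.root", indexRoot.toString());
        settings.put("hibernate.search.backend.io.commit_interval", "1000");
        return settings;
    }
}
```

The property names are the ones listed above; only the way they are assembled is illustrative.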

3. Entity Annotations

3.1. @Indexed

In order to index an entity, it must be annotated with @Indexed.

The index name will be equal to the entity name. Use @Indexed(index = "IndexName") to choose another name.
Subclasses inherit the @Indexed annotation and will also be indexed by default.
Use @Indexed(enabled = false) to disable indexing of a subclass.

@Entity
@Table(name = "TBL_PRODUCT")
@Indexed
public class Product {
  //...
}

3.2. @DocumentId

By default, the entity identifier is used as the document identifier of an @Indexed entity. To select another field as the document identifier, use this annotation.

@NaturalId
@DocumentId
private String email;

3.3. @IndexedEmbedded

@IndexedEmbedded can be used on @Embedded properties as well as associations (@OneToOne, @OneToMany and others).

@ManyToMany
@IndexedEmbedded 
private List<Vendor> vendors = new ArrayList<>();

3.4. Field Annotations

Let us see the annotations that are applied to the entity fields.

@FullTextField: A text field whose value is considered as multiple words. Only works for String fields.
@GenericField: Fields mapped using this annotation do not provide any advanced features such as full-text search: matches on a generic field are exact matches.
@KeywordField: A text field whose value is considered as a single keyword. Only works for String fields.
@NonStandardField: This annotation is very useful for cases when a field type native to the backend is necessary.
@ScaledNumberField: A numeric field for integer or floating-point values with a fixed scale that is consistent for all values of the field across all documents.

Most of the above annotations support attributes for further customizing the indexing behavior of the field, such as name, sortable, projectable, aggregable, searchable, searchAnalyzer, normalizer and a few more.

@FullTextField(analyzer = "english") 
private String title;

@FullTextField
private String features;

4. Schema Management at Application Start/Shutdown

We can control the creation and update of the index schema programmatically as well as through configuration.

To configure the behavior, we can use the property hibernate.search.schema_management.strategy and set one of the following values:

none: Don’t do anything at all.
validate: An exception will be thrown on startup if indexes are missing. Doesn’t create any schema.
create: Creates missing indexes and their schema on startup. Does not check or validate the existing indexes.
create-or-validate: Creates missing indexes and their schema on startup, and validates the schema of existing indexes. This is the default.
create-or-update: Creates missing indexes and their schema on startup, and updates the schema of existing indexes if possible.
drop-and-create: Drops existing indexes and re-creates them and their schema on startup.
drop-and-create-and-drop: Drops existing indexes and re-creates them on startup, then drops the indexes on shutdown.

To programmatically configure the behavior at application startup, SearchSchemaManager provides methods corresponding to the above configurations.

SearchSession searchSession = Search.session( entityManager ); 

SearchSchemaManager schemaManager = searchSession.schemaManager(); 
schemaManager.createIfMissing(); 

MassIndexer indexer = searchSession.massIndexer(Product.class)
    .threadsToLoadObjects(4);
indexer.startAndWait();

5. Indexing Documents

By default, every time an entity that is mapped to an index is changed through a Hibernate Session, Hibernate Search updates the relevant index automatically.

For example, Hibernate detects all updates made using session.persist(), session.update() and other methods. Any change to an indexable entity is also applied to the Lucene index.

Generally, these index updates happen when the updates are flushed into the database or the transaction is committed.

Note that changes done with JPQL or SQL queries are not tracked so these do not update the indexes. In this case, it is necessary to control indexing manually using the SearchIndexingPlan interface.

Note that the methods in SearchIndexingPlan only affect the Hibernate Search indexes: they do not write anything to the database.

This interface offers the following methods:

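addOrUpdate(entity): Add or update a document in the index.
delete(entity): Delete a document from the index.
purge(entityClass, id, routingKey): Delete the document from the index without reading the entity. Compared to delete(), it is useful if the entity has already been deleted from the database. The routing key can be null when no custom routing is used.
purge(entityName, id, routingKey): Same as above, but the entity type is given by name.
process() and execute(): Process the pending changes and apply them to the indexes, respectively. Both run automatically on commit, so calling them explicitly is mainly useful when indexing a large number of entities in one transaction.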

6. Searching Documents

Hibernate Search provides high-level APIs to search indexed documents. Note that these APIs use the indexes to perform the search, but return entities loaded from the database.

For demo purposes, we are using the following entity for indexing and searching.

@Data
@AllArgsConstructor
@Builder
@Entity
@Table(name = "TBL_PRODUCT")
@Indexed
public class Product {
  @Id
  @Column(name = "id", nullable = false)
  @GeneratedValue(strategy = GenerationType.SEQUENCE)
  private Long id;

  @KeywordField
  private String name;

  @KeywordField
  private String company;

  @FullTextField
  private String features;

  @GenericField
  private LocalDate launchedOn;

  public Product() {
  }
}

6.1. Search Syntax

Preparing and executing a search query requires creating a SearchSession from the EntityManager and then using its search() method to search documents based on the provided predicate.

SearchSession searchSession =
    Search.session(entityManager);

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.match()
        .fields("name")
        .matching("iPhone 7"))
    .fetch(10);

long totalHitCount = result.total().hitCount();
List<Product> hits = result.hits();

Assertions.assertEquals(1, totalHitCount);
Assertions.assertEquals("iPhone 7", hits.get(0).getName());

In the above example, we fetch only the documents whose name field matches 'iPhone 7'.

Some queries can match thousands of documents, and loading them all can degrade application performance. So it is always recommended to limit the number of documents in the result using the fetch(n) method.

result.total().hitCount() returns the total number of documents matching the query, regardless of how many hits were fetched. We can use this information to build pagination with the help of the fetch( offset, limit ) or fetchHits( offset, limit ) methods.

List<Product> hits = searchSession.search( Product.class )
        .where( f -> f.matchAll() )
        .fetchHits( 40, 20 );
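
The offset and the number of pages follow from simple arithmetic on the page size and the total hit count. The helper below is a minimal sketch of that arithmetic; the class and method names are hypothetical and it is not part of Hibernate Search.

```java
// Hypothetical helper for page arithmetic; not part of Hibernate Search.
final class Paging {

    // Offset of the first hit on a zero-based page.
    static int offset(int pageIndex, int pageSize) {
        return pageIndex * pageSize;
    }

    // Number of pages needed to show totalHitCount hits (rounded up).
    static long pageCount(long totalHitCount, int pageSize) {
        return (totalHitCount + pageSize - 1) / pageSize;
    }
}
```

With a page size of 20, the third page (index 2) starts at offset 40, which corresponds to the fetchHits( 40, 20 ) call above.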

If we are still determined to fetch all hits at once, use the method fetchAllHits().

List<Product> hits = searchSession.search( Product.class )
        // The identifier of Product is a Long, so the values must be Long as well
        .where( f -> f.id().matchingAny( Arrays.asList( 1L, 2L, 3L, 4L ) ) )
        .fetchAllHits();

If we are expecting at most a single hit for a query, we can use fetchSingleHit() method. It will return either zero or one document (wrapped in an Optional). An exception will be thrown if more than one hit is returned.

Optional<Product> hit = searchSession.search( Product.class )
      .where( f -> f.id().matching( 1L ) )
      .fetchSingleHit();

6.2. Search Multiple Entities

To apply a search predicate on multiple entities, we can pass them as a List in the search() method.

In the given examples, Product and AnotherProduct types must implement the IProduct interface because the search will return the entities of types IProduct.

SearchResult<IProduct> result = searchSession.search(Arrays.asList( 
                Product.class, AnotherProduct.class
              ))....

It is possible to search using entity names as well.

SearchResult<IProduct> result = searchSession.search( 
  searchSession.scope( 
          IProduct.class,
          Arrays.asList( "Product", "AnotherProduct" )
  )
)....

6.3. Checking Total Hits

Sometimes we only want to check how many matching documents exist so that we can tweak our search criteria accordingly. We can use fetchTotalHitCount() method to retrieve only the matched documents count.

long totalHitCount = searchSession.search( Product.class )
      .where(f -> f.match()
        .fields("features")
        .matching("Touchscreen"))
      .fetchTotalHitCount();

6.4. Matching Field Values

The match predicate matches documents for which a given field has a given value. By default, the argument passed to the matching() method must have the same type as the target field.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.match()
        .fields("name")
        .matching("iPhone 7"))
    .fetch(10);

To match multiple fields against the same value, we can call the fields() method multiple times, or pass several field names to a single fields() call.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.match()
        .fields("name")
        .fields("features")
        .matching("iPhone 7"))
    .fetch(10);

Use the boost() method to denote which field matches weigh more than others. A boost (multiplier) higher than 1 increases the field's impact on the total document score.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.match()
        .fields("name").boost( 2.0f )
        .fields("features")
        .matching("iPhone 7"))
    .fetch(10);

6.5. Matching Multiple Terms

The terms predicate matches documents for which a given field contains some terms, any or all of them.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.terms()
        .fields("name")
        .matchingAny("iPhone", "iPad", "Apple"))
    .fetch(100);

Use matchingAll() to match all terms in the field.
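
The difference between the two options can be modeled with plain set operations. The following is only an illustration of the any/all semantics with the JDK, not how the backend evaluates the predicate; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;

// Hypothetical model of "any" versus "all" term matching.
final class TermsSemantics {

    // True when the field contains at least one of the wanted terms.
    static boolean matchesAny(Set<String> fieldTerms, String... wanted) {
        return !Collections.disjoint(fieldTerms, Arrays.asList(wanted));
    }

    // True when the field contains every wanted term.
    static boolean matchesAll(Set<String> fieldTerms, String... wanted) {
        return fieldTerms.containsAll(Arrays.asList(wanted));
    }
}
```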

6.6. Full-Text Searches

For full-text fields, the value passed to the matching() method is tokenized. This means multiple terms may be extracted from the input value, and the predicate will match every document that contains any of these terms.

The given example will match all documents that contain even a single word out of the given three words (iPhone, iPad or apple) in their features list.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.match()
        .fields("features")
        .matching("iPhone iPad apple"))
    .fetch(100);
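
To make the "any token matches" behavior concrete, here is a simplified model written with the JDK only. It assumes a naive analyzer that lowercases the text and splits on non-alphanumeric characters; real Lucene analyzers are more elaborate, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an analyzer plus a full-text match.
final class FullTextSketch {

    // Lowercase the text and split it into alphanumeric tokens.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // True when the field shares at least one token with the query.
    static boolean matches(String fieldValue, String query) {
        return !Collections.disjoint(tokenize(fieldValue), tokenize(query));
    }
}
```

In this model, a features value such as "Retina display, works with iPad" matches the query "iPhone iPad apple" because both sides produce the token ipad.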

6.7. Fuzzy Searches

The fuzzy() option allows for approximate matches. It matches the tokens with close values, for example with one letter that was switched for another.

It accepts the maximum edit distance, a number from 0 to 2, where 0 disables fuzziness. The default is 2.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.match()
        .fields("features")
        .matching("iPhone iPad apple")
        .fuzzy(1))
    .fetch(100);
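
The edit distance behind fuzzy matching can be illustrated with the classic Levenshtein algorithm. This is a minimal sketch with hypothetical names; the search backend may count some edits (for example swapped adjacent letters) differently, so treat it as an approximation of the idea rather than the exact rule.

```java
// Plain Levenshtein distance: insertions, deletions and substitutions cost 1.
final class EditDistance {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the two terms differ by at most maxEdits edits.
    static boolean fuzzyMatches(String indexed, String query, int maxEdits) {
        return levenshtein(indexed, query) <= maxEdits;
    }
}
```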

6.8. Matching Phrases

The phrase predicate matches documents for which a given field contains a given sequence of words, in the given order.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.phrase()
        .fields("features")
        .matching("Fingerprint (front-mounted)")
        // slop(1): allow up to one word to be moved or skipped
        .slop(1))
    .fetch(100);
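
The "same words, same order" rule can be sketched as a search for a contiguous sub-list of words. The model below assumes naive whitespace tokenization and no tolerance for moved words; the names are hypothetical and the code is not taken from Hibernate Search.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified phrase matching: the phrase must appear as consecutive words.
final class PhraseSketch {

    // Lowercase the text and split it on whitespace.
    static List<String> words(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // True when the words of the phrase occur in the field in the same order, side by side.
    static boolean containsPhrase(String fieldValue, String phrase) {
        return Collections.indexOfSubList(words(fieldValue), words(phrase)) >= 0;
    }
}
```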

6.9. Values in Range

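The range predicate matches documents for which a given field has a value within a defined range or limit. The example below assumes the entity has an indexed numeric price field, which the demo Product entity above does not declare.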

List<Product> hits = searchSession.search( Product.class )
        .where( f -> f.range().field( "price" )
                .between( 8000, 20000 ) )
        .fetchHits( 20 );

We can use between, atLeast, greaterThan, atMost and lessThan methods to provide the upper bound and lower bound values for matching.
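
The methods differ in whether the bound itself is included. The sketch below spells this out for int values, assuming the default behavior where between() includes both bounds; the class is hypothetical and only mirrors the method names.

```java
// Hypothetical model of the bound semantics of the range methods.
final class RangeSketch {

    static boolean between(int value, int lower, int upper) {
        return value >= lower && value <= upper; // both bounds included
    }

    static boolean atLeast(int value, int lower) {
        return value >= lower; // lower bound included
    }

    static boolean greaterThan(int value, int lower) {
        return value > lower; // lower bound excluded
    }

    static boolean atMost(int value, int upper) {
        return value <= upper; // upper bound included
    }

    static boolean lessThan(int value, int upper) {
        return value < upper; // upper bound excluded
    }
}
```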

6.10. Wildcard Matches

The wildcard predicate matches documents for which a given field contains a word matching the given pattern. In the pattern, * stands for zero or more characters and ? stands for exactly one character.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.wildcard()
        .fields("name")
        .fields("features")
        .matching("iP*"))
    .fetch(10);
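
One way to picture wildcard matching is to translate the pattern into a regular expression, with * meaning any run of characters and ? meaning exactly one character. The sketch below does this with java.util.regex; it is an illustration with hypothetical names, not the backend's implementation.

```java
import java.util.regex.Pattern;

// Translates a wildcard pattern into a java.util.regex pattern.
final class WildcardSketch {

    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // any sequence of characters
            } else if (c == '?') {
                regex.append('.');    // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole term matches the wildcard pattern.
    static boolean matches(String term, String wildcard) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```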

6.11. Regex Matches

The regexp predicate matches documents for which a given field contains a word matching the given regular expression.

SearchResult<Product> result = searchSession.search(Product.class)
    .where(f -> f.regexp()
        .fields("name")
        .fields("features")
        .matching("iP.*e"))
    .fetch(10);
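
The expression has to match a whole word (term), not just a part of it. The sketch below shows that rule using java.util.regex as a stand-in; the regular expression syntax of the backend is not identical to Java's, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Whole-term regular expression matching, modeled with java.util.regex.
final class RegexpSketch {

    // True only when the entire term matches the expression.
    static boolean termMatches(String term, String regex) {
        return Pattern.compile(regex).matcher(term).matches();
    }
}
```

Under this rule, "iP.*e" matches the term iPhone but not iPhones, because the trailing s is not covered by the expression.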

6.12. Combining Predicates

The bool predicate is used to match documents against one or more inner predicates.

Use must(), mustNot(), filter() and should() methods to build logical AND and logical OR combinations between the predicates.

List<Product> hits = searchSession.search( Product.class )
        .where( f -> f.bool()
                .must( f.match().field( "name" )
                        .matching( "samsung" ) ) 
                .should( f.match().field( "features" )
                        .matching( "Touchscreen" ) ) 
        )
        .fetchHits( 20 );
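
The way the clauses combine can be modeled with java.util.function.Predicate. In this sketch, every must clause has to match, no mustNot clause may match, and should clauses are required only when there is no must clause (otherwise they would only influence the score, which this model ignores). filter() is left out because, for matching purposes, it behaves like must(). The class is hypothetical and is not part of Hibernate Search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of bool predicate matching (scoring is ignored).
final class BoolSketch<T> {

    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();

    BoolSketch<T> must(Predicate<T> clause) { must.add(clause); return this; }
    BoolSketch<T> mustNot(Predicate<T> clause) { mustNot.add(clause); return this; }
    BoolSketch<T> should(Predicate<T> clause) { should.add(clause); return this; }

    boolean matches(T document) {
        boolean allMust = must.stream().allMatch(p -> p.test(document));
        boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(document));
        // Without any must clause, at least one should clause has to match.
        boolean shouldOk = should.isEmpty() || !must.isEmpty()
                || should.stream().anyMatch(p -> p.test(document));
        return allMust && noneMustNot && shouldOk;
    }
}
```

Applied to the query above, a document named "samsung" matches whether or not its features mention a touchscreen; the should clause would only rank touchscreen models higher.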

7. Conclusion

In this tutorial, we learned to integrate the Hibernate Search module with Hibernate ORM. We learned to build the SearchSession instance and use it for searching by the given predicates and fetching the matching documents in various ways.

Refer to the official documentation for more detailed information.

Happy Learning !!

Sourcecode on Github