Elasticsearch with Spring Data and Spring Boot 3

In this tutorial, we’ll explore the basics of Elasticsearch with Spring Boot through a hands-on, practical approach. We’ll learn to create an index, perform CRUD operations, and search and query documents in Elasticsearch using the Spring Data Elasticsearch module. We’ll also look at how to log Elasticsearch request and response data in our Spring application.

1. Introduction to Elasticsearch

Elasticsearch is a powerful and widely used open-source, Lucene-based search and analytics engine. It is designed to store, search, and analyze large volumes of data quickly and in near real time.

Below are some of the key features of Elasticsearch:

Full-Text Search: Elasticsearch excels at full-text search. It’s capable of indexing and searching through large amounts of unstructured or semi-structured text data efficiently.
JSON Documents: Data in Elasticsearch is stored as JSON documents. Each document belongs to an index, which is conceptually similar to a table in a relational database. Thanks to dynamic mapping, Elasticsearch can infer field types from the documents we send, so we can index and search data without defining a schema up front.
Querying: Elasticsearch offers a rich and flexible query language that allows us to perform simple to complex searches. We can filter, aggregate, and sort data using various query types, such as term queries, match queries, range queries, and more.
Analyzers and Tokenizers: Elasticsearch includes powerful text analysis features like analyzers and tokenizers that can break down text into tokens for efficient search and indexing. It supports multiple languages and custom analyzers.
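To give a feel for what analysis does, here is a rough, JDK-only approximation of the default standard analyzer: split the text on anything that is not a letter or digit, then lowercase each token. SimpleAnalyzer is a hypothetical name used only for this illustration; the real analyzer follows Unicode text segmentation rules and is configurable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for Elasticsearch's "standard" analyzer.
// The real analyzer follows Unicode text segmentation; this sketch only
// splits on non-alphanumeric characters and lowercases each token.
final class SimpleAnalyzer {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            // A leading separator produces an empty first element; skip it.
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

With this behavior, SimpleAnalyzer.analyze("John Doe") yields [john, doe]. Because the text of a match query is analyzed as well, a search for "john" can find a document whose name text field contains "John Doe".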

2. Setting Up an Elasticsearch Cluster

2.1. Using Elasticsearch Distribution

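Download the official Elasticsearch distribution from the Elastic website and extract the archive. Navigate to the extracted directory and run the executable in its bin folder (bin\elasticsearch.bat on Windows, or bin/elasticsearch on macOS and Linux).

Elasticsearch will start on the local machine and, by default, listen for HTTP requests on port 9200. Note that a fresh 8.x installation starts with security enabled and prints the generated password for the elastic user on first startup.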

2.2. Using Docker

We can download and start Elasticsearch as a single-node Docker container with the following command:

docker run --name elasticsearch-container -d -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.10.4

For this demo, we run Elasticsearch with Docker Compose. The following docker-compose.yml creates a single-node Elasticsearch server and exposes it on port 9200 so that our application can connect to it.

version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.10.4
    container_name: elasticsearch-container
    ports:
      - 9200:9200
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false

Elasticsearch 8 enables security (authentication and SSL/TLS) by default. For local development, we disable it with the environment variable xpack.security.enabled=false. If security remains enabled, the Elasticsearch client must be configured for an SSL connection with credentials.

Once the server is up, open http://localhost:9200/ and we should see a response similar to the following:

{
  "name": "992e6b8bf7a5",
  "cluster_name": "docker-cluster",
  "cluster_uuid": "RxXotwWrTd2lzQBJmQ5gqA",
  "version": {
    "number": "8.10.4",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "f8edfccba429b6477927a7c1ce1bc6729521305e",
    "build_date": "2023-06-05T21:32:25.188464208Z",
    "build_snapshot": false,
    "lucene_version": "9.6.0",
    "minimum_wire_compatibility_version": "7.17.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "You Know, for Search"
}

This confirms that our Elasticsearch server is configured properly.
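The same check can be done from plain Java, which is handy in a startup script or a smoke test. The sketch below uses only the JDK HTTP client (Java 11+); ClusterCheck and its methods are hypothetical names, and it assumes security is disabled as in the docker-compose.yml above, so no TLS or credentials are configured.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch: query the cluster's root endpoint with the JDK HTTP client.
// Assumes security is disabled (plain HTTP, no credentials).
final class ClusterCheck {

    // Builds a GET request for the root endpoint; pass the base URL without a trailing slash.
    static HttpRequest rootRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }

    // Sends the request and returns the JSON body; throws if the status is not 200.
    static String fetchRoot(HttpClient client, String baseUrl)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(rootRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }
}
```

For example, System.out.println(ClusterCheck.fetchRoot(HttpClient.newHttpClient(), "http://localhost:9200")) prints the same JSON document shown above while the container is running.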

3. Setting Up Elasticsearch with Spring Data

3.1. Maven

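To use Elasticsearch in a Spring Boot application, we add the spring-data-elasticsearch dependency. Alternatively, we can use spring-boot-starter-data-elasticsearch, whose version is managed by Spring Boot.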

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
    <version>5.1.5</version>
</dependency>

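We also add spring-boot-docker-compose, which lets Spring Boot start the services defined in docker-compose.yml when the application runs during development, and the Testcontainers testcontainers, junit-jupiter, and elasticsearch modules for writing integration tests.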

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-docker-compose</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>testcontainers</artifactId>
    <version>1.19.1</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>1.19.1</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.19.1</version>
    <scope>test</scope>
</dependency>

3.2. Elasticsearch Client Configuration

Spring Data Elasticsearch relies on an Elasticsearch client (provided by Elasticsearch client libraries) connected to either a single Elasticsearch node or a cluster.

3.2.1. Imperative (non-reactive) Client

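The following configuration sets up an imperative client:

import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.elc.ElasticsearchConfiguration;

@Configuration
public class ImperativeElasticsearchConfig extends ElasticsearchConfiguration {

    @Override
    public ClientConfiguration clientConfiguration() {
        // connect to the single-node cluster started with Docker
        return ClientConfiguration.builder()
                .connectedTo("localhost:9200")
                .build();
    }
}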

After the above configuration, we can inject the following beans into other Spring components:

import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import org.elasticsearch.client.RestClient;

@Autowired
ElasticsearchOperations operations;

@Autowired
ElasticsearchClient elasticsearchClient;

@Autowired
RestClient restClient;

3.2.2. Reactive Client

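When working with Elasticsearch on a reactive stack, we must use the following configuration instead:

import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.elc.ReactiveElasticsearchConfiguration;

@Configuration
public class ReactiveElasticsearchConfig extends ReactiveElasticsearchConfiguration {

    @Override
    public ClientConfiguration clientConfiguration() {
        return ClientConfiguration.builder()
                .connectedTo("localhost:9200")
                .build();
    }
}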

After the above configuration, we can inject the following beans into other Spring components:

import org.springframework.data.elasticsearch.core.ReactiveElasticsearchOperations;
import org.springframework.data.elasticsearch.client.elc.ReactiveElasticsearchClient;
import org.elasticsearch.client.RestClient;

@Autowired
ReactiveElasticsearchOperations operations;

@Autowired
ReactiveElasticsearchClient elasticsearchClient;

@Autowired
RestClient restClient;

Important: We should use the ElasticsearchOperations or ReactiveElasticsearchOperations beans to interact with the Elasticsearch cluster.

When we use repositories, these instances are also used under the hood.

4. Elasticsearch Document Annotations

Spring Data Elasticsearch supports the use of Java domain classes as entities that can be mapped to Elasticsearch documents. These classes are annotated with @Document, and their fields are annotated with @Field annotations for mapping to Elasticsearch fields.

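For our demo, we will create an Employee document with the properties name and salary, as shown below:

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data // Lombok generates getters, setters, equals/hashCode and toString
@Document(indexName = "employees", createIndex = true)
public class Employee {

    @Id
    private String employeeId;

    @Field(type = FieldType.Text, name = "name")
    private String name;

    @Field(type = FieldType.Long, name = "salary")
    private long salary;
}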

Now, let’s understand these annotations:

@Document – Applied at the class level to mark the class as an entity that is mapped to an Elasticsearch document.
- indexName – the name of the index in which this entity is stored.
- createIndex – decides whether the index is created when the repository is bootstrapped. The default value is true.
@Id – Applied at the field level to mark the field that holds the document’s identifier (its _id in Elasticsearch).
@Field – Applied at the field level to define properties of the field such as name, type, format, etc.

5. Spring Data Elasticsearch APIs

Spring Data Elasticsearch provides convenient abstractions and templates for interacting with Elasticsearch. Its higher-level abstractions and data access methods save us from working with the lower-level client APIs directly.

ElasticsearchOperations (recommended) is the central interface. It contains common helper methods and combines specialized interfaces that reflect the structure of the Elasticsearch REST API:
- DocumentOperations contains the methods for saving, retrieving, and deleting documents.
- SearchOperations contains the methods for searching in Elasticsearch.
- IndexOperations defines the methods for operating on indexes, such as creating an index or its mapping. An IndexOperations instance is obtained by calling indexOps() on ElasticsearchOperations.
ElasticsearchRepository is a repository interface that extends the standard Spring Data CrudRepository. We can extend it with our own query methods and custom queries.
ElasticsearchTemplate is the implementation of ElasticsearchOperations based on the Elasticsearch Java client. It provides methods for index, query, and document operations, and it is the bean we receive when we autowire ElasticsearchOperations with the imperative configuration shown earlier.

In later sections, we will delve into the details of using these APIs.

6. Using ElasticsearchOperations

Once the imperative client is configured, we can autowire the ElasticsearchOperations bean into a service class and use its methods to perform CRUD operations.

@Autowired
private ElasticsearchOperations elasticsearchOperations;

6.1. Create, Update, Get and Delete Documents

We can use the save(), get(), and delete() methods to work with individual Elasticsearch documents:

The save() method has save-or-update semantics: it creates the document if it does not exist and replaces it otherwise.
The get() method retrieves a document from the Elasticsearch index by its id and returns null if no such document exists.
The delete() method deletes a document from an index.

Employee employee = new Employee();
employee.setName("John");
employee.setSalary(20000);

//Create
Employee savedEmployee = elasticsearchOperations.save(employee);

//Update
savedEmployee.setName("John Doe");
Employee updatedEmployee = elasticsearchOperations.save(savedEmployee);

//Get
Employee fetchedEmployee = elasticsearchOperations.get(savedEmployee.getEmployeeId(), Employee.class);

//Delete
elasticsearchOperations.delete(fetchedEmployee.getEmployeeId(), Employee.class);
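To make the save-or-update semantics concrete, here is an in-memory analogy in plain Java. InMemoryIndex and its nested Doc record are hypothetical stand-ins for the index and the Employee entity; Spring Data Elasticsearch does not work this way internally, and Elasticsearch generates its own id format rather than UUIDs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// In-memory analogy of save-or-update keyed by document id.
// No id -> a new document with a generated id; existing id -> the document is replaced.
final class InMemoryIndex {

    // Hypothetical stand-in for the Employee entity.
    record Doc(String id, String name, long salary) {}

    private final Map<String, Doc> docs = new HashMap<>();

    Doc save(Doc doc) {
        // Generate an id when none was supplied (a UUID here, for illustration only).
        String id = doc.id() != null ? doc.id() : UUID.randomUUID().toString();
        Doc stored = new Doc(id, doc.name(), doc.salary());
        docs.put(id, stored); // inserts a new entry or overwrites an existing one
        return stored;
    }

    Optional<Doc> get(String id) {
        return Optional.ofNullable(docs.get(id));
    }

    boolean delete(String id) {
        return docs.remove(id) != null;
    }

    int count() {
        return docs.size();
    }
}
```

Saving a Doc without an id and then saving it again with the returned id leaves exactly one entry, carrying the updated values, which mirrors the create-then-update flow above.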

6.2. Search and Query Documents

The search methods of SearchOperations take a Query parameter that defines the search to execute. Query is an interface with three implementations: CriteriaQuery, StringQuery, and NativeQuery.

6.2.1. CriteriaQuery

CriteriaQuery lets us build search queries without a deep understanding of Elasticsearch query syntax. We construct the search criteria by chaining and combining Criteria objects that specify the conditions documents must meet.

The following code uses CriteriaQuery to fetch all employees whose salary lies strictly between startingSalary and endingSalary:

Criteria criteria = new Criteria("salary")
    .greaterThan(startingSalary)
    .lessThan(endingSalary);

Query query = new CriteriaQuery(criteria);
SearchHits<Employee> searchHits = elasticsearchOperations.search(query, Employee.class);

List<Employee> employeeList = searchHits.getSearchHits()
    .stream()
    .map(SearchHit::getContent)
    .toList();
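greaterThan() and lessThan() are exclusive bounds, so an employee earning exactly startingSalary is not returned. Criteria also offers between(), which includes both bounds. The JDK-only sketch below mirrors that difference on an in-memory list; SalaryFilters and its methods are hypothetical names, and Elasticsearch itself evaluates these conditions as a range query on the index.

```java
import java.util.List;

// In-memory illustration of exclusive vs. inclusive range bounds.
final class SalaryFilters {

    // Like new Criteria("salary").greaterThan(min).lessThan(max): both bounds exclusive.
    static List<Long> exclusive(List<Long> salaries, long min, long max) {
        return salaries.stream().filter(s -> s > min && s < max).toList();
    }

    // Like new Criteria("salary").between(min, max): both bounds inclusive.
    static List<Long> inclusive(List<Long> salaries, long min, long max) {
        return salaries.stream().filter(s -> s >= min && s <= max).toList();
    }
}
```

For salaries 10000, 20000, 30000, and 40000 with bounds 20000 and 40000, the exclusive version keeps only 30000, while the inclusive one keeps 20000, 30000, and 40000.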

6.2.2. StringQuery

StringQuery requires a solid understanding of the Elasticsearch query DSL, because this class takes the query as a JSON string.

The following code shows a query that searches for employees with the given name:

Query query = new StringQuery("{ \"match\": { \"name\": { \"query\": \"" + name + "\" } } } ");

SearchHits<Employee> searchHits = elasticsearchOperations.search(query, Employee.class);

List<Employee> employeeList = searchHits.getSearchHits()
    .stream()
    .map(SearchHit::getContent)
    .toList();
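Because name is concatenated straight into the JSON, a value containing a double quote or a backslash breaks the query, and untrusted input could change its structure. A minimal sketch of escaping the value first is shown below; JsonStrings and its methods are hypothetical helpers, and a JSON library such as Jackson is usually the better choice in a real project.

```java
// Hypothetical helper: escapes a Java string so it can be embedded
// inside a JSON string literal (between double quotes).
final class JsonStrings {

    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Remaining control characters use a six-character hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    // Builds the same match query as above, with the value escaped.
    static String matchNameQuery(String name) {
        return "{ \"match\": { \"name\": { \"query\": \"" + escape(name) + "\" } } }";
    }
}
```

The query can then be created with new StringQuery(JsonStrings.matchNameQuery(name)), so a name such as O"Brien no longer produces invalid JSON.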

6.2.3. NativeQuery

When a query is complex or cannot be expressed through the Criteria API, NativeQuery is the right choice. This is particularly true for queries that use aggregations. NativeQuery lets us use the full range of query types from the Elasticsearch Java client (the co.elastic.clients.elasticsearch._types.query_dsl.Query class), hence its “native” name. The following example runs a match query on the salary field:

Query query = NativeQuery.builder()
  .withQuery(q -> q
    .match(m -> m
      .field("salary")
      .query(salary)
    )
  )
  .build();

SearchHits<Employee> searchHits = elasticsearchOperations.search(query, Employee.class);

List<Employee> employeeList = searchHits.getSearchHits()
    .stream()
    .map(SearchHit::getContent)
    .toList();

7. Using ElasticsearchRepository

By extending the ElasticsearchRepository interface, we can leverage its methods to perform various operations on an Elasticsearch index.

public interface EmployeeRepository extends ElasticsearchRepository<Employee, String> {
}

Then, we autowire the interface in our service class.

@Autowired
private EmployeeRepository employeeRepository;

The interface provides save(…), findById(…), findAll(…), deleteById(…), deleteAll(…), and many more methods for interacting with our Elasticsearch index and its documents.

7.1. Create, Update, Get and Delete Documents

Let’s look at how to use these repository methods:

Employee employee = new Employee();
employee.setName("Clark");
employee.setSalary(40000);

//CREATE
Employee savedEmployee = employeeRepository.save(employee);

//UPDATE
savedEmployee.setName("Clark Kent");
Employee updatedEmployee = employeeRepository.save(savedEmployee);

//GET
Employee fetchedEmployee = employeeRepository.findById(updatedEmployee.getEmployeeId())
    .orElseThrow(); // findById() returns an Optional<Employee>

//DELETE
employeeRepository.deleteById(fetchedEmployee.getEmployeeId());

7.2. Search/Query Documents

7.2.1. Named Query

We can define custom query methods in the repository interface by following Spring Data’s method naming conventions. Spring Data derives the Elasticsearch JSON query from the method name, as shown below:

public interface EmployeeRepository extends ElasticsearchRepository<Employee, String> {

    List<Employee> findBySalaryBetween(Long startingSalary, Long endingSalary);
}

The above method is translated to:

{ 
   "query" : {
     "bool" : {
        "must" : [
          { "range" : { "salary" : { "from": startingSalary, "to": endingSalary, "include_lower": true, "include_upper": true } } }
        ]
     }
   }
}

7.2.2. @Query Annotation

We can use the @Query annotation to pass a valid Elasticsearch JSON query as a string. The arguments passed to the method are inserted into placeholders in the query string. The placeholders have the form ?0, ?1, ?2, etc. for the first, second, and third parameter, and so on.

public interface EmployeeRepository extends ElasticsearchRepository<Employee, String> {

    @Query("{\"match\": {\"salary\": {\"query\": \"?0\"}}}")
    Page<Employee> findBySalary(Long salary);
}
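Conceptually, each ?n placeholder is replaced with the string form of the n-th method argument before the query is sent. The toy sketch below reproduces only that substitution step; PlaceholderDemo is a hypothetical name, and Spring Data Elasticsearch’s own implementation does more, such as converting argument types.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of ?0, ?1, ... placeholder substitution in @Query strings.
final class PlaceholderDemo {

    // Matches "?" followed by one or more digits, so ?10 is read as index 10.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    static String bind(String template, Object... args) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1)); // ?0 -> args[0], ?1 -> args[1], ...
            if (index >= args.length) {
                throw new IllegalArgumentException("No argument for placeholder ?" + index);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(args[index])));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Binding the findBySalary() query above with 20000L yields {"match": {"salary": {"query": "20000"}}}.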

8. Index Creation

There are two ways to create the index in Spring Data Elasticsearch:

8.1. Creating Index Automatically

To create indices automatically, we use the createIndex argument of the @Document annotation. If it is set to true (the default), Spring Data Elasticsearch checks, while bootstrapping the repository on application startup, whether the index exists.

@Document(indexName = "employees", createIndex = true)
public class Employee {
  
  //...
}

If it does not exist, the index is created, and the mapping derived from the entity’s annotations is written to the newly created index.

Remember that if we set createIndex to false, we need to take care of index creation and management programmatically.

8.2. Creating Index Programmatically

We obtain an IndexOperations instance by calling indexOps(clazz) on ElasticsearchOperations. It lets us create indices, put mappings, or store template and alias information in the Elasticsearch cluster.

// create() applies the index settings; the mapping can be written afterwards with putMapping()
elasticsearchOperations.indexOps(Employee.class).create();

Note that, when using IndexOperations or ElasticsearchOperations, it is the user’s responsibility to call the methods. None of these operations are done automatically.

9. Pagination and Sorting

ElasticsearchRepository extends Spring Data’s PagingAndSortingRepository, which adds methods that ease paginated and sorted access to entities. We can also enable pagination and sorting for our own query methods by adding a Sort or Pageable parameter:

public interface EmployeeRepository extends ElasticsearchRepository<Employee, String> {

    @Query("{\"match\": {\"salary\": {\"query\": \"?0\"}}}")
    Page<Employee> findBySalary(Long salary, Pageable pageable);
}

Then we can paginate the results as follows:

Sort sortBy = Sort.by(Sort.Order.asc("salary"));
// first page (page index 0) with 10 results, sorted by salary in ascending order
Pageable pageable = PageRequest.of(0, 10, sortBy);

List<Employee> list = employeeRepository.findBySalary(20000L, pageable).getContent();
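Under the hood, a Pageable is translated into Elasticsearch’s from and size request parameters, where from is the page offset (page * size, the value Pageable.getOffset() returns). By default, Elasticsearch rejects from/size requests where from + size exceeds the index.max_result_window setting of 10,000, so deep result sets call for alternatives such as scrolling or search_after. The arithmetic is sketched below in plain Java; PageMath is a hypothetical name.

```java
// Plain-Java sketch of how a page request maps to Elasticsearch from/size.
final class PageMath {

    // Elasticsearch's default index.max_result_window; from + size may not exceed it.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Offset of the first hit on the given zero-based page.
    static long from(int page, int size) {
        return (long) page * size;
    }

    // Whether a from/size request for this page stays within the default window.
    static boolean fitsDefaultWindow(int page, int size) {
        return from(page, size) + size <= DEFAULT_MAX_RESULT_WINDOW;
    }

    // Number of pages needed for totalHits hits (ceiling of totalHits / size).
    static int totalPages(long totalHits, int size) {
        return (int) ((totalHits + size - 1) / size);
    }
}
```

With 10 results per page, page 3 starts at offset 30, page 999 is the last one that fits the default window, and 25 hits span 3 pages.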

10. FAQs

10.1. Request and Response Logging

The request tracer logging can be enabled to log every request and its corresponding response in curl format, which comes in handy when debugging. The tracer logger belongs to the Elasticsearch low-level RestClient.

To enable trace logs with Logback, the project needs logback-core, logback-classic, and slf4j-api on the classpath. Spring Boot starters already bring them in through spring-boot-starter-logging; otherwise, we add them to our pom.xml:

<dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-core</artifactId>
      <version>1.4.11</version>
</dependency>
<dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.4.11</version>
</dependency>
<dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>2.0.9</version>
</dependency>

Then, we enable trace logging for the tracer logger in our Logback configuration (for example, logback.xml, assuming a STDOUT appender is defined there):

<logger name="tracer" level="TRACE" additivity="false">
    <appender-ref ref="STDOUT" />
</logger>

Optionally, we can also enable debug logs for the RestClient class in our Logback configuration:

<logger name="org.elasticsearch.client.RestClient" level="DEBUG" additivity="false">
    <appender-ref ref="STDOUT" />
</logger>

Note that this type of logging is expensive. It should not stay enabled in production environments, but only be switched on temporarily when needed.

10.2. Using Scroll for Big Result Sets

Elasticsearch provides a scroll API for retrieving large result sets in chunks. Spring Data Elasticsearch uses it internally to implement the SearchOperations.searchForStream() method.

10.2.1. Scrolling Results using ElasticsearchOperations and SearchHitsIterator

Query searchQuery = NativeQuery.builder()
  .withQuery(q -> q
    .matchAll(ma -> ma))
  .withFields("salary")
  .withPageable(PageRequest.of(0, 10))
  .build();

List<Employee> employees = new ArrayList<>();

// try-with-resources closes the iterator (and its scroll context) even if an exception is thrown
try (SearchHitsIterator<Employee> stream = elasticsearchOperations
    .searchForStream(searchQuery, Employee.class)) {
  while (stream.hasNext()) {
    employees.add(stream.next().getContent());
  }
}
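To illustrate what happens behind searchForStream(), here is a plain-Java analogy: an iterator that pulls results one batch at a time from a paged source and has to be closed to release server-side state. BatchIterator and its batch-fetching function are hypothetical; the real SearchHitsIterator is backed by the scroll API rather than by a local function.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Analogy of a scroll-style iterator: fetches results in batches on demand
// and exposes close() to release the (simulated) server-side context.
final class BatchIterator<T> implements Iterator<T>, AutoCloseable {

    private final IntFunction<List<T>> fetchBatch; // batch number -> items (empty when done)
    private Iterator<T> current = List.<T>of().iterator();
    private int nextBatch = 0;
    private boolean exhausted = false;
    private boolean closed = false;

    BatchIterator(IntFunction<List<T>> fetchBatch) {
        this.fetchBatch = fetchBatch;
    }

    @Override
    public boolean hasNext() {
        // Load the next batch only when the current one is used up.
        while (!current.hasNext() && !exhausted && !closed) {
            List<T> batch = fetchBatch.apply(nextBatch++);
            if (batch.isEmpty()) {
                exhausted = true;
            } else {
                current = batch.iterator();
            }
        }
        return !closed && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    @Override
    public void close() {
        closed = true; // a scroll-backed implementation would release the scroll context here
    }

    boolean isClosed() {
        return closed;
    }
}
```

Only one batch is held in memory at a time, which is the point of scrolling through a big result set; closing the iterator in try-with-resources guarantees the cleanup step runs.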

10.2.2. Scrolling Results using ElasticsearchRepository

To use the scroll API with repository methods, the method’s return type must be Stream. Spring Data Elasticsearch then uses the scroll methods of ElasticsearchTemplate internally to handle the query.

public interface EmployeeRepository extends ElasticsearchRepository<Employee, String> {

    Stream<Employee> findAllBySalary(Long salary);
}

It will find all employees with the specified salary and return them as a Stream. Now we can use this method as follows:

List<Employee> employees;

// try-with-resources closes the Stream and releases its underlying resources
try (Stream<Employee> stream = employeeRepository.findAllBySalary(20000L)) {
  employees = stream.toList();
}
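A terminal operation such as toList() does not close a Stream; only close() (directly or through try-with-resources) runs the close handlers registered with onClose(). Spring Data typically relies on such a handler to close the iterator behind a repository Stream. The plain-Java sketch below shows the mechanism; ClosingStreams.of is a hypothetical factory, not Spring Data code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Plain-Java illustration of a stream whose close() releases a resource.
final class ClosingStreams {

    // Wraps the items in a Stream that sets 'released' to true once the stream is closed.
    static <T> Stream<T> of(List<T> items, AtomicBoolean released) {
        return items.stream().onClose(() -> released.set(true));
    }
}
```

After s.toList() the flag is still false; it becomes true only when s.close() runs, which is why the repository Stream above is consumed inside try-with-resources.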

11. Conclusion

In this tutorial, we learned different ways to connect our Spring Boot application with Elasticsearch. We also learned how to create an index (both programmatically and automatically), perform CRUD operations, and write Criteria queries, native queries, named queries, and Elasticsearch JSON queries. Finally, we learned how to use the scroll API to retrieve large result sets in chunks.

Happy Learning !!

Source Code on GitHub