Masking Sensitive Data with Logback

Masking sensitive data in Logback logs means partially or fully replacing client-sensitive data or NPI (nonpublic personal information) with arbitrary placeholder text. For example, an SSN can be replaced with star characters, or it can be removed from the logs completely.

1. Masking NPI in Logs

Generally, we can mask sensitive data in two ways.

The first approach (not recommended) is to create a few utility functions that build a masked string representation of the domain objects holding sensitive information.

logger.info("Transaction completed with details : " + CommonUtils.mask(transaction));

This approach is problematic because masking calls are scattered across the application code. If we are later asked to mask data only in the production and pre-production environments, we may have to change the code in many places.

Similarly, if we discover that one domain object was left out of the masking process, we may need to change many classes and many log statements.
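To make the drawback concrete, here is a minimal sketch of what such a utility could look like. The article does not show CommonUtils, so the CommonUtilsSketch class, the Transaction type and its fields are hypothetical names used only for illustration.

```java
final class CommonUtilsSketch {

    // Hypothetical domain object; not part of the original article
    static final class Transaction {
        final String id;
        final String ssn;

        Transaction(String id, String ssn) {
            this.id = id;
            this.ssn = ssn;
        }
    }

    // Hypothetical helper: keeps the id readable and replaces every SSN digit with '*'
    static String mask(Transaction t) {
        return "Transaction{id=" + t.id + ", ssn=" + t.ssn.replaceAll("[0-9]", "*") + "}";
    }

    public static void main(String[] args) {
        Transaction transaction = new Transaction("12345", "856-45-6789");
        // Every log statement has to remember to call mask(...) explicitly
        System.out.println("Transaction completed with details : " + mask(transaction));
        // prints: Transaction completed with details : Transaction{id=12345, ssn=***-**-****}
    }
}
```

Each new domain object needs its own mask method, and each log statement needs its own call, which is exactly the maintenance burden described above.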

The second approach separates the masking logic from the application code and moves it into the Logback configuration. A change to the masking logic is then confined to the configuration file and the layout handler. Application classes do not take part in masking at all.

Any change in masking logic or scope is handled by Logback through layout handler classes and configuration files. This option is easier to manage and should be the preferred way of masking data in logs.

2. How to Mask Data with Logback

The data masking in Logback is done in two steps:

  1. Define the masking patterns as regular expressions in the logback.xml configuration file.
  2. Define a custom Layout class that reads the masking patterns and applies them to the log message.

2.1. Masking Patterns in Configuration File

This is the slightly difficult part, where we write the regex patterns for the information to be masked. Writing regular expressions that cover every kind of formatted output may not be easy, but once it is done, you will thank yourself later.
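Before putting an expression into the configuration, it can help to try it against a few sample values in plain Java. The following is a minimal sketch that uses the same SSN and email expressions as this article; the class name MaskPatternCheck and its helper methods are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaskPatternCheck {

    // Same expressions as the <maskPattern> entries used in this article
    static final Pattern SSN =
            Pattern.compile("((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4})");
    static final Pattern EMAIL = Pattern.compile("(\\w+@\\w+\\.\\w+)");

    // True when the text contains something the SSN expression would mask
    static boolean containsSsn(String text) {
        return SSN.matcher(text).find();
    }

    // Returns the first substring the email expression would mask, or null if none
    static String firstEmail(String text) {
        Matcher m = EMAIL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(containsSsn("ssn=856-45-6789")); // true
        System.out.println(containsSsn("ssn=000-45-6789")); // false, area 000 is excluded
        System.out.println(firstEmail("admin@email.com"));  // admin@email.com
        // Only part of this address is matched, so only part would be masked
        System.out.println(firstEmail("first.last@mail.example.com")); // last@mail.example
    }
}
```

The last call shows that the simple email expression covers only part of an address with a dotted local part or a multi-level domain, so the pattern may need widening for real data.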

The following configuration logs masked data using the console appender (for the demo) and masks only the email and SSN fields.

<appender name="DATA_MASK" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
        <layout class="com.howtodoinjava.demo.logback.DataMaskingPatternLayout">
            <maskPattern>((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4})</maskPattern> <!-- SSN -->
            <maskPattern>(\w+@\w+\.\w+)</maskPattern> <!-- Email -->
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </layout>
    </encoder>
</appender>

Note that we can easily enable or disable masking in a particular environment with Logback's conditional configuration (the if, then and else elements), which requires the Janino library on the classpath.

<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.1.6</version>
</dependency>

In the following example, data masking is enabled in the production environment and disabled in all other environments. ENV is a system property that holds the name of the environment in which the application is running.

<if condition='property("ENV").equals("prod")'>
    <then>
        <appender name="DATA_MASK" class="ch.qos.logback.core.ConsoleAppender">
            <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
                <layout class="com.howtodoinjava.demo.logback.DataMaskingPatternLayout">
                    <maskPattern>((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4})</maskPattern> <!-- SSN -->
                    <maskPattern>(\w+@\w+\.\w+)</maskPattern> <!-- Email -->
                    <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
                </layout>
            </encoder>
        </appender>
    </then>
    <else>
        <appender name="DATA_MASK" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
                <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
            </encoder>
        </appender>
    </else>
</if>

2.2. Custom PatternLayout

The second part of the solution is to read the masking patterns from the configuration and apply them to the log messages. This is fairly simple and can be achieved with a custom pattern handler.

The given pattern handler creates a single regular expression by combining all the patterns from the configuration with the OR operator (|). This expression is applied to every log message processed by the handler, and each character matched by a capturing group is replaced with a star.

We can customize the logic implemented in this handler to meet our own requirements.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import ch.qos.logback.classic.PatternLayout;
import ch.qos.logback.classic.spi.ILoggingEvent;

public class DataMaskingPatternLayout extends PatternLayout 
{
	// Single regex built by OR-ing all configured mask patterns; stays null when none are configured
	private Pattern appliedPattern;
	private List<String> maskPatterns = new ArrayList<>();

	// Invoked for each <maskPattern> element found in the configuration
	public void addMaskPattern(String maskPattern) {
		maskPatterns.add(maskPattern);
		appliedPattern = Pattern.compile( maskPatterns.stream()
					.collect(Collectors.joining("|")), Pattern.MULTILINE);
	}

	@Override
	public String doLayout(ILoggingEvent event) {
		// Format the event first, then mask the formatted line
		return maskMessage(super.doLayout(event));
	}

	private String maskMessage(String message) {
		// When masking is disabled in an environment
		if (appliedPattern == null) {
			return message;
		}
		StringBuilder sb = new StringBuilder(message);
		Matcher matcher = appliedPattern.matcher(sb);
		while (matcher.find()) {
			// Overwrite every character of each matched capturing group with '*'
			IntStream.rangeClosed(1, matcher.groupCount()).forEach(group -> {
				if (matcher.group(group) != null) {
					IntStream.range(matcher.start(group), 
								matcher.end(group)).forEach(i -> sb.setCharAt(i, '*'));
				}
			});
		}
		return sb.toString();
	}
}
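As one example of customizing the handler, the masking step can be tried outside Logback and extended to keep a few trailing characters visible (partial masking). This is a minimal sketch, not part of the original layout: the class MaskingSketch, its methods and the keepLast parameter are assumptions introduced for illustration, and List.of requires Java 9 or later.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaskingSketch {

    // Combine the configured expressions with the OR operator, as the layout does
    static Pattern combine(List<String> maskPatterns) {
        return Pattern.compile(String.join("|", maskPatterns), Pattern.MULTILINE);
    }

    // Overwrite each matched capturing group with '*', leaving the last keepLast characters visible
    static String mask(String message, Pattern pattern, int keepLast) {
        StringBuilder sb = new StringBuilder(message);
        Matcher matcher = pattern.matcher(message);
        while (matcher.find()) {
            for (int group = 1; group <= matcher.groupCount(); group++) {
                if (matcher.group(group) == null) {
                    continue; // this alternative did not take part in the match
                }
                int start = matcher.start(group);
                int end = Math.max(start, matcher.end(group) - keepLast);
                for (int i = start; i < end; i++) {
                    sb.setCharAt(i, '*');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern pattern = combine(List.of(
                "((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4})", // SSN
                "(\\w+@\\w+\\.\\w+)"));                                       // Email
        String message = "ssn=856-45-6789 email=admin@email.com";

        System.out.println(mask(message, pattern, 0)); // ssn=*********** email=***************
        System.out.println(mask(message, pattern, 4)); // ssn=*******6789 email=***********.com
    }
}
```

With keepLast set to 0 the result matches the full masking done by DataMaskingPatternLayout. A non-zero value keeps the tail of every matched value, which may or may not be acceptable depending on the kind of data being logged.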

3. Demo

Let us see the data masking in action. We will run the demo code in both production and non-production mode.

In non-production mode, we do not set the system property ENV, so data masking does not happen.

Logger logger = LoggerFactory.getLogger(Main.class);

Map<String, String> customer = new HashMap<String, String>();
customer.put("id", "12345");
customer.put("ssn", "856-45-6789");
customer.put("email", "admin@email.com");

logger.info("Customer found : {}", new JSONObject(customer));

21:02:18.683 [main] INFO  com.howtodoinjava.demo.slf4j.Main - Customer found : {"id":"12345","email":"admin@email.com","ssn":"856-45-6789"}

When we run the application in production mode, we can see the masked output.

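// Production mode ON. Set the property before the first logger is created,
// because Logback loads its configuration at that point.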
System.setProperty("ENV", "prod");

Logger logger = LoggerFactory.getLogger(Main.class);

Map<String, String> customer = new HashMap<String, String>();
customer.put("id", "12345");
customer.put("ssn", "856-45-6789");
customer.put("email", "admin@email.com");

logger.info("Customer found : {}", new JSONObject(customer));

21:03:07.960 [main] INFO  com.howtodoinjava.demo.slf4j.Main - Customer found : {"id":"12345","email":"***************","ssn":"***********"}

4. Conclusion

In this Logback tutorial, we learned to create a custom PatternLayout to mask sensitive data in application logs. The data masking patterns are controlled centrally from the configuration file, which is what makes this technique so useful.

We can extend this feature to make masking environment-specific by using Logback's conditional configuration tags, which rely on the Janino library.

Happy Learning !!
