How to HTML encode in Java

HTML encoding converts characters that have special meaning in HTML, such as <, >, &, and ", into their corresponding entities so that a browser renders them as text instead of interpreting them as markup. In Java, this matters whenever you display user-generated content or data from external sources: encoding the output is a primary defense against cross-site scripting (XSS) and keeps the content rendering correctly.

Quick Example

import org.apache.commons.text.StringEscapeUtils;

public class HtmlEncoder {
    public static String htmlEncode(String input) {
        return StringEscapeUtils.escapeHtml4(input);
    }

    public static void main(String[] args) {
        String input = "<script>alert('XSS')</script>";
        String encoded = htmlEncode(input);
        System.out.println(encoded); // Output: &lt;script&gt;alert('XSS')&lt;/script&gt; (single quotes are left unescaped)
    }
}

To use this example, add the Apache Commons Text dependency to your pom.xml file (if you're using Maven):

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.10.0</version> <!-- or newer; 1.5 through 1.9 are affected by CVE-2022-42889 -->
</dependency>

Or, if you're using Gradle, add this to your build.gradle file:

dependencies {
    implementation 'org.apache.commons:commons-text:1.10.0' // or newer; 1.5 through 1.9 are affected by CVE-2022-42889
}

Step-by-Step Breakdown

Let's walk through the code:

  • We import the StringEscapeUtils class from the Apache Commons Text library, which provides a convenient method for HTML encoding.
  • We define a static method htmlEncode that takes a String input and returns the HTML-encoded result.
  • Inside the htmlEncode method, we call the escapeHtml4 method from StringEscapeUtils, passing the input string as an argument. This method replaces special characters with their corresponding HTML entities.
  • In the main method, we demonstrate the usage of the htmlEncode method by encoding a malicious script tag and printing the result.
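To make the replacement step concrete, here is a minimal JDK-only sketch of the core substitution. It is not the Commons Text implementation: MinimalHtmlEscaper is a hypothetical class that handles only &, <, >, and ", whereas escapeHtml4 also maps characters that have named HTML 4 entities.

```java
final class MinimalHtmlEscaper {
    private MinimalHtmlEscaper() {}

    /** Escapes &, <, > and "; returns null for null input. Hypothetical helper, not a library API. */
    static String escape(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c); // everything else is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<a href=\"x\">Tom & Jerry</a>"));
        // prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
    }
}
```

Scanning the input once and appending to a StringBuilder avoids the intermediate strings that chained replace calls would create.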

Handling Edge Cases

Empty/Null Input

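escapeHtml4 returns null when given null, so it does not throw, but the null can surface later as the literal text "null" in a page. To avoid that, add a null check and return an empty string or another default value: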

public static String htmlEncode(String input) {
    if (input == null || input.isEmpty()) {
        return "";
    }
    return StringEscapeUtils.escapeHtml4(input);
}

Invalid Input

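escapeHtml4 does not validate its input: it escapes &, <, >, and " along with characters that have named HTML 4 entities, and passes everything else through unchanged. It does not escape single quotes, so do not rely on it for values placed inside single-quoted attributes. If you need to reject or restrict input before encoding, use a regular expression or a validation library.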
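If validation is needed before encoding, a regular-expression check could look like the sketch below. The policy here is an assumption made for illustration (a 1,000-character limit and no control characters other than tab, newline, and carriage return), and InputValidator is a hypothetical name; adjust both to your own requirements.

```java
import java.util.regex.Pattern;

final class InputValidator {
    // Assumed policy: reject control characters except tab, newline, and carriage return.
    private static final Pattern DISALLOWED_CONTROL =
            Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");
    // Assumed policy: reject anything longer than 1,000 chars.
    private static final int MAX_LENGTH = 1_000;

    private InputValidator() {}

    static boolean isAcceptable(String input) {
        return input != null
                && input.length() <= MAX_LENGTH
                && !DISALLOWED_CONTROL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("hello\tworld")); // prints: true
        System.out.println(isAcceptable("bell\u0007"));   // prints: false
        System.out.println(isAcceptable(null));           // prints: false
    }
}
```

Validation decides whether to accept the input at all; it does not replace encoding, which should still be applied when the value is written into HTML.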

Large Input

escapeHtml4 accepts strings of any length, but it builds the entire result as a new String in memory. For very large inputs, consider writing the escaped output directly to a Writer, for example with StringEscapeUtils.ESCAPE_HTML4.translate(input, writer), or processing the data in chunks.
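One way to sketch a streaming approach with only the JDK is to read from a Reader in fixed-size buffers and write escaped characters to a Writer. This is a hand-rolled illustration that escapes only &, <, >, and ", not the Commons Text implementation; StreamingHtmlEscaper and the 8,192-character buffer size are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class StreamingHtmlEscaper {
    private StreamingHtmlEscaper() {}

    /** Copies in to out, escaping &, <, > and " one buffer at a time. */
    static void escape(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                char c = buffer[i];
                switch (c) {
                    case '&': out.write("&amp;"); break;
                    case '<': out.write("&lt;"); break;
                    case '>': out.write("&gt;"); break;
                    case '"': out.write("&quot;"); break;
                    default: out.write(c); // copied unchanged
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        escape(new StringReader("1 < 2 && 3 > 2"), out);
        System.out.println(out); // prints: 1 &lt; 2 &amp;&amp; 3 &gt; 2
    }
}
```

Because each character is handled independently, buffer boundaries cannot split an escape sequence. In real use the Reader and Writer would wrap a file or network stream rather than in-memory strings.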

Unicode/Special Characters

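The escapeHtml4 method replaces characters that have a named HTML 4 entity, such as é or the non-breaking space, with that entity. Characters without one, such as CJK text or emoji, pass through unchanged, so serve the page in a Unicode encoding such as UTF-8. For example:

String input = "\u00A0café"; // a non-breaking space (U+00A0) followed by "café"
String encoded = htmlEncode(input);
System.out.println(encoded); // Output: &nbsp;caf&eacute;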
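When a numeric character reference such as &#xE9; is wanted instead, for example because the output must be ASCII-only, a JDK-only sketch could look like this. NumericEntityEncoder is a hypothetical helper; it only converts non-ASCII code points and does not escape markup characters, so it would be applied in addition to HTML encoding, not in place of it.

```java
import java.util.Locale;

final class NumericEntityEncoder {
    private NumericEntityEncoder() {}

    /** Replaces every code point above U+007F with a hexadecimal character reference. */
    static String encodeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // codePoints() joins surrogate pairs, so characters outside the BMP get one reference.
        input.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeNonAscii("caf\u00E9"));      // prints: caf&#xE9;
        System.out.println(encodeNonAscii("\uD83D\uDE00"));   // prints: &#x1F600;
    }
}
```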

Common Mistakes

Mistake 1: Using replaceAll instead of escapeHtml4

// Wrong: leaves & and " unescaped, and replaceAll compiles a regex where replace would do
String encoded = input.replaceAll("<", "&lt;").replaceAll(">", "&gt;");

// Correct
String encoded = StringEscapeUtils.escapeHtml4(input);
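The following JDK-only sketch shows one way the angle-bracket-only replacement can fail: a value placed inside a double-quoted attribute can close the attribute and add its own. NaiveEscapeDemo and the sample input are made up for illustration.

```java
final class NaiveEscapeDemo {
    private NaiveEscapeDemo() {}

    /** The incomplete approach: only angle brackets are escaped. */
    static String naive(String s) {
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Contains no angle brackets, so the naive escaper returns it unchanged.
        String userInput = "\" onmouseover=\"alert(1)";
        String html = "<input value=\"" + naive(userInput) + "\">";
        System.out.println(html);
        // prints: <input value="" onmouseover="alert(1)">
    }
}
```

An escaper that also converts " to &quot; keeps the whole value inside the attribute.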

Mistake 2: Failing to handle null inputs

// Wrong: escapeHtml4 returns null for null input, which may later render as "null"
String encoded = StringEscapeUtils.escapeHtml4(input);

// Correct
String encoded = input == null ? "" : StringEscapeUtils.escapeHtml4(input);

Mistake 3: Using an outdated library

// Wrong (this class is deprecated since Commons Lang 3.6)
import org.apache.commons.lang3.StringEscapeUtils;

// Correct (Apache Commons Text, where the class now lives)
import org.apache.commons.text.StringEscapeUtils;

Performance Tips

Tip 1: Use a caching layer

If you're encoding the same strings repeatedly, consider using a caching layer to store the encoded results and avoid redundant computations.
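A caching layer could be sketched as a small wrapper around any encoding function. CachingEncoder is a hypothetical class, and the lambda in main is a stand-in encoder so the example runs with the JDK alone; in an application you would pass StringEscapeUtils::escapeHtml4 instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

final class CachingEncoder {
    // Unbounded map: acceptable for a small, fixed set of strings only.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final UnaryOperator<String> encoder;

    CachingEncoder(UnaryOperator<String> encoder) {
        this.encoder = encoder;
    }

    /** Returns the cached result, calling the encoder only on a cache miss. */
    String encode(String input) {
        if (input == null) {
            return "";
        }
        return cache.computeIfAbsent(input, encoder);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        CachingEncoder cached = new CachingEncoder(s -> {
            calls.incrementAndGet(); // count how often the stand-in encoder actually runs
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        });
        System.out.println(cached.encode("<b>")); // prints: &lt;b&gt;
        System.out.println(cached.encode("<b>")); // prints: &lt;b&gt;
        System.out.println(calls.get());          // prints: 1
    }
}
```

Because the map never evicts entries, this only pays off when the set of distinct inputs is small; for arbitrary user content, measure first and consider a bounded cache.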

Tip 2: Use a streaming approach

For large inputs, write the escaped output to a Writer as you go, or encode the data in chunks, instead of building one large result String in memory.

Tip 3: Avoid unnecessary encoding

Only encode strings that will be displayed in a web browser or used in a context where HTML entities are required. Avoid encoding strings that will be used in a non-HTML context.

FAQ

Q: What is the difference between escapeHtml4 and escapeHtml3?

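A: Both escape the same markup characters (&, <, >, and "). escapeHtml3 emits only the named entities defined in HTML 3, while escapeHtml4 also covers the additional entities introduced in HTML 4.0, such as &euro;. Neither is more secure than the other; use escapeHtml4 unless you specifically need to target HTML 3.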

Q: Can I use StringEscapeUtils for XML encoding?

A: Yes. StringEscapeUtils also provides escapeXml10 and escapeXml11 for XML 1.0 and XML 1.1 content. Use those rather than escapeHtml4, because XML does not define HTML's named entities.

Q: How do I decode HTML-encoded strings?

A: Use the StringEscapeUtils.unescapeHtml4 method to decode HTML-encoded strings.

Q: Is StringEscapeUtils thread-safe?

A: Yes, StringEscapeUtils is thread-safe and can be used concurrently by multiple threads.

Q: Can I use StringEscapeUtils with Java 8?

A: Yes, StringEscapeUtils is compatible with Java 8 and later versions.
