How to HTML encode in JavaScript

HTML encoding converts characters that have special meaning in HTML, such as <, >, &, and quotes, into their corresponding entities, so that the browser displays them as literal text instead of interpreting them as markup. In JavaScript, HTML encoding is crucial when working with user input: it helps prevent cross-site scripting (XSS) attacks and ensures that the data is displayed correctly in the browser.

Quick Example

Here's a minimal example that cleans an untrusted string with the DOMPurify library. Note that DOMPurify is an HTML sanitizer, not an entity encoder: it removes dangerous markup rather than converting characters to entities. To display the string as literal text instead, replace &, <, >, ", and ' with &amp;, &lt;, &gt;, &quot;, and &#39;.

import DOMPurify from 'dompurify';

const userInput = '<script>alert("XSS")</script>';
const sanitizedInput = DOMPurify.sanitize(userInput);

console.log(sanitizedInput); // Output: '' (the <script> element is removed, not encoded)

To use this code, install DOMPurify by running npm install dompurify in your terminal. DOMPurify needs a DOM, so run it in the browser or pair it with jsdom in Node.js.

Step-by-Step Breakdown

Let's break down the code:

  1. import DOMPurify from 'dompurify';: We import the DOMPurify library, which provides a sanitize method that strips unsafe elements and attributes from HTML.
  2. const userInput = '<script>alert("XSS")</script>';: We define a variable userInput containing a string with markup that must not reach the page as-is.
  3. const sanitizedInput = DOMPurify.sanitize(userInput);: We pass the userInput string to the sanitize method, which returns the sanitized string.
  4. console.log(sanitizedInput);: We log the sanitized string to the console. It is empty here because the entire <script> element was removed.
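For the encoding step itself, a small helper is often enough. The sketch below is one possible approach and is not part of DOMPurify: encodeHtml and HTML_ENTITIES are hypothetical names, and it assumes the result is placed in HTML text or inside a quoted attribute value.

```javascript
// Hypothetical helper (not from any library): maps the five characters
// that are significant in HTML text and quoted attributes to entities.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function encodeHtml(input) {
  // A single pass avoids re-encoding the "&" of entities produced earlier.
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(encodeHtml('<script>alert("XSS")</script>'));
// &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```

Unlike sanitizing, this keeps every character of the input visible to the reader; nothing is removed.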

Handling Edge Cases

Here are some common edge cases to consider:

Empty Input

const userInput = '';
const sanitizedInput = DOMPurify.sanitize(userInput);
console.log(sanitizedInput); // Output: ''

As expected, an empty input string returns an empty string.

Null Input

const userInput = null;
const sanitizedInput = DOMPurify.sanitize(userInput);
console.log(sanitizedInput); // Output: ''

If the input is null, the sanitize method returns an empty string.
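A hand-written encoder has to decide how to treat null itself, because calling .replace on null throws a TypeError. A minimal sketch, assuming that null and undefined should produce an empty string and that other non-string values are converted with String(); safeEncodeHtml and ENTITY_MAP are hypothetical names:

```javascript
const ENTITY_MAP = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function safeEncodeHtml(input) {
  // Assumption: null and undefined mean "nothing to display".
  if (input === null || input === undefined) return '';
  // Numbers, booleans, and other values are converted to strings first.
  return String(input).replace(/[&<>"']/g, (ch) => ENTITY_MAP[ch]);
}

console.log(safeEncodeHtml(null) === ''); // true
console.log(safeEncodeHtml(42));          // 42
console.log(safeEncodeHtml('1 < 2'));     // 1 &lt; 2
```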

Large Input

const userInput = 'a'.repeat(10000);
const sanitizedInput = DOMPurify.sanitize(userInput);
console.log(sanitizedInput.length); // Output: 10000

A 10,000-character string of plain text comes back unchanged, since it contains no markup to remove.

Unicode/Special Characters

const userInput = 'Café';
const sanitizedInput = DOMPurify.sanitize(userInput);
console.log(sanitizedInput); // Output: Café

Accented letters and other non-ASCII characters pass through unchanged. They do not need entity encoding as long as the page is served as UTF-8.
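Some legacy pipelines still expect ASCII-only output. A sketch of that optional extra step, assuming numeric character references are acceptable to the consumer; encodeNonAscii is a hypothetical name:

```javascript
function encodeNonAscii(input) {
  // The "u" flag makes the regex match whole code points, so characters
  // outside the Basic Multilingual Plane are not split into surrogates.
  return input.replace(/[^\x00-\x7F]/gu, (ch) => `&#${ch.codePointAt(0)};`);
}

console.log(encodeNonAscii('Caf\u00e9')); // Caf&#233;
```

This step only rewrites non-ASCII characters; the five HTML-significant characters still need their own encoding.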

Common Mistakes

Here are some common mistakes developers make when HTML encoding in JavaScript:

Mistake 1: Replacing only < and >

const userInput = '<script>alert("XSS")</script>';
const encodedInput = userInput.replace(/</g, '&lt;').replace(/>/g, '&gt;');
console.log(encodedInput); // Output: &lt;script&gt;alert("XSS")&lt;/script&gt; (INCOMPLETE)

This approach is incomplete because it replaces only the < and > characters. Ampersands and quotes are left unencoded, which is unsafe when the result is placed inside an attribute value.

Corrected code:

const userInput = '<script>alert("XSS")</script>';
const encodedInput = userInput.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;').replace(/'/g, '&#39;'); // & is replaced first
console.log(encodedInput); // Output: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
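Why quotes matter becomes visible when a value lands inside an attribute. The sketch below compares the two approaches; the function names, the payload, and the title attribute are illustrative assumptions.

```javascript
// Encodes angle brackets only (the incomplete approach).
function encodeAnglesOnly(s) {
  return s.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Encodes all five significant characters.
function encodeAll(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A payload with no angle brackets at all.
const payload = '" onmouseover="alert(1)';

console.log(`<div title="${encodeAnglesOnly(payload)}">`);
// <div title="" onmouseover="alert(1)">   (a new attribute is injected)

console.log(`<div title="${encodeAll(payload)}">`);
// <div title="&quot; onmouseover=&quot;alert(1)">   (value stays inside title)
```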

Mistake 2: Using an incomplete custom encoding function

function customEncode(input) {
  return input.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}
const userInput = '<script>alert("XSS")</script>';
const encodedInput = customEncode(userInput);
console.log(encodedInput); // Output: &lt;script&gt;alert("XSS")&lt;/script&gt; (INCOMPLETE)

This approach is incomplete because it handles only &, <, and >. Double and single quotes stay unencoded, so the output can still break out of a quoted attribute value.

Corrected code:

function customEncode(input) {
  return input.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;').replace(/'/g, '&#39;');
}
const encodedInput = customEncode('<script>alert("XSS")</script>');
console.log(encodedInput); // Output: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
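Ordering is a second pitfall in chained replacements such as customEncode: the ampersand must be handled first, otherwise the entities produced by earlier steps are encoded again. A small demonstration with hypothetical function names:

```javascript
// Wrong order: "&" is replaced after "<" and ">" have produced entities.
function ampersandLast(s) {
  return s.replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/&/g, '&amp;');
}

// Right order: "&" is replaced before any entity exists in the string.
function ampersandFirst(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

console.log(ampersandLast('a < b & c'));  // a &amp;lt; b &amp; c  (double-encoded)
console.log(ampersandFirst('a < b & c')); // a &lt; b &amp; c
```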

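Mistake 3: Not encoding user input

const userInput = '<script>alert("XSS")</script>';
document.body.innerHTML = userInput; // parsed as markup, not as text (UNSAFE)

This approach is unsafe because the browser parses the unencoded input as HTML. Logging a string to the console is harmless; the risk arises when the string is written into the page. A <script> element inserted through innerHTML is not executed, but other payloads, such as an onerror attribute on an <img> element, are.

Corrected code:

const userInput = '<script>alert("XSS")</script>';
const sanitizedInput = DOMPurify.sanitize(userInput);
document.body.innerHTML = sanitizedInput; // '' here: the <script> element has been removed

To show the input as literal text instead, assign it to textContent or entity-encode it before inserting it.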
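One way to make encoding hard to forget is to build markup through a tagged template that encodes every interpolated value. This is a sketch under assumptions: html and escapeValue are hypothetical names, and every interpolation is treated as untrusted text destined for element content or a quoted attribute.

```javascript
function escapeValue(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Tagged template: the literal parts are trusted as written,
// and each interpolated value is encoded before being joined in.
function html(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeValue(values[i]) : ''),
    ''
  );
}

const comment = '<script>alert("XSS")</script>';
console.log(html`<p>${comment}</p>`);
// <p>&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</p>
```

The design choice here is that forgetting to encode requires deliberately bypassing the tag, rather than merely omitting a function call.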

Performance Tips

Here are some performance tips for HTML encoding in JavaScript:

  1. Pick the right tool: Entity encoding is a cheap string operation, while a sanitizer such as DOMPurify parses the input first. Use a well-maintained sanitizer when the input must be rendered as HTML, and encode when it should appear as plain text.
  2. Avoid unnecessary work: Encode once, at the point where the value is written into HTML, and skip strings that contain no special characters.
  3. Use caching: If you need to encode the same input multiple times, consider caching the encoded result to avoid redundant encoding.
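Tips 2 and 3 can be sketched together: test for special characters before doing any work, and memoize results for inputs that repeat. The names below are hypothetical, and the unbounded Map is an assumption that only suits a small, repeating set of inputs.

```javascript
const SPECIAL_CHARS = /[&<>"']/; // no "g" flag, so test() keeps no state
const ENTITY_TABLE = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
const encodeCache = new Map();

function encodeHtmlCached(input) {
  // Fast path: nothing to encode, return the original string untouched.
  if (!SPECIAL_CHARS.test(input)) return input;

  let encoded = encodeCache.get(input);
  if (encoded === undefined) {
    encoded = input.replace(/[&<>"']/g, (ch) => ENTITY_TABLE[ch]);
    encodeCache.set(input, encoded);
  }
  return encoded;
}

console.log(encodeHtmlCached('plain text')); // plain text
console.log(encodeHtmlCached('a < b'));      // a &lt; b
console.log(encodeCache.size);               // 1
```

Whether the cache pays off depends on how often inputs repeat; measuring before adopting it is advisable.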

FAQ

Q: What is HTML encoding?

A: HTML encoding is the process of converting special characters in a string to their corresponding HTML entities.

Q: Why is HTML encoding important?

A: HTML encoding helps prevent cross-site scripting (XSS) attacks and ensures that user input is displayed correctly in the browser.

Q: Can I use a simple replace method for HTML encoding?

A: Only if it covers every significant character. Replacing < and > alone is not sufficient; a complete encoder also handles &, ", and ', and replaces & first.

Q: How do I handle edge cases like empty or null input?

A: Decide on the behavior explicitly. DOMPurify returns an empty string for empty or null input; a hand-written encoder should check for null and undefined before calling string methods.

Q: Is HTML encoding a performance bottleneck?

A: Rarely. Replacing five characters is cheap compared with rendering the page. Sanitizing with a library like DOMPurify costs more because the input is parsed, and caching results for repeated inputs can help mitigate this.
