How to HTML decode in TypeScript

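HTML decoding is the process of converting HTML entities (such as &lt;, &amp;, or &#x202f;) back into the characters they stand for. You need it whenever text arrives entity-encoded, for example from an API, an RSS feed, or scraped HTML, and you want the original characters for display, search, or storage. Decoding does not make content safe; escaping or sanitizing at the point where you insert text into HTML does. In this article, we will explore how to HTML decode in TypeScript, including a quick example, a step-by-step breakdown, handling edge cases, common mistakes, performance tips, and frequently asked questions.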

Quick Example

import { JSDOM } from 'jsdom';

function htmlDecode(html: string): string {
  const dom = new JSDOM(html);
  // textContent is typed string | null, so fall back to '' in strict mode.
  return dom.window.document.body.textContent ?? '';
}

console.log(htmlDecode('&lt;p&gt;Hello, world!&lt;/p&gt;')); // Output: "<p>Hello, world!</p>"

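This code uses the jsdom library to parse the string as HTML. While building the DOM, the parser replaces every entity with the character it stands for, so the text content of the body element is the decoded string. Because the input is parsed as markup, any real tags in it are dropped rather than preserved as text.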

Step-by-Step Breakdown

Let's walk through the code line by line:

  • import { JSDOM } from 'jsdom';: We import the JSDOM class from the jsdom library. This library allows us to create a DOM from a string of HTML.
  • function htmlDecode(html: string): string {: We define a function htmlDecode that takes a string of HTML as input and returns the decoded string.
  • const dom = new JSDOM(html);: We create a new instance of the JSDOM class, passing the input HTML string to the constructor. This creates a DOM from the HTML string.
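  • return dom.window.document.body.textContent: We read the text content of the body element. The parser has already replaced every entity with its character while building the DOM, so this is the decoded string. Because textContent is typed string | null, strict mode needs a fallback such as ?? '' to satisfy the string return type.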

Handling Edge Cases

Here are a few edge cases to consider:

Empty/null input

console.log(htmlDecode('')); // Output: ""
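// htmlDecode(null) is a compile-time error under strictNullChecks, because the parameter is typed string.

TypeScript only guards typed call sites. Plain JavaScript callers, or values typed any such as the result of JSON.parse, can still pass null or undefined at runtime, so decide explicitly whether that should produce an empty string or null. Mistake 2 below shows the empty-string option.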
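If you would rather keep null as null, one option is a thin wrapper so the decoder itself can keep its strict string signature. decodeNullable is a hypothetical helper; the decoder is passed in, so it works with htmlDecode or any other string-to-string function, and decodeAmp below is only a stand-in for the demo.

```typescript
// Hypothetical helper: passes null/undefined through as null and
// hands every real string to the supplied decoder.
function decodeNullable(
  input: string | null | undefined,
  decode: (s: string) => string,
): string | null {
  // `== null` matches both null and undefined.
  return input == null ? null : decode(input);
}

// Stand-in decoder for the demo; pass htmlDecode in real code.
const decodeAmp = (s: string): string => s.replace(/&amp;/g, '&');

console.log(decodeNullable(null, decodeAmp)); // null
console.log(decodeNullable(undefined, decodeAmp)); // null
console.log(decodeNullable('a &amp; b', decodeAmp)); // a & b
```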

Markup in the input

console.log(htmlDecode('<script>alert("XSS")</script>')); // Output: ""

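This input is raw markup rather than entity-encoded text, so jsdom parses it as an element. jsdom does not execute scripts unless you opt in through its runScripts option, and the parser places a leading script element in head, so body.textContent is empty. More generally, real tags are dropped and only their text survives: htmlDecode('<b>bold</b> &amp; plain') returns "bold & plain". The XSS risk lies in what you later do with the decoded result.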

Large input

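const largeHtml = Array(10000).fill('&lt;p&gt;Hello, world!&lt;/p&gt;').join('');
console.log(htmlDecode(largeHtml)); // Output: "<p>Hello, world!</p><p>Hello, world!</p>..."

jsdom builds a complete document, plus a window object, on every call, so memory use and time grow with the input. For large inputs, a string-based decoder such as he avoids the DOM entirely. If you split the input into chunks, make sure no entity straddles a chunk boundary: "&am" at the end of one chunk and "p;" at the start of the next decode in neither.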
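One way to chunk safely is to hold back any trailing text that could be the start of an unfinished entity and prepend it to the next chunk. ChunkedDecoder below is a hypothetical sketch, not an API of jsdom or he; it takes a string-based decoder as a parameter and assumes every entity ends with ";" and spans fewer than 40 characters. That holds for all semicolon-terminated named references, but not for the legacy forms HTML accepts without a semicolon or for numeric references padded with many leading zeros.

```typescript
// Hypothetical streaming helper (not part of jsdom or he): decodes text chunk
// by chunk without letting an entity straddle a chunk boundary.
class ChunkedDecoder {
  private pending = '';
  private readonly decode: (s: string) => string;
  private readonly maxEntityLength: number;

  // Assumption: a tail of maxEntityLength or more characters starting at "&"
  // with no ";" cannot be an entity, so it is not held back.
  constructor(decode: (s: string) => string, maxEntityLength = 40) {
    this.decode = decode;
    this.maxEntityLength = maxEntityLength;
  }

  // Decodes everything that is safe to decode and holds back a possible partial entity.
  push(chunk: string): string {
    const text = this.pending + chunk;
    const amp = text.lastIndexOf('&');
    // Only the last "&" can start an unfinished entity: entity names never contain "&".
    const openTail =
      amp !== -1 && !text.includes(';', amp) && text.length - amp < this.maxEntityLength;
    const cut = openTail ? amp : text.length;
    this.pending = text.slice(cut);
    return this.decode(text.slice(0, cut));
  }

  // Flushes the held-back text once the input has ended.
  end(): string {
    const rest = this.decode(this.pending);
    this.pending = '';
    return rest;
  }
}

// Stand-in single-pass decoder for the demo; pass a real decoder such as he.decode in practice.
const demoDecode = (s: string): string =>
  s.replace(/&(amp|lt|gt);/g, (_m: string, name: string): string =>
    name === 'amp' ? '&' : name === 'lt' ? '<' : '>');

const decoder = new ChunkedDecoder(demoDecode);
let out = '';
for (const chunk of ['&lt;p&g', 't;Hi &am', 'p; bye&lt;/p&gt;']) {
  out += decoder.push(chunk);
}
out += decoder.end();
console.log(out); // <p>Hi & bye</p>
```

Both "&gt;" and "&amp;" are split across chunks here, yet each is decoded once it is complete.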

Unicode/special characters

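console.log(htmlDecode('&#x202f;').codePointAt(0)?.toString(16)); // Output: "202f" (U+202F NARROW NO-BREAK SPACE)
console.log(htmlDecode('&#x1F600;')); // Output: "😀"

U+202F is invisible when printed, so check the code point rather than the console output. jsdom decodes numeric references the way a browser does, including characters outside the Basic Multilingual Plane, so no extra library is needed for correctness here. The he library is an alternative when you want the same decoding without building a DOM.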
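Numeric references are where hand-written decoders most often go wrong: a code point above U+FFFF needs String.fromCodePoint rather than String.fromCharCode, and out-of-range values make String.fromCodePoint throw. The sketch below follows the HTML rule of replacing 0, surrogates, and values above U+10FFFF with U+FFFD, but skips the spec's remapping of 0x80 to 0x9F to Windows-1252 characters. decodeNumericRef and decodeNumericRefs are hypothetical names.

```typescript
// Hypothetical helper: converts the digits of one numeric character reference
// into a string. Follows the HTML spec's U+FFFD substitutions, but omits its
// remapping of 0x80 to 0x9F to Windows-1252 characters.
function decodeNumericRef(digits: string, isHex: boolean): string {
  const codePoint = parseInt(digits, isHex ? 16 : 10);
  if (codePoint === 0 || codePoint > 0x10ffff || (codePoint >= 0xd800 && codePoint <= 0xdfff)) {
    return '\uFFFD'; // U+FFFD REPLACEMENT CHARACTER
  }
  // fromCodePoint builds a surrogate pair for code points above U+FFFF;
  // fromCharCode would truncate them to 16 bits.
  return String.fromCodePoint(codePoint);
}

// Decodes every &#NNN; and &#xHHHH; reference in a string; named entities are left alone.
function decodeNumericRefs(input: string): string {
  return input.replace(
    /&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g,
    (_match: string, hex?: string, dec?: string): string =>
      hex !== undefined ? decodeNumericRef(hex, true) : decodeNumericRef(dec as string, false),
  );
}

console.log(decodeNumericRefs('&#x202f;').codePointAt(0)?.toString(16)); // 202f
console.log(decodeNumericRefs('&#x1F600;').length); // 2 (one code point, two UTF-16 code units)
console.log(decodeNumericRefs('&#65;&#X42;&#0;')); // AB followed by U+FFFD
```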

Common Mistakes

Here are a few common mistakes developers make when HTML decoding in TypeScript:

Mistake 1: Using a regex to decode HTML

function htmlDecode(html: string): string {
  return html.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

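This approach is incorrect for two reasons. It handles only three of the entities HTML defines, so input such as &quot;, &#39;, or &copy; passes through unchanged. Worse, because &amp; is replaced first, the & it produces can start a new entity: "&amp;lt;" (the encoded form of the literal text "&lt;") decodes to "<" instead of "&lt;".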
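Running both versions on text that was encoded twice shows the difference. decodeChained is the code above; decodeSinglePass and its three-entry ENTITY_MAP are hypothetical stand-ins that know no more entities than the chained version, but decode each match exactly once because replace() never rescans its own output.

```typescript
// The chained approach: &amp; is decoded first, so the "&" it produces can
// start a new entity that a later replace() call then decodes again.
function decodeChained(html: string): string {
  return html.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

// Single-pass alternative: one regex finds every candidate entity and the
// callback decodes each match once; unknown names are left untouched.
const ENTITY_MAP = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
]);

function decodeSinglePass(html: string): string {
  return html.replace(/&([a-z]+);/g, (match: string, name: string): string =>
    ENTITY_MAP.get(name) ?? match);
}

const encoded = '&amp;lt;script&amp;gt;'; // the literal text "&lt;script&gt;", encoded once more
console.log(decodeChained(encoded)); // <script>  (decoded twice)
console.log(decodeSinglePass(encoded)); // &lt;script&gt;  (decoded once)
```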

Corrected code:

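import { JSDOM } from 'jsdom';

function htmlDecode(html: string): string {
  // The HTML parser decodes every entity it recognizes, in a single pass.
  const dom = new JSDOM(html);
  return dom.window.document.body.textContent ?? '';
}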

Mistake 2: Not handling null/undefined input

function htmlDecode(html: string): string {
  const dom = new JSDOM(html);
  return dom.window.document.body.textContent;
}

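TypeScript rejects null and undefined here only when strictNullChecks is on, and only for typed callers. Plain JavaScript callers, or values typed any (for example, from JSON.parse), can still pass them at runtime, and jsdom expects a string, so the result is unreliable rather than a clean empty string.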

Corrected code:

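import { JSDOM } from 'jsdom';

// Accept null/undefined explicitly so the guard below is part of the type contract.
function htmlDecode(html: string | null | undefined): string {
  if (html == null) {
    return ''; // `== null` covers both null and undefined
  }
  const dom = new JSDOM(html);
  return dom.window.document.body.textContent ?? '';
}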

Mistake 3: Not sanitizing decoded output before inserting it as HTML

function htmlDecode(html: string): string {
  const dom = new JSDOM(html);
  return dom.window.document.body.textContent;
}

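The decoding step itself is not the problem: jsdom does not run scripts by default. The risk comes afterwards. Encoded input such as &lt;img src=x onerror=alert(1)&gt; decodes to a real tag, and if you assign that result to innerHTML, the browser runs the handler. Sanitizing the encoded input beforehand does nothing, because to a sanitizer it is harmless text; sanitize the decoded output instead, and only if you insert it as HTML.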

Corrected code:

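import { JSDOM } from 'jsdom';
import createDOMPurify from 'dompurify';

// In Node.js, DOMPurify needs a window object; jsdom supplies one.
const DOMPurify = createDOMPurify(new JSDOM('').window);

// Returns an HTML string meant for innerHTML, not plain text:
// DOMPurify serializes its result, so "<" in text comes back as "&lt;".
function htmlDecodeToSafeHtml(html: string): string {
  const decoded = new JSDOM(html).window.document.body.textContent ?? '';
  // Sanitize after decoding: only now can "&lt;script&gt;" be a real <script>.
  return DOMPurify.sanitize(decoded);
}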
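If the decoded text belongs inside an HTML string but none of its markup should take effect, escaping it again is a lighter alternative to a sanitizer. escapeHtml below is a hypothetical helper, not a DOMPurify API; it encodes the five characters that matter in element content and quoted attribute values.

```typescript
// Hypothetical helper: re-encodes text for safe insertion into HTML element
// content or a quoted attribute value. All five characters are handled in one
// pass, so the "&" in an entity just produced is never encoded a second time.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch: string): string => {
    switch (ch) {
      case '&': return '&amp;';
      case '<': return '&lt;';
      case '>': return '&gt;';
      case '"': return '&quot;';
      default: return '&#39;'; // the only remaining match is "'"
    }
  });
}

const decodedText = '<script>alert("XSS")</script>'; // e.g. the result of decoding
console.log(`<p>${escapeHtml(decodedText)}</p>`);
// <p>&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</p>
```

If you are setting the text through the DOM, assigning it to textContent achieves the same result without any escaping step.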

Performance Tips

Here are a few performance tips for HTML decoding in TypeScript:

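  • Creating a JSDOM instance is expensive, because each one builds a full window and document. If you only need entity decoding, a string-based decoder such as he is far lighter; if you stay with jsdom, avoid constructing a new instance for every short string in a hot loop.
  • For very large inputs, process the text in chunks, but never split an entity across a chunk boundary.
  • Sanitize only when you will insert the decoded result as HTML. DOMPurify parses its input again, so skip it when you only need plain text.
  • Prefer a single regex pass with a replacement callback over a chain of replace() calls. A chain rescans the whole string once per entity and, as Mistake 1 shows, can decode some text twice.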
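One cheap optimization applies to any decoder: every character reference, named or numeric, starts with "&", so a string without one can be returned unchanged without calling the decoder at all. withFastPath is a hypothetical helper, and slowDecode is only a stand-in for an expensive decoder such as a JSDOM-based one; the counter shows how many calls actually reach it.

```typescript
// Hypothetical wrapper: skips decoding when the input cannot contain an
// entity, and otherwise delegates to the supplied decoder.
function withFastPath(decode: (s: string) => string): (s: string) => string {
  return (s: string): string => (s.includes('&') ? decode(s) : s);
}

let slowCalls = 0;
// Stand-in for an expensive decoder; it counts how often it is invoked.
const slowDecode = (s: string): string => {
  slowCalls++;
  return s.replace(/&amp;/g, '&');
};

const fastDecode = withFastPath(slowDecode);
const inputs = ['plain text', 'more plain text', 'Tom &amp; Jerry'];
console.log(inputs.map(fastDecode)); // [ 'plain text', 'more plain text', 'Tom & Jerry' ]
console.log(slowCalls); // 1
```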

FAQ

Q: What is HTML decoding?

A: HTML decoding is the process of converting HTML entities to their corresponding characters.

Q: Why do I need to HTML decode?

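A: You need to decode when text arrives entity-encoded (for example, from an API, a feed, or scraped HTML) and you want the original characters back. Decoding does not make content safe; escaping or sanitizing when you insert it into HTML does.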

Q: What libraries can I use for HTML decoding in TypeScript?

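A: You can use jsdom (parse the string and read textContent) or a dedicated entity decoder such as he, which needs no DOM. DOMPurify is a sanitizer, not a decoder.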

Q: How do I handle edge cases like empty/null input?

A: You can return an empty string or null, depending on the desired behavior.

Q: How do I sanitize input to prevent XSS attacks?

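A: Sanitize the decoded output, not the encoded input, and only when you insert it as HTML; DOMPurify handles that. If you only need text, assign it via textContent or escape it instead.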
