How to Format HTML in Node.js

Formatting HTML in Node.js usually means parsing markup and writing it back out in a consistent, well-formed shape. You need it whenever you generate or manipulate HTML programmatically: building dynamic web pages, creating email templates, or cleaning up HTML scraped from other websites. In this article, we'll explore how to do this with jsdom, a popular library that implements the browser DOM in Node.js, and walk through practical tips and examples for common use cases.

Quick Example

Here's a minimal example that demonstrates how to format HTML in Node.js using jsdom:

const jsdom = require('jsdom');
const { JSDOM } = jsdom;

const html = '<p>Hello <span>world!</span></p>';
const dom = new JSDOM(html);
const formattedHtml = dom.serialize();

console.log(formattedHtml);
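// Output: <html><head></head><body><p>Hello <span>world!</span></p></body></html>

This code parses the input HTML into a complete jsdom document and then uses the serialize() method to turn that document back into an HTML string. Note that serialize() normalizes markup rather than pretty-printing it: the output is well-formed and wrapped in the html, head, and body elements a browser would infer, but no indentation or line breaks are added.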

Step-by-Step Breakdown

Let's walk through the code line by line:

  1. const jsdom = require('jsdom');: We import the jsdom library, which provides a way to parse and manipulate HTML in Node.js.
  2. const { JSDOM } = jsdom;: We extract the JSDOM class from the jsdom library, which we'll use to create a new DOM instance.
  3. const html = '<p>Hello <span>world!</span></p>';: We define a sample HTML string that we want to format.
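  4. const dom = new JSDOM(html);: We create a new JSDOM instance with the input HTML. This parses the HTML into a complete document, wrapping our fragment in the html, head, and body elements a browser would infer, and gives us a DOM tree we can manipulate.
  5. const formattedHtml = dom.serialize();: We use the serialize() method to turn the whole document back into an HTML string. The result is normalized markup, not indented output.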
  6. console.log(formattedHtml);: We log the formatted HTML string to the console.

Handling Edge Cases

Here are some common edge cases to consider when formatting HTML in Node.js:

Empty/null input

jsdom does not throw on an empty string; it quietly returns an empty document. Check the input up front so the problem surfaces where it starts:

if (!html) {
  throw new Error('Input HTML is empty or null');
}
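A bare !html check still lets whitespace-only strings and non-string values (such as numbers or objects) through. A slightly stricter guard, sketched below as a hypothetical assertUsableHtml helper, rejects those too and returns the input unchanged so it can be passed straight to new JSDOM(...).

```javascript
// Hypothetical helper: accept only non-blank strings, and return them unchanged.
function assertUsableHtml(html) {
  if (typeof html !== 'string') {
    // Report null explicitly, since typeof null is 'object'.
    const kind = html === null ? 'null' : typeof html;
    throw new TypeError(`Expected an HTML string, got ${kind}`);
  }
  if (html.trim() === '') {
    throw new Error('Input HTML is empty or whitespace-only');
  }
  return html;
}
```

Usage: const dom = new JSDOM(assertUsableHtml(input));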

Invalid input

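Malformed HTML does not make jsdom throw. Its parser follows the HTML specification's error-recovery rules, closing unclosed tags and ignoring most stray end tags just as a browser would, so a try/catch will not flag broken markup. It is still worth wrapping the call to catch other failures, such as invalid constructor options: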

try {
  const dom = new JSDOM(html);
  // ...
} catch (err) {
  console.error('Error parsing HTML:', err);
}

Large input

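If the input HTML is very large, keep in mind that jsdom has no streaming mode: it reads the whole input and builds a complete DOM in memory, so memory use grows with the size of the document. For very large inputs, a streaming parser that processes the HTML chunk by chunk is a better fit. The snippet below assumes a package named jsdom-stream exposing this interface; confirm the package and its API before relying on it:

const fs = require('fs');
// Assumed API: check the jsdom-stream package's own documentation.
const jsdomStream = require('jsdom-stream');

const htmlStream = fs.createReadStream('large-html-file.html');
const domStream = jsdomStream(htmlStream);

domStream.on('data', (chunk) => {
  // Process the parsed HTML chunk
});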

Unicode/special characters

jsdom works on JavaScript strings, so Unicode text needs no special handling, and its HTML parser decodes character references such as &#x1F600; or &amp; as part of parsing, exactly as a browser does. You therefore don't need a library like entities to pre-decode the markup; decoding first can even change its meaning, because an escaped &lt;b&gt; becomes a real <b> tag. Pass the HTML straight to JSDOM:

const { JSDOM } = require('jsdom');

const html = '<p>Hello &#x1F600;</p>';
const dom = new JSDOM(html);
console.log(dom.window.document.querySelector('p').textContent); // Hello 😀

Common Mistakes

Here are three common mistakes developers make when formatting HTML in Node.js:

Mistake 1: Forgetting to handle errors

// Wrong code
const dom = new JSDOM(html);
const formattedHtml = dom.serialize();

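// Corrected code: declare the result outside the try block so it stays usable
let formattedHtml = null;
try {
  const dom = new JSDOM(html);
  formattedHtml = dom.serialize();
} catch (err) {
  // Malformed markup does not land here; this catches failures such as invalid options
  console.error('Error parsing HTML:', err);
}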

Mistake 2: Not handling empty/null input

// Wrong code
const dom = new JSDOM(html);
const formattedHtml = dom.serialize();

// Corrected code
if (!html) {
  throw new Error('Input HTML is empty or null');
}
const dom = new JSDOM(html);
const formattedHtml = dom.serialize();

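Mistake 3: Decoding entities before parsing

jsdom's parser already decodes character references, so decoding them first is redundant. It is also unsafe: text that was deliberately escaped, such as &lt;script&gt;, turns into a real tag in the output.

// Wrong code
const entities = require('entities');
const html = '<p>Hello &#x1F600;</p>';
const decodedHtml = entities.decodeHTML(html);
const dom = new JSDOM(decodedHtml);
const formattedHtml = dom.serialize();

// Corrected code: pass the markup straight to jsdom
const html = '<p>Hello &#x1F600;</p>';
const dom = new JSDOM(html);
const formattedHtml = dom.serialize();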

Performance Tips

Here are two practical performance tips for formatting HTML in Node.js:

Tip 1: Use a streaming parser

If you're working with large HTML files, avoid loading each one into a full jsdom document; process it with a streaming parser instead, as described under Large input above.

Tip 2: Use a caching layer

If you're formatting HTML repeatedly with the same input, consider using a caching layer like lru-cache to store the formatted HTML:

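const { JSDOM } = require('jsdom');
// lru-cache v10+ uses a named LRUCache export; older major versions export the class directly.
const { LRUCache } = require('lru-cache');

const cache = new LRUCache({ max: 100 });

function formatHtml(html) {
  const cached = cache.get(html);
  if (cached !== undefined) {
    return cached;
  }
  const dom = new JSDOM(html);
  const formattedHtml = dom.serialize();
  dom.window.close(); // release the window's timers and listeners once we have the string
  cache.set(html, formattedHtml);
  return formattedHtml;
}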
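If you'd rather not add a dependency, a Map is enough for a small LRU cache, because it iterates keys in insertion order. The sketch below is a hypothetical createCachedFormatter helper, not part of any library; it takes the formatting function as a parameter, so in real code you would pass (html) => new JSDOM(html).serialize(). The usage example substitutes a stub that counts calls, to show which lookups hit the cache.

```javascript
// Hypothetical helper: memoizes `format` with a small LRU cache built on a Map.
// A Map iterates keys in insertion order, so the first key is the least recently used.
function createCachedFormatter(format, max = 100) {
  const cache = new Map();
  return function cachedFormat(html) {
    if (cache.has(html)) {
      const hit = cache.get(html);
      cache.delete(html); // re-insert to mark this entry as most recently used
      cache.set(html, hit);
      return hit;
    }
    const result = format(html);
    cache.set(html, result);
    if (cache.size > max) {
      cache.delete(cache.keys().next().value); // evict the least recently used entry
    }
    return result;
  };
}

// Usage with a stub formatter that counts how often it actually runs.
let calls = 0;
const cached = createCachedFormatter((html) => {
  calls += 1;
  return html.trim();
}, 2);

cached('<a>'); // miss
cached('<b>'); // miss
cached('<a>'); // hit: '<a>' becomes most recently used
cached('<c>'); // miss: evicts '<b>'
cached('<b>'); // miss again, since '<b>' was evicted
console.log(calls); // 4
```

Keep in mind that the keys are the full input strings, so caching very large documents costs memory proportional to their size.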

FAQ

Q: What is the difference between jsdom and cheerio?

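A: jsdom is a full DOM implementation: it emulates a browser environment in Node.js, with standard DOM APIs and optional script execution. cheerio is a lighter-weight library that parses HTML into a simpler tree and gives you a jQuery-like API for querying and editing it, without emulating a browser.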

Q: How do I handle Unicode characters in my input HTML?

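A: Usually you don't need to do anything. jsdom's parser decodes character references such as &#x1F600; during parsing, and Unicode text passes through unchanged. A library like entities is useful for encoding or decoding strings outside the DOM, not as a preprocessing step before JSDOM.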

Q: Can I use jsdom with other Node.js libraries like Express or Koa?

A: Yes. jsdom is an ordinary library with no server integration of its own, so you can call it inside an Express or Koa route handler, for example to normalize generated HTML before sending it in a response.

Q: How do I optimize the performance of my HTML formatting code?

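A: Avoid repeated work and oversized DOMs: cache formatted results (for example with lru-cache) when the same HTML recurs, call dom.window.close() when you're done with a document, and switch to a streaming parser for inputs too large to hold comfortably in memory.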

Q: What is the difference between serialize() and outerHTML?

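A: serialize() is a JSDOM method that returns the HTML for the entire document, including the doctype if there is one. outerHTML is a standard DOM property that returns the HTML for a single element and its descendants, so document.documentElement.outerHTML is the whole document minus the doctype. Neither adds indentation.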
