
How to Format HTML in Ruby

Formatting HTML in Ruby is useful whenever you generate, transform, or debug markup programmatically: consistent line breaks and indentation make the output easier to read and to diff. In this article, we will explore how to format HTML in Ruby using the nokogiri gem, a widely used HTML and XML parsing library.

Quick Example

Here is a minimal example of how to format HTML in Ruby using nokogiri:

require 'nokogiri'

html = '<html><body>Hello <span>World!</span></body></html>'
doc = Nokogiri::HTML(html)
formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')

puts formatted_html

This code parses an HTML string with Nokogiri::HTML and then serializes the resulting document back to a string with the to_html method. Because the input is parsed as a full document, the output also gains a DOCTYPE declaration.

Step-by-Step Breakdown

Let's walk through the code line by line:

  1. require 'nokogiri': This line loads the nokogiri gem, which provides the Nokogiri::HTML method used to parse HTML.
  2. html = '<html><body>Hello <span>World!</span></body></html>': This line defines a sample HTML string that we will use for demonstration purposes.
  3. doc = Nokogiri::HTML(html): This line parses the HTML string and returns a document object (Nokogiri::HTML::Document), which provides a variety of methods for querying and manipulating the HTML.
  4. formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8'): This line serializes the document to a string using the to_html method. The indent option sets the number of indentation characters per nesting level, and the encoding option sets the character encoding of the output. Note that on libxml2-based builds the HTML serializer mainly inserts line breaks and may not apply the indent setting; to_xhtml accepts the same options and typically does indent.
  5. puts formatted_html: This line prints the formatted HTML string to standard output.

Handling Edge Cases

Here are some common edge cases to consider when formatting HTML in Ruby:

Empty/Null Input

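What happens when the input HTML string is empty or nil? Nokogiri does not raise an error for an empty string; it returns an empty document, so the "formatted" result contains no useful markup. It is usually better to detect this case up front. Note that blank? comes from ActiveSupport (Rails); in plain Ruby, check for nil and whitespace explicitly:

html = ''
if html.nil? || html.strip.empty?
  puts 'Input HTML is empty or nil'
else
  doc = Nokogiri::HTML(html)
  formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')
  puts formatted_html
end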

Invalid Input

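What happens when the input HTML string is invalid? By default, nokogiri does not raise an error: its HTML parser is lenient, repairs what it can, and records the problems it reports in doc.errors. Checking that list lets you log or reject malformed input. The Nokogiri::HTML.fragment method is not an error fallback; it is meant for parsing snippets that are not full documents.

html = '< invalid >'
doc = Nokogiri::HTML(html)
# The parser recovers silently; anything it reported is collected in doc.errors.
doc.errors.each { |error| warn "HTML parse issue: #{error.message}" }
formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')
puts formatted_html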

Large Input

What happens when the input HTML string is very large? Nokogiri builds the whole document tree in memory, so memory use grows with the size of the input. The Nokogiri::HTML::Builder class does not help here: it generates new markup rather than parsing existing markup. What you can do is avoid extra copies of the data by parsing directly from an IO object and writing the result directly to an IO object:

# 'large.html' and 'large.formatted.html' are placeholder paths.
doc = File.open('large.html') { |file| Nokogiri::HTML(file) }

File.open('large.formatted.html', 'w') do |out|
  # write_html_to writes the serialized document to the IO instead of returning a String.
  doc.write_html_to(out, indent: 2, encoding: 'UTF-8')
end

Unicode/Special Characters

What happens when the input HTML string contains Unicode or special characters? Nokogiri handles them correctly as long as it knows the input encoding; for a Ruby string it uses the string's own encoding unless you pass one explicitly as the third argument. Specify the output encoding as well so that the result is predictable:

html = '<html><body>Hello <span>World!</span> café ✓</body></html>'
doc = Nokogiri::HTML(html, nil, 'UTF-8')
formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')
puts formatted_html
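
Input that arrives as raw bytes (for example from a legacy file) may not be valid UTF-8 at all. The helper below is a hypothetical sketch, not part of Nokogiri, that normalizes such input before parsing; the ISO-8859-1 fallback is an assumption and should be replaced with whatever encoding your data really uses.

```ruby
# Hypothetical helper (not part of Nokogiri): return a valid UTF-8 copy of raw input.
# The ISO-8859-1 fallback is an assumption about the legacy data.
def normalize_to_utf8(raw, fallback: Encoding::ISO_8859_1)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # The bytes are not valid UTF-8: reinterpret them in the fallback encoding and transcode.
  raw.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

legacy_bytes = "caf\xE9".b # "café" as ISO-8859-1 bytes
normalized = normalize_to_utf8(legacy_bytes)
puts normalized
puts normalized.encoding
```

The normalized string can then be passed to Nokogiri::HTML in the same way as any other UTF-8 string.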

Common Mistakes

Here are some common mistakes to avoid when formatting HTML in Ruby:

Mistake 1: Forgetting to Specify Encoding

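# Wrong: the output encoding falls back to whatever encoding the document was parsed with
formatted_html = doc.to_html(indent: 2)

# Correct: the output encoding is stated explicitly
formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')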

Mistake 2: Using puts Instead of print

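# Wrong when exact output matters: puts appends a newline if the string does not already end with one
puts formatted_html

# Correct: print writes the string exactly as serialized
print formatted_html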
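
The difference between puts and print only shows up when the string does not already end in a newline. The following sketch captures both into a StringIO so the result can be compared; the sample strings are illustrative.

```ruby
require 'stringio'

with_newline = "<p>Hi</p>\n"
without_newline = '<p>Hi</p>'

out = StringIO.new
out.puts with_newline      # already ends in a newline, so nothing is added
out.puts without_newline   # a newline is appended
out.print without_newline  # written exactly as given

p out.string
```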

Mistake 3: Not Handling Edge Cases

# Wrong
doc = Nokogiri::HTML(html)
formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')

# Correct: check for nil and whitespace-only input first (blank? requires ActiveSupport)
if html.nil? || html.strip.empty?
  puts 'Input HTML is empty or nil'
else
  doc = Nokogiri::HTML(html)
  formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')
  puts formatted_html
end

Performance Tips

Here are some performance tips to keep in mind when formatting HTML in Ruby:

  1. Use nokogiri instead of REXML: REXML is a pure-Ruby XML parser that rejects markup that is not well-formed XML, while nokogiri wraps native parsers that tolerate real-world HTML and are generally faster.
  2. Parse from and write to IO objects for large inputs: passing a File to Nokogiri::HTML and serializing with write_html_to avoids holding extra copies of the document as Ruby strings. Nokogiri::HTML::Builder is for generating markup and does not reduce the memory needed to parse it.
  3. Specify the correct encoding: stating the input and output encodings explicitly avoids encoding errors and the reprocessing they cause.

FAQ

Q: How do I install nokogiri?

A: You can install nokogiri using the following command: gem install nokogiri

Q: How do I parse an HTML string using nokogiri?

A: You can parse an HTML string using the Nokogiri::HTML method: doc = Nokogiri::HTML(html)

Q: How do I generate a formatted HTML string using nokogiri?

A: You can generate a formatted HTML string using the to_html method: formatted_html = doc.to_html(indent: 2, encoding: 'UTF-8')

Q: How do I handle edge cases when formatting HTML in Ruby?

A: Check for empty or nil input before parsing, inspect doc.errors for malformed input, parse from and write to IO objects for large input, and specify encodings explicitly for Unicode or special characters.

Q: What are some performance tips for formatting HTML in Ruby?

A: Use nokogiri instead of REXML, parse from and write to IO objects for large inputs rather than relying on Nokogiri::HTML::Builder, and specify the encoding explicitly.
