How to HTML encode in Ruby

HTML encoding is the process of converting special characters in a string into their corresponding HTML entities. This is crucial when displaying user-generated content on a web page to prevent XSS (Cross-Site Scripting) attacks and ensure that the content is displayed correctly. In this guide, we will explore how to HTML encode strings in Ruby.

Quick Example

Here is a minimal example of how to HTML encode a string in Ruby:

require 'cgi'

def html_encode(input)
  CGI.escapeHTML(input)
end

input = "Hello, <script>alert('XSS')</script> world!"
encoded = html_encode(input)
puts encoded # Output: Hello, &lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt; world!

This code defines a method html_encode that takes an input string and uses the CGI.escapeHTML method to encode it.

Step-by-Step Breakdown

Let's walk through the code line by line:

  • require 'cgi': This line imports the cgi library, which provides the escapeHTML method for HTML encoding.
  • def html_encode(input): This line defines a new method called html_encode that takes a single argument input.
  • CGI.escapeHTML(input): This line calls the escapeHTML class method on CGI, passing the input string as an argument. The method replaces the five characters &, <, >, " and ' with their HTML entities and leaves everything else unchanged.
  • puts encoded: This line prints the encoded string to the console.
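
To make the breakdown concrete, here is a small sketch that prints what CGI.escapeHTML does to each of the five characters it rewrites. The ESCAPED constant and the sample string are made up for illustration.

```ruby
require 'cgi'

# Illustrative table: each special character mapped to what
# CGI.escapeHTML returns for it.
ESCAPED = ['&', '<', '>', '"', "'"].to_h do |char|
  [char, CGI.escapeHTML(char)]
end

ESCAPED.each { |char, entity| puts "#{char}  ->  #{entity}" }

# A made-up sample that contains all five characters at once.
sample = %q(<a href="/search?q=1&r=2">Tom's page</a>)
puts CGI.escapeHTML(sample)
```

Every other character, including letters, digits, whitespace and non-ASCII text, passes through untouched.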

Handling Edge Cases

Here are some common edge cases to consider when HTML encoding in Ruby:

Empty/Nil Input

An empty string is returned unchanged, but nil is not a string: passing nil to escapeHTML raises an error (a TypeError on standard Ruby) instead of returning an empty string. To treat nil as empty, add a simple check:

def html_encode(input)
  input.nil? || input.empty? ? '' : CGI.escapeHTML(input)
end
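
The difference between an empty string and nil is easy to see in a short sketch. The helper name html_encode_or_blank is hypothetical, and the exact error class for nil may vary between Ruby implementations, so the example rescues StandardError and prints whatever was raised.

```ruby
require 'cgi'

# Hypothetical helper: treat nil as an empty string, escape everything else.
def html_encode_or_blank(input)
  return '' if input.nil?

  CGI.escapeHTML(input)
end

puts html_encode_or_blank(nil).inspect  # nil is mapped to an empty string
puts html_encode_or_blank('').inspect   # an empty string stays empty
puts html_encode_or_blank('a < b')      # normal input is escaped

# Without the guard, nil is rejected rather than turned into ''.
begin
  CGI.escapeHTML(nil)
rescue StandardError => e
  puts "#{e.class}: #{e.message}"
end
```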

Invalid Input

If the input is not a string and cannot be implicitly converted to one, the escapeHTML method will raise a TypeError with a generic conversion message. To fail earlier with a clearer message, you can add a type check:

def html_encode(input)
  raise TypeError, 'Input must be a string' unless input.is_a?(String)
  CGI.escapeHTML(input)
end
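
Raising is one option. Where values such as numbers or symbols are expected and should simply be rendered, an alternative is to convert with to_s before escaping. This is a sketch of that design choice, not a recommendation for every case; the helper name html_encode_any is hypothetical.

```ruby
require 'cgi'

# Hypothetical helper: coerce any value to a string first, then escape it.
def html_encode_any(value)
  CGI.escapeHTML(value.to_s)
end

puts html_encode_any(42)           # integers are rendered as text
puts html_encode_any(:"<sym>")     # symbols are converted, then escaped
puts html_encode_any(nil).inspect  # nil.to_s is "", so the result is ""
```

The trade-off is that coercion can hide bugs: an unexpected object is rendered through its to_s output instead of failing loudly.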

Large Input

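The escapeHTML method makes a single pass over the string, so its running time grows linearly with the input size. For very large inputs the bigger cost is usually holding both the whole input and its escaped copy in memory. If the data comes from a file or another IO object, you can escape it line by line and write each line out as you go:

# Escapes input_io line by line and writes the result to output_io.
def html_encode_stream(input_io, output_io)
  input_io.each_line do |line|
    output_io.write(CGI.escapeHTML(line))
  end
end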

Unicode/Special Characters

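The escapeHTML method only rewrites the five ASCII characters &, <, >, " and ', so Unicode and other non-ASCII characters pass through unchanged. Note that String#encode does not preserve the original encoding; it converts the string to the encoding you name. Use it when the input arrives in a different encoding than your page uses, for example to convert it to UTF-8 before escaping: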

def html_encode(input)
  CGI.escapeHTML(input.encode('UTF-8'))
end
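
A short sketch of both behaviors: non-ASCII text passing through unchanged, and a string in another encoding being converted to UTF-8 before escaping. The helper name html_encode_utf8 and the ISO-8859-1 sample are assumptions made for illustration.

```ruby
require 'cgi'

# Hypothetical helper: transcode to UTF-8, then escape.
def html_encode_utf8(input)
  CGI.escapeHTML(input.encode('UTF-8'))
end

# Non-ASCII characters are left alone; only & < > " ' are rewritten.
puts html_encode_utf8('Crème <日本語> & ☕')

# Input that arrives as ISO-8859-1 comes out escaped and in UTF-8.
latin1 = 'café <b>'.encode('ISO-8859-1')
result = html_encode_utf8(latin1)
puts result
puts result.encoding
```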

Common Mistakes

Here are three common mistakes developers make when HTML encoding in Ruby:

Mistake 1: Using gsub instead of escapeHTML

Wrong code (leaves &, " and ' unescaped, so input can still break out of a quoted attribute):

def html_encode(input)
  input.gsub('<', '&lt;').gsub('>', '&gt;')
end

Corrected code:

def html_encode(input)
  CGI.escapeHTML(input)
end
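
To see why the gsub version in Mistake 1 falls short, consider a value that ends up inside a quoted HTML attribute. The payload below is a made-up example, and partial_encode is a hypothetical name for the incomplete approach.

```ruby
require 'cgi'

# The incomplete approach: only < and > are replaced.
def partial_encode(input)
  input.gsub('<', '&lt;').gsub('>', '&gt;')
end

# Made-up value intended for an attribute such as value="...".
payload = '" onmouseover="alert(1)'

puts partial_encode(payload)  # the double quotes survive untouched
puts CGI.escapeHTML(payload)  # the double quotes become &quot;
```

Because the payload contains neither < nor >, the gsub version returns it unchanged, and the surviving quote would close the attribute early.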

Mistake 2: Not handling nil input

Wrong code (raises an error when input is nil):

def html_encode(input)
  CGI.escapeHTML(input)
end

Corrected code:

def html_encode(input)
  input.nil? || input.empty? ? '' : CGI.escapeHTML(input)
end

Mistake 3: Not checking input type

Wrong code (a non-string argument fails inside escapeHTML with a generic conversion error):

def html_encode(input)
  CGI.escapeHTML(input)
end

Corrected code:

def html_encode(input)
  raise TypeError, 'Input must be a string' unless input.is_a?(String)
  CGI.escapeHTML(input)
end

Performance Tips

Here are three performance tips for HTML encoding in Ruby:

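Tip 1: Use CGI.escapeHTML instead of ERB::Util.html_escape

When the input is already a String, call CGI.escapeHTML directly. ERB::Util.html_escape produces the same output but first converts its argument with to_s, so any speed difference is usually small and depends on your Ruby version. Benchmark before relying on it.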
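
A claim like this is easy to check on your own Ruby version. The sketch below uses the standard Benchmark module; the sample string and iteration count are arbitrary choices, and the timings will differ between machines and Ruby releases.

```ruby
require 'cgi'
require 'erb'
require 'benchmark'

# Arbitrary sample: a short snippet repeated to get a few kilobytes of text.
SAMPLE = %q(<p class="note">Tom & Jerry's</p>) * 100
ITERATIONS = 5_000

Benchmark.bm(24) do |x|
  x.report('CGI.escapeHTML') do
    ITERATIONS.times { CGI.escapeHTML(SAMPLE) }
  end
  x.report('ERB::Util.html_escape') do
    ITERATIONS.times { ERB::Util.html_escape(SAMPLE) }
  end
end

# Both methods are expected to produce identical output for string input.
puts CGI.escapeHTML(SAMPLE) == ERB::Util.html_escape(SAMPLE)
```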

Tip 2: Avoid encoding large strings in one call

If the data is too large to hold comfortably in memory, read it from its source in pieces (for example, line by line) and escape each piece as you go, rather than building one huge string first.

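Tip 3: Use String#encode only when you need to transcode

String#encode converts a string to another encoding; it does not preserve the original one. Call it before CGI.escapeHTML only when the input is not already in the encoding your page uses (typically UTF-8), since transcoding is extra work on every call.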

FAQ

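Q: What is the difference between CGI.escapeHTML and ERB::Util.html_escape?

A: Both escape the same five characters (&, <, >, " and ') and return the same output for string input. ERB::Util.html_escape calls to_s on its argument first, so it also accepts nil and non-string values, while CGI.escapeHTML expects a string. Any speed difference is usually small.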

Q: How do I handle nil input?

A: Check whether the input is nil and return an empty string if so. An empty string needs no special handling.

Q: How do I handle large input?

A: Use a streaming approach to avoid loading the entire string into memory.

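Q: How do I preserve the original encoding of the input string?

A: CGI.escapeHTML leaves non-ASCII characters untouched, so no extra step is needed. Use the encode method only when you want to convert the input to a different encoding, such as UTF-8, before passing it to CGI.escapeHTML.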

Q: What is the performance impact of HTML encoding?

A: The performance impact is typically negligible. Escaping is a single pass over the string, so the cost grows linearly with input size and only becomes noticeable for very large strings.
