
How to Parse XML in Ruby

Parsing XML is a routine task for Ruby developers: it lets you extract and manipulate data from XML files, web service responses, and other sources. XML (Extensible Markup Language) is a widely used format for data exchange and storage, and Ruby has several libraries for parsing and processing it. This article shows how to parse XML in Ruby with the widely used Nokogiri gem.

Quick Example

require 'nokogiri'

xml_string = '<root><person><name>John Doe</name><age>30</age></person></root>'
doc = Nokogiri::XML(xml_string)
puts doc.css('person name').text # Output: John Doe

This code example demonstrates how to parse a simple XML string and extract the text content of a specific element.

Step-by-Step Breakdown

Here's a line-by-line explanation of the code:

require 'nokogiri'

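We start by requiring Nokogiri, a widely used XML and HTML parsing gem for Ruby (on CRuby it wraps the libxml2 C library). It is not part of the standard library, so install it first with gem install nokogiri or add it to your Gemfile.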

xml_string = '<root><person><name>John Doe</name><age>30</age></person></root>'

We define a sample XML string that we want to parse. This string contains a simple XML document with a root element, a person element, and two child elements: name and age.

doc = Nokogiri::XML(xml_string)

Nokogiri::XML(xml_string) is a convenience method, not a constructor: it delegates to Nokogiri::XML::Document.parse and returns a Nokogiri::XML::Document object representing the parsed document.

puts doc.css('person name').text

The css method selects every name element nested inside a person element and returns a Nokogiri::XML::NodeSet, a collection of the nodes that match the CSS selector. Calling text on the node set returns their text content. When several nodes match, NodeSet#text concatenates their text with no separator.

Handling Edge Cases

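Empty/Nil Input

Empty input does not raise an error by default. Nokogiri parses in recover mode, so Nokogiri::XML('') returns an empty document with no root element, and code that assumes a root (for example doc.root.name) fails later with a NoMethodError. To get an exception up front, enable strict parsing:

xml_string = ''
begin
  # Strict mode disables recovery, so empty input raises instead of returning an empty document
  doc = Nokogiri::XML(xml_string) { |config| config.strict }
rescue Nokogiri::XML::SyntaxError => e
  puts "Error: Empty or invalid input (#{e.message})"
end

In this example, the block passed to Nokogiri::XML switches the parse options to strict, and the begin/rescue block catches the Nokogiri::XML::SyntaxError raised for the empty input.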

Invalid Input

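Malformed XML, such as the string below with its closing </root> tag missing, is also accepted silently by default: Nokogiri recovers as much of the tree as it can and records the problems in doc.errors. Strict mode turns those problems into an exception:

xml_string = '<root><person><name>John Doe</name><age>30</age></person>' # missing </root>
begin
  doc = Nokogiri::XML(xml_string) { |config| config.strict }
rescue Nokogiri::XML::SyntaxError => e
  puts "Error: Invalid input (#{e.message})"
end

Here, strict parsing makes Nokogiri raise Nokogiri::XML::SyntaxError for the malformed input, and the rescue clause reports it instead of letting the program continue with a partially recovered document.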

Large Input

Nokogiri::XML builds the entire document tree in memory, which can exhaust memory for very large files. A streaming reader avoids that. Here's an example:

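File.open('large_xml_file.xml') do |file|
  # Pass the File itself (not File.read) so Nokogiri reads it incrementally
  reader = Nokogiri::XML::Reader(file)
  reader.each do |node|
    # Each iteration yields the reader positioned at the next node (element, text, end tag, ...)
    # Process the node
  end
end

In this example, the Nokogiri::XML::Reader class walks the file one node at a time instead of building the whole document tree, which keeps memory usage low. Passing the open File matters: reading the file into a string first with File.read would load all of it into memory before parsing even starts.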

Unicode/Special Characters

Non-ASCII text, such as accented names, is parsed correctly as long as Nokogiri decodes the input with the right encoding. Here's an example:

xml_string = '<root><person><name>Jöhn Döe</name><age>30</age></person></root>'
doc = Nokogiri::XML(xml_string, nil, 'UTF-8')

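In this example, the third argument tells Nokogiri to decode the string as UTF-8. Because UTF-8 is XML's default encoding when there is no XML declaration or byte-order mark, this particular input would usually parse correctly without it; passing the encoding matters most when the input uses another encoding (such as ISO-8859-1) and carries no declaration saying so.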

Common Mistakes

Mistake 1: Not Handling Errors

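# Wrong code: malformed input is silently recovered, so no error is ever reported
doc = Nokogiri::XML(xml_string)

# Corrected code: strict mode raises on malformed input, and the rescue handles it
begin
  doc = Nokogiri::XML(xml_string) { |config| config.strict }
rescue Nokogiri::XML::SyntaxError => e
  puts "Error: Invalid input (#{e.message})"
end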

Mistake 2: Not Checking for Empty Input

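# Wrong code: empty input yields a document with no root element instead of an error
doc = Nokogiri::XML(xml_string)

# Corrected code: String#blank? exists only with ActiveSupport (Rails), so use plain Ruby
if xml_string.nil? || xml_string.strip.empty?
  puts "Error: Empty input"
else
  doc = Nokogiri::XML(xml_string)
end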

Mistake 3: Not Using the Correct Encoding

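# Wrong code: risky when the input has no XML declaration and is not UTF-8
doc = Nokogiri::XML(xml_string)

# Corrected code: pass the input's actual encoding as the third argument
doc = Nokogiri::XML(xml_string, nil, 'UTF-8')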

Performance Tips

Tip 1: Use the Nokogiri::XML::Reader Class

When dealing with large input, give the Nokogiri::XML::Reader class an open File (rather than a string returned by File.read) so the XML is streamed node by node instead of loaded into memory all at once.

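Tip 2: Choose css or xpath for Readability, Not Speed

Nokogiri converts CSS selectors into XPath expressions before evaluating them, so css is not inherently faster than xpath. Use css for short, familiar selectors and xpath when you need features CSS selectors lack, such as axes (ancestor::, preceding-sibling::) and XPath functions.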

Tip 3: Avoid Using doc.to_xml

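doc.to_xml serializes the entire document back into a string, which is slow and memory-hungry for large documents. Avoid calling it just to search the content (for example with include? or a regular expression) or to re-parse it; query the parsed tree directly with doc.css or doc.xpath instead.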

FAQ

Q: What is the difference between Nokogiri::XML and Nokogiri::HTML?

A: Nokogiri::XML parses input as XML, which must be well-formed and is case-sensitive, while Nokogiri::HTML uses an HTML parser that tolerates the malformed markup common in real-world web pages.

Q: How do I handle Unicode characters in XML?

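A: Make sure Nokogiri knows the input's encoding. UTF-8 is assumed when the document has no XML declaration or byte-order mark, so pass the actual encoding as the third argument (for example Nokogiri::XML(xml_string, nil, 'ISO-8859-1')) when the input uses something else. Text extracted from nodes comes back as UTF-8 Ruby strings.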

Q: What is the best way to parse large XML files?

A: Use the Nokogiri::XML::Reader class with an open File so the XML is processed in a streaming fashion rather than loaded into memory all at once.

Q: How do I select nodes in an XML document?

A: Use the css or xpath methods to select nodes in an XML document.

Q: What is the difference between doc.css and doc.xpath?

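A: doc.css takes CSS selectors, which are concise and familiar to web developers; Nokogiri translates them into XPath before running them, so they are not faster. doc.xpath takes XPath expressions directly and is more expressive, supporting axes (such as ancestor:: and preceding-sibling::) and XPath functions.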
