How to Format HTML in C#

Formatting HTML in C# usually means parsing markup into a document tree and writing it back out as a string, which lets you inspect, repair, and regenerate HTML documents programmatically. In this article, we will explore how to do that with the HtmlAgilityPack library, which is installed from NuGet. Keep in mind that HtmlAgilityPack parses and re-serializes HTML; it does not re-indent the output on its own.

Quick Example

using System;
using HtmlAgilityPack;

class HtmlFormatter
{
    public static string FormatHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);                 // parse the markup into a document tree
        return doc.DocumentNode.OuterHtml;  // serialize the tree back to a string
    }
}

// Example usage (inside Main, or as top-level statements placed above the class):
string html = "<html><body><h1>Hello World!</h1></body></html>";
string formattedHtml = HtmlFormatter.FormatHtml(html);
Console.WriteLine(formattedHtml);

This code uses the HtmlAgilityPack library to parse the input HTML and serialize the resulting document tree back to a string. The parser tolerates malformed markup, but the output keeps the input's whitespace: no indentation or line breaks are added.

Step-by-Step Breakdown

Let's walk through the code line by line:

  • using System; imports the System namespace, which provides Console for the usage example.
  • using HtmlAgilityPack; imports the HtmlAgilityPack namespace.
  • class HtmlFormatter defines a new class called HtmlFormatter.
  • public static string FormatHtml(string html) defines a static method called FormatHtml that takes an HTML string and returns the re-serialized HTML string.
  • var doc = new HtmlDocument(); creates a new instance of the HtmlDocument class.
  • doc.LoadHtml(html); parses the input HTML into the HtmlDocument instance.
  • return doc.DocumentNode.OuterHtml; returns the whole document serialized as an HTML string.
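
Because OuterHtml hands back the markup with its original whitespace, actual indentation needs a separate step. The following is a minimal sketch of one dependency-free option, not part of HtmlAgilityPack: it swaps in System.Xml.Linq from the .NET base class library and assumes the input is well-formed XHTML. The class name XhtmlIndenter and the method TryIndent are hypothetical names chosen for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlIndenter
{
    // Tries to re-indent well-formed XHTML (two spaces per nesting level).
    // Returns false, with an empty result, when the input is empty or is
    // not well-formed XML.
    public static bool TryIndent(string xhtml, out string indented)
    {
        indented = string.Empty;
        if (string.IsNullOrWhiteSpace(xhtml))
        {
            return false;
        }

        try
        {
            // Parse drops insignificant whitespace; ToString writes the
            // tree back out with indentation and no XML declaration.
            indented = XDocument.Parse(xhtml).ToString();
            return true;
        }
        catch (XmlException)
        {
            // Typical HTML5 input (an unclosed <br>, for example) lands here.
            return false;
        }
    }
}
```

Calling XhtmlIndenter.TryIndent on the sample string from the Quick Example yields the html, body, and h1 elements on separate lines, each nested level indented by two spaces. Real-world HTML is often not well-formed XML, so treat a false result as a signal to fall back to the unindented HtmlAgilityPack output.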

Handling Edge Cases

Here are some common edge cases and how to handle them:

Empty/Null Input

public static string FormatHtml(string html)
{
    if (string.IsNullOrEmpty(html))
    {
        return string.Empty;
    }
    // ...
}

In this case, we return an empty string when the input is null or empty, which also avoids the ArgumentNullException that LoadHtml throws when it is passed null.

Invalid Input

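public static string FormatHtml(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Malformed markup is recorded in ParseErrors instead of being thrown.
    foreach (var error in doc.ParseErrors)
    {
        Console.WriteLine($"Line {error.Line}: {error.Reason}");
    }

    // Reject the input if anything was reported (requires using System.Linq;).
    return doc.ParseErrors.Any() ? string.Empty : doc.DocumentNode.OuterHtml;
}

HtmlAgilityPack is a tolerant parser, so malformed markup rarely raises an exception and a try/catch around LoadHtml would seldom fire. In this case, we inspect doc.ParseErrors after loading, log each problem, and return an empty string if any were reported. If best-effort output is acceptable, return doc.DocumentNode.OuterHtml regardless.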

Large Input

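public static void FormatHtml(TextReader input, TextWriter output)
{
    var doc = new HtmlDocument();
    doc.Load(input);   // parse directly from the reader (requires using System.IO;)
    doc.Save(output);  // write the document directly to the writer
}

LoadHtml accepts only a string; it has no streaming flag. HtmlAgilityPack builds the whole document tree in memory, so parsing is never incremental. What we can do for large input is avoid extra copies: load from a TextReader (or a file or stream) instead of first reading the markup into our own string, and write the result with Save instead of materializing OuterHtml.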

Unicode/Special Characters

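public static string FormatHtml(Stream stream)
{
    var doc = new HtmlDocument();
    doc.Load(stream, Encoding.UTF8); // decode the bytes as UTF-8 (requires using System.IO; and using System.Text;)
    return doc.DocumentNode.OuterHtml;
}

A C# string is already Unicode, so LoadHtml takes no encoding argument and needs none. Encoding matters only when the HTML arrives as bytes, for example from a file or an HTTP response. In this case, we pass the encoding to Load together with the stream so the bytes are decoded correctly.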
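
To illustrate why the encoding has to be right at the point where bytes become text, here is a small sketch that uses only the .NET base class library. The class name HtmlByteDecoder and its Decode method are hypothetical helpers for illustration, not HtmlAgilityPack APIs.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlByteDecoder
{
    // Reads the whole stream as text using the given encoding.
    // A byte order mark, if present, takes precedence over the
    // supplied encoding.
    public static string Decode(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If the UTF-8 bytes of `<p>café</p>` are decoded with Encoding.UTF8, the text comes back intact. Decoding the same bytes with Encoding.GetEncoding("iso-8859-1") produces `<p>cafÃ©</p>`, because the two bytes that encode é are read as two separate characters. Once the text is a correct string, it can be passed to LoadHtml without any further encoding handling.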

Common Mistakes

Here are some common mistakes developers make:

Mistake 1: Not handling null input

// Wrong code:
public static string FormatHtml(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    return doc.DocumentNode.OuterHtml;
}

// Corrected code:
public static string FormatHtml(string html)
{
    if (string.IsNullOrEmpty(html))
    {
        return string.Empty;
    }
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    return doc.DocumentNode.OuterHtml;
}

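Mistake 2: Relying on exceptions to detect invalid input

// Wrong code: LoadHtml tolerates malformed markup, so this catch block rarely runs.
public static string FormatHtml(string html)
{
    try
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error parsing HTML: {ex.Message}");
        return string.Empty;
    }
}

// Corrected code: check ParseErrors after loading.
public static string FormatHtml(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    if (doc.ParseErrors.Any()) // requires using System.Linq;
    {
        Console.WriteLine("The input contains malformed HTML.");
        return string.Empty;
    }
    return doc.DocumentNode.OuterHtml;
}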

Mistake 3: Assuming LoadHtml has a streaming mode

// Wrong code: does not compile, because LoadHtml takes a single string argument.
public static string FormatHtml(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html, true);
    return doc.DocumentNode.OuterHtml;
}

// Corrected code: load from a reader and save to a writer to avoid extra string copies.
public static void FormatHtml(TextReader input, TextWriter output)
{
    var doc = new HtmlDocument();
    doc.Load(input);
    doc.Save(output);
}

Performance Tips

Here are some performance tips:

Tip 1: Avoid extra string copies for large input

public static void FormatHtml(TextReader input, TextWriter output)
{
    var doc = new HtmlDocument();
    doc.Load(input);   // no separate input string held by the caller
    doc.Save(output);  // no separate OuterHtml string for the result
}

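Tip 2: Use a caching mechanism

// Requires using System.Collections.Concurrent;
private static readonly ConcurrentDictionary<string, string> Cache =
    new ConcurrentDictionary<string, string>();

public static string FormatHtml(string html)
{
    // GetOrAdd runs the formatter only when this input has not been seen before.
    return Cache.GetOrAdd(html, key =>
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(key);
        return doc.DocumentNode.OuterHtml;
    });
}

Caching pays off only when the same markup is formatted repeatedly. This cache is unbounded and uses whole documents as keys, so its memory use grows with every distinct input.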

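Tip 3: Process multiple documents in parallel

// Requires using System.Collections.Generic; and using System.Linq;
public static List<string> FormatAll(IEnumerable<string> htmlDocuments)
{
    // Each call creates its own HtmlDocument, so no state is shared between threads.
    return htmlDocuments
        .AsParallel()
        .AsOrdered() // keep results in input order
        .Select(html => FormatHtml(html))
        .ToList();
}

Parallelism helps only when there are many documents to format. Wrapping a single FormatHtml call in Parallel.Invoke gains nothing, and a lambda passed to Parallel.Invoke cannot return a value.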

FAQ

Q: What is the recommended way to handle null input?

A: The recommended way to handle null input is to return an empty string.

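Q: How do I handle invalid input?

A: HtmlAgilityPack tolerates malformed markup and rarely throws, so check doc.ParseErrors after loading and decide whether to reject the input or accept the best-effort result.

Q: What is the recommended way to handle large input?

A: HtmlAgilityPack has no streaming mode; it builds the whole document tree in memory. To limit memory use, load from a file, stream, or TextReader and write the result with Save instead of building extra strings.

Q: How do I handle Unicode and special characters?

A: A C# string is already Unicode, so LoadHtml needs no encoding. Specify the encoding when loading from bytes, for example doc.Load(stream, Encoding.UTF8).

Q: What is the recommended way to improve performance?

A: Avoid extra copies of large documents, cache results when the same markup is formatted repeatedly, and process batches of documents in parallel.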