Techiio-author
Started by Bell GrantNov 7, 2021

Open
How to use HTML Agility pack

2 VIEWES 0 LIKES 0 DISLIKES SHARE
0 LIKES 0 DISLIKES 2 VIEWES SHARE

My XHTML document is not legitimate. It is why I desired to use it. How do I exploit it in my task? My undertaking is in c#.

1 Replies

Techiio-commentatorHarry Lian replied 2 months ago0 likes0 dislikes

First, install the HTMLAgilityPack nuget package into your project.

Then, as an example:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
// There are various options, set as needed
htmlDoc.OptionFixNestedTags=true;
// filePath is a path to a file containing the html
htmlDoc.Load(filePath);
// Use:  htmlDoc.LoadHtml(xmlString);  to load from a string (was htmlDoc.LoadXML(xmlString)
// ParseErrors is an ArrayList containing any errors from the Load statement
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0)
{
    // Handle any parse errors as required
}
else
{
    if (htmlDoc.DocumentNode != null)
    {
        HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
        if (bodyNode != null)
        {
            // Do something with bodyNode
        }
    }
}

(NB: This code is an example only and not necessarily the best/only approach. Do not use it blindly in your own application.)

The HtmlDocument.Load() method also accepts a stream which is very useful in integrating with other stream oriented classes in the .NET framework. While HtmlEntity.DeEntitize() is another useful method for processing html entities correctly. (thanks Matthew)

HtmlDocument and HtmlNode are the classes you'll use most. Similar to an XML parser, it provides the selectSingleNode and selectNodes methods that accept XPath expressions.

Pay attention to the HtmlDocument.Option?????? boolean properties. These control how the Load and LoadXML methods will process your HTML/XHTML.

There is also a compiled help file called HtmlAgilityPack.chm that has a complete reference for each of the objects. This is normally in the base folder of the solution.

You must be Logged in to reply
Trending Technologies
15
Software40
DevOps46
Frontend Development24
Backend Development20
Server Administration17
Linux Administration26
Data Center24
Sentry24
Terraform23
Ansible83
Docker70
Penetration Testing16
Kubernetes21
NGINX20
JenkinsX17
Techiio-logo

Techiio is on the journey to build an ocean of technical knowledge, scouring the emerging stars in process and proffering them to the corporate world.

Follow us on:

Subscribe to get latest updates

You can unsubscribe anytime from getting updates from us
Developed and maintained by Wikiance
Developed and maintained by Wikiance