Monday, April 11, 2011

Finding Node of Matching Raw Html in an HtmlAgility HtmlDocument

Hi,

I currently have a program that finds and edits HTML files based on finding a tag with a matching id.

I would like to extend it to find a tag whose InnerHtml matches a given string, disregarding capitalization and whitespace.

What is a good way to do this with the Html Agility Pack? I would like to stick with it because the rest of the program already uses it.

Thanks.

From stackoverflow
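  • This is a rough sketch, but you should be able to do something like this:

    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("YOUR_TAG_SELECTOR");

    if (nodes != null)
    {
        foreach (HtmlNode node in nodes)
        {
            // Case-insensitive comparison. Trim() only strips leading and
            // trailing whitespace, so whitespace inside the markup must still match.
            if (string.Equals(node.InnerHtml.Trim(), "YOUR_MATCH", StringComparison.OrdinalIgnoreCase))
            {
                //success routine
                break;
            }
        }
    }
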
    Alex Baranosky : I think this should be node.InnerHtml, not node.InnerText :)
    Pat : Ahh yes my apologies I read matching text in the original question. Corrected.
  • We've done this using Regular Expressions. Something like this works for us:

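    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions; using HtmlAgilityPack;
    private static List<HtmlNode> GetMatchingNodes(string xPath, string pattern, HtmlDocument htmlDocument)
    {
        List<HtmlNode> matchingNodes = new List<HtmlNode>();

        // SelectNodes returns null, not an empty collection, when the XPath matches nothing.
        HtmlNodeCollection candidates = htmlDocument.DocumentNode.SelectNodes(xPath);
        if (candidates == null)
        {
            return matchingNodes;
        }

        foreach (HtmlNode node in candidates)
        {
            if (Regex.IsMatch(node.InnerHtml, pattern))
            {
                matchingNodes.Add(node);
            }
        }
        return matchingNodes;
    }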
    

    Hope this helps. :)
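
Neither answer spells out how to disregard whitespace inside the markup, so here is a minimal sketch of one way to do it. The class and method names (HtmlMatch, Normalize, SameHtml, WhitespaceTolerantPattern) are made up for illustration and are not part of the Html Agility Pack. The sketch assumes that "disregarding whitespace" means treating any run of whitespace as a single space. If whitespace should be ignored entirely, replace the runs with an empty string instead.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlMatch
{
    // Hypothetical helper: collapses every whitespace run to one space and trims the ends.
    public static string Normalize(string html)
    {
        return Regex.Replace(html ?? string.Empty, @"\s+", " ").Trim();
    }

    // True when both fragments are equal after normalization, ignoring case.
    public static bool SameHtml(string a, string b)
    {
        return string.Equals(Normalize(a), Normalize(b), StringComparison.OrdinalIgnoreCase);
    }

    // Hypothetical helper for the regex approach: turns literal markup into a
    // case-insensitive pattern that accepts any amount of whitespace wherever
    // the literal contains some. The inline (?i) option means the caller does
    // not need to pass RegexOptions.IgnoreCase.
    public static string WhitespaceTolerantPattern(string html)
    {
        // Passing null as the separator splits on whitespace.
        string[] tokens = html.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return @"(?i)^\s*" + string.Join(@"\s+", tokens.Select(Regex.Escape)) + @"\s*$";
    }
}
```

In the first answer's loop, the comparison would become HtmlMatch.SameHtml(node.InnerHtml, "YOUR_MATCH"). For the second answer, the string returned by HtmlMatch.WhitespaceTolerantPattern("YOUR_MATCH") could be passed as the pattern argument of GetMatchingNodes.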
