Saturday, April 24, 2010

Screen scraping in C# using HtmlAgilityPack.

In my project, I used the HtmlAgilityPack library to fetch a web page and parse it.

1. First, create an ASP.NET project and add a reference to HtmlAgilityPack.

2. Create an 'HtmlWeb' object.

3. Load the HTML page by calling the 'HtmlWeb' object's Load method, which returns an 'HtmlDocument'.

4. The parsed HTML page is now available in the 'HtmlDocument' object.

5. You can display the contents of this 'HtmlDocument' object in an ASP.NET 'Literal' control.

The code is below:


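// Requires: using HtmlAgilityPack;
HtmlWeb hwObject = new HtmlWeb();
// Download and parse the page in one call.
HtmlDocument htmldocObject = hwObject.Load("http://www.c-sharpcorner.com");

// lLatestPrice is the Literal control that displays the fetched markup.
lLatestPrice.Text = htmldocObject.DocumentNode.InnerHtml;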


You can find all the links on this page with an XPath query and a loop like this:

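            // SelectNodes returns null (not an empty collection) when nothing matches.
            HtmlNodeCollection links = htmldocObject.DocumentNode.SelectNodes("//a[@href]");
            if (links != null)
            {
                foreach (HtmlNode link in links)
                {
                    string s = link.InnerText;                          // visible link text
                    string url = link.GetAttributeValue("href", "");    // link target
                }
            }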
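
As a self-contained variation on the loop above, here is a minimal sketch that parses an HTML string instead of a live page, so it can be tried without a network connection. The names 'LinkExtractor' and 'ExtractLinks' and the sample markup are my own assumptions for illustration and are not part of HtmlAgilityPack. It assumes the HtmlAgilityPack package is referenced by the project.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns (text, href) pairs for every <a> element
    // that carries an href attribute.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string; no HTTP request is made

        var result = new List<KeyValuePair<string, string>>();
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode anchor in anchors)
        {
            string text = anchor.InnerText.Trim();
            string href = anchor.GetAttributeValue("href", string.Empty);
            result.Add(new KeyValuePair<string, string>(text, href));
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample markup (an assumption for this sketch): one anchor with an href, one without.
        string html = "<html><body><a href=\"/articles\">Articles</a> <a name=\"top\">No href</a></body></html>";

        foreach (KeyValuePair<string, string> link in LinkExtractor.ExtractLinks(html))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```

The XPath expression "//a[@href]" selects only anchors that have an href attribute, so the second anchor in the sample markup is skipped.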

That's it.

2 comments:

Xulfee said...

Can it regenerate the HTML if some tags are not closed or the markup contains other mistakes?

Anonymous said...

That's great and easy, thank you so much. All I have to do now is study XPath.