Saturday, April 24, 2010

Screen scraping in C# using HtmlAgilityPack.

In my project, I used the HtmlAgilityPack library to download a web page and parse it.

1. First, create an ASP.NET project and add a reference to HtmlAgilityPack.

2. Create an 'HtmlWeb' object.

3. Load the HTML page with 'HtmlWeb.Load', which returns an 'HtmlDocument'.

4. The parsed page is now available in the 'HtmlDocument' object.

5. You can display the page's HTML ('DocumentNode.InnerHtml') in an ASP.NET 'Literal' control.

The code is below:


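// requires: using HtmlAgilityPack;
HtmlWeb hwObject = new HtmlWeb();

// Download the page and parse it into an HtmlDocument
HtmlDocument htmldocObject = hwObject.Load("http://www.c-sharpcorner.com");

// Show the page's HTML in the Literal control (lLatestPrice)
lLatestPrice.Text = htmldocObject.DocumentNode.InnerHtml;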


You can find all links on this page with a loop like this:

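HtmlNodeCollection links = htmldocObject.DocumentNode.SelectNodes("//a[@href]");
// SelectNodes returns null when no element matches, so check before looping
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string text = link.InnerText;                       // visible link text
        string href = link.GetAttributeValue("href", "");   // link target
    }
}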
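As a variation on the loop above, here is a small sketch that collects the href values into a list. 'LinkExtractor' and 'GetHrefs' are hypothetical names I made up for illustration. It parses HTML from a string with 'HtmlDocument.LoadHtml' instead of downloading it, which is handy for trying out XPath expressions without a network connection. It also assumes the library's default behaviour of 'SelectNodes' returning null (rather than an empty collection) when nothing matches.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: collects the href value of every <a> element that has one.
public static class LinkExtractor
{
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        // LoadHtml parses an in-memory string, so no network request is made.
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode link in anchors)
        {
            // GetAttributeValue returns the supplied default if the attribute is missing.
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

For example, calling GetHrefs on "<a href=\"/a\">A</a><a href=\"/b\">B</a>" should give a list containing "/a" and "/b", and a page with no links gives an empty list instead of throwing.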

That's it.