In many cases you want to programmatically read and parse the content of a URL and retrieve its HTML source code. Here's a simple function I wrote to do that:
// Get the HTML content of a page with C#
public static string Get_HTML(string Url)
{
    System.Net.WebResponse Result = null;
    string Page_Source_Code;
    try
    {
        System.Net.WebRequest req = System.Net.WebRequest.Create(Url);
        Result = req.GetResponse();
        System.IO.Stream RStream = Result.GetResponseStream();
        System.IO.StreamReader sr = new System.IO.StreamReader(RStream);
        Page_Source_Code = sr.ReadToEnd();
        sr.Dispose();
    }
    catch
    {
        // error while reading the URL: the URL doesn't exist, connection problem...
        Page_Source_Code = "";
    }
    finally
    {
        if (Result != null) Result.Close();
    }
    return Page_Source_Code;
}
The function above reads the server's response, and the StreamReader will attempt to detect the text encoding automatically from the byte order mark: it correctly recognizes UTF-8, little-endian Unicode (UTF-16 LE), and big-endian Unicode (UTF-16 BE).
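A quick way to see this detection in action, without hitting the network, is to build an in-memory "response" with a byte order mark and read it back the same way Get_HTML does (the sample markup is just an illustration):

```csharp
using System;
using System.IO;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // Encode some markup as UTF-16 LE and prepend its BOM (0xFF 0xFE),
        // simulating what a web server could send back.
        Encoding enc = Encoding.Unicode; // UTF-16 little-endian
        byte[] bom = enc.GetPreamble();
        byte[] body = enc.GetBytes("<html>héllo</html>");
        var ms = new MemoryStream();
        ms.Write(bom, 0, bom.Length);
        ms.Write(body, 0, body.Length);
        ms.Position = 0;

        // Read it back exactly as Get_HTML does: the default StreamReader
        // spots the BOM and picks the right decoder automatically.
        using (var sr = new StreamReader(ms))
        {
            Console.WriteLine(sr.ReadToEnd()); // prints "<html>héllo</html>"
        }
    }
}
```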
If the web server uses another encoding, you can specify it explicitly; just replace this line:
System.IO.StreamReader sr = new System.IO.StreamReader(RStream, System.Text.Encoding.GetEncoding("Windows-1252"));
If you don't know which encoding is used, you'll have to read it from the page's meta tag, for example:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
And if it's not specified, you can just fall back to UTF-8.
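Here's a minimal sketch of how you might pull the charset out of such a meta tag with a regular expression, falling back to UTF-8 when none is declared (the helper name is my own; a real HTML parser would be more robust than a regex):

```csharp
using System;
using System.Text.RegularExpressions;

class CharsetHelper
{
    // Extract the charset declared in a meta tag from the page source.
    // Returns "UTF-8" when no charset declaration is found.
    public static string GetCharset(string html)
    {
        Match m = Regex.Match(html,
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_-]+)",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "UTF-8";
    }

    static void Main()
    {
        string page =
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\" />";
        Console.WriteLine(CharsetHelper.GetCharset(page));      // prints "iso-8859-1"
        Console.WriteLine(CharsetHelper.GetCharset("<html/>")); // prints "UTF-8"
    }
}
```

The returned name can then be passed straight to System.Text.Encoding.GetEncoding to build the StreamReader with the right encoding.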