Discussion:
Store in a file a web page written in chinese
Antonio
2004-10-25 08:07:38 UTC
Hi,
I want to read an HTML page written in Chinese and store it in a file
with the extension .aspx. I'm not sure where the problem is. I use the
following lines of code:

String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8);
String sHtmlTradotto = reader.ReadToEnd();

StreamWriter writer = new StreamWriter("prova.aspx", false, System.Text.Encoding.UTF8);

writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

But the file produced doesn't contain the Chinese characters. How
can I solve the problem?

Many thanks in advance.

Ing. Antonio D'Ottavio
Jon Skeet [C# MVP]
2004-10-25 08:15:04 UTC
Are you sure that it's returning the data in UTF-8? How are you
checking whether or not the file contained Chinese characters?

I'd look in more depth myself, but using the code above, it's
complaining that the server committed an HTTP protocol violation :(
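One way to check the first question is to read the charset the server declares instead of hard-coding Encoding.UTF8. The following is only a minimal sketch: the class name CharsetSniffer and the method FromContentType are hypothetical names made up for illustration, and it assumes the server actually sends a charset parameter in its Content-Type header.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original code in this thread.
static class CharsetSniffer
{
    // Returns the encoding named by the charset parameter of a Content-Type
    // header value, or the fallback when the parameter is missing or unknown.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType)) return fallback;

        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                string name = p.Substring("charset=".Length).Trim().Trim('"', '\'');
                try { return Encoding.GetEncoding(name); }
                catch (ArgumentException) { return fallback; }
            }
        }
        return fallback;
    }
}

// Possible use with the code from the question:
//   WebResponse result = req.GetResponse();
//   Encoding enc = CharsetSniffer.FromContentType(result.ContentType, Encoding.UTF8);
//   StreamReader reader = new StreamReader(result.GetResponseStream(), enc);
```

If the header carries no charset, the sketch simply falls back to whatever encoding the caller passes in, so it cannot detect a charset that is declared only in a meta tag inside the page.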
--
Jon Skeet - <***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Antonio
2004-10-27 06:42:48 UTC
Hi,
I simply connected to the URL

http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http://www.etantonio.it/EN/index.aspx

with Internet Explorer, and this is the result. I can see that the
charset is UTF-8, and the Chinese symbols display normally:


<html><meta http-equiv="content-type" content="text/html;
charset=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!-- removed --><meta http-equiv="Content-Type" content="text/html ;
CHARSET=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<head>
<title>Etantonio</title>
<meta name="author" content="Antonio DOttavio">
<meta name="description" content="Etantonio Index">
<link href="Stili.css" rel="stylesheet" type="text/css">
</head>
<body>

<script language=JavaScript src="menu_array.js"
type=text/javascript></script>
<script language=JavaScript src="mmenu.js"
type=text/javascript></script>

<table width="750" height="430" border="0" cellpadding="0"
cellspacing="0" background="/images/EsserSpettatoriNonEstSerioElefante.jpg">
<tr>
<td valign="top">

<table width="90%" border="0" align="center" cellspacing="12">
<tr height="70" valign="top">
<td>&nbsp;</td>
<td width="25%" rowspan="2">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fUniversita%2findex.aspx"
class="testoMedioVerde">&#22823;&#23398;</a></p>
<p align="center"
class="testoPiccolissimoVerde">&#23398;&#22763;&#36335;&#32447;&#30340;&#31508;&#35760;&#22312;&#24037;&#31243;&#23398;&#30005;&#23376;,
&#35770;&#25991;&#12289;&#30740;&#31350;&#26041;&#27861;&#21644;&#36866;&#24403;&#23562;&#25964;&#23545;&#36215;&#28304;&#26449;&#24196;&#12290;
</p>
</td>
<td width="25%" rowspan="2">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx"
class="testoMedioVerde">&#32463;&#27982;</a> </p>
<p align="center"
class="testoPiccolissimoVerde">&#22996;&#21592;&#20250;&#12289;&#20026;&#36130;&#25919;&#31038;&#21306;&#30340;&#25216;&#26415;&#21644;&#20202;&#22120;,
&#35814;&#23613;&#38416;&#36848;&#23545;&#24744;&#22312;,
&#22312;&#20379;&#36873;&#25321;&#21464;&#36801;&#20043;&#38388;,
&#25345;&#32493;&#20174;1994
&#24180;&#20010;&#20154;&#32463;&#39564;&#30340;&#22522;&#22320;&#12290;</p></td>
<td width="25%">&nbsp;</td>
</tr>
<tr height="140" valign="top">
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx"
class="testoMedioVerde">&#24037;&#20316;</a> </p>
<p align="center"
class="testoPiccolissimoVerde">&#31616;&#21382;,
&#22270;&#35937;&#35777;&#23454;&#23545;&#24744;,
&#21644;&#19968;&#20123;&#20202;&#22120;&#21644;&#21442;&#32771;&#20026;&#24037;&#20316;&#26426;&#20250;&#26597;&#23547;&#12290;
</p>
</td>
<td width="25%">
<p align="center" ><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx"
class="testoMedioVerde">&#32593;</a> </p>
<p align="center"
class="testoPiccolissimoVerde">&#25628;&#32034;&#24341;&#25806;&#22312;&#26080;&#25968;GIF
&#36171;&#20104;&#29983;&#21629;&#20174;&#25105;&#36873;&#25321;&#20102;&#21644;&#35814;&#23613;&#38416;&#36848;&#20102;,
&#38543;&#21518;&#23558;&#26469;&#32593;&#30340;&#34987;&#25554;&#20837;&#30340;&#23454;&#39564;&#12290;
</p>
</td>
</tr>
<tr valign="top">
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx"
class="testoMedioVerde">&#25968;</a> </p>
<p align="center"
class="testoPiccolissimoVerde">&#24040;&#22823;&#25105;&#30340;&#21033;&#30410;&#21457;&#29616;&#36825;&#37324;&#20986;&#27668;&#23380;,
&#33402;&#26415;, &#26053;&#34892;,
&#28608;&#24773;&#20197;&#36828;&#23545;&#25105;&#30340;&#28909;&#28857;&#34920;&#30340;&#38142;&#25509;&#12290;
</p>
</td>
<td width="25%"> <div align="center"></div></td>
<td width="25%"> <div align="center"></div></td>
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx"
class="testoMedioVerde">&#32852;&#32476;</a></p>
<p align="center"
class="testoPiccolissimoVerde">&#36825;&#37324;&#23427;&#26159;&#21487;&#33021;&#25509;&#35302;&#23545;&#25105;&#20026;&#27599;&#24517;&#35201;&#25110;&#29702;&#20107;&#20250;&#26159;&#36890;&#36807;&#32534;&#20889;&#24418;&#24335;&#25110;&#25554;&#20837;&#28040;&#24687;nel
&#35770;&#22363;delle &#24819;&#27861;&#30340;&#37038;&#20214;&#12290;
</p>
</td>
</tr>
</table>

</td>
</tr>
</table>
<script>InserisciFooter();</script>
<br>
<a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente">Universita &#29992; </a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fFR%2fUniversita%2findex.aspx"
class="trasparente">Universita</a>
<a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fFR%2fUniversita%2findex.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente">&#33521;&#35821;
</a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fFR%2fUniversita%2findex.aspx"
class="trasparente">&#29992;&#27861;&#35821;</a>
</td>
</a>
<td>


</body>
</html>
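
Judging from the source above, the Chinese text seems to arrive as numeric character references (for example &#22823;&#23398;) rather than as raw UTF-8 bytes. If so, a saved copy opened in a text editor would show only ASCII, even though a browser renders Chinese. This is an inference from the dump, not something confirmed in the thread. A small sketch for inspecting such text follows; the class EntityDemo and its method are hypothetical names. WebUtility.HtmlDecode requires .NET 4.0 or later; on older frameworks System.Web.HttpUtility.HtmlDecode plays the same role.

```csharp
using System;
using System.Net;

// Hypothetical helper for inspection only, not part of the original code.
static class EntityDemo
{
    // Converts character references such as "&#22823;" into the characters
    // they stand for. Note that this also decodes "&amp;", "&lt;" and so on,
    // so running it over a whole page can damage the markup.
    public static string DecodeEntities(string html)
    {
        return WebUtility.HtmlDecode(html);
    }
}
```

Because decoding a whole page also turns &lt; and &amp; back into markup characters, this is better suited to checking a fragment than to rewriting the saved file.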



I'm trying to read it and store it in a file with the extension .aspx,
but many characters are not rendered correctly. I use the following
lines of code:

String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8 );
String sHtmlTradotto = reader.ReadToEnd();

StreamWriter writer = new StreamWriter("prova.aspx", false, System.Text.Encoding.UTF8);

writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

Can you help me solve the problem?

Many thanks in advance.

Ing. Antonio D'Ottavio
Nitin
2004-11-16 04:51:01 UTC
Hi Jon,
The problem you are getting (the HTTP protocol violation) can be resolved
by adding one entry to the machine.config file to allow unsafe header parsing:

<httpWebRequest useUnsafeHeaderParsing="true" />

Add this entry under the <system.net><settings>
section of the machine.config file.
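
For context, this is roughly how the entry nests inside a configuration file. It is a sketch of the relevant fragment only; a real machine.config contains many other sections, and the same element can usually be placed in an application's own app.config or web.config instead, which avoids changing the setting machine-wide.

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Relaxes validation of HTTP response headers for HttpWebRequest -->
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
```

Note that this setting loosens header validation for every request covered by the file, so it is worth limiting it to the application that needs it.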