Author: GraphiteCube Time: 2009-6-25 16:38 Title: [JEE] JSP Encoding (Solved)
I am writing a web application that fetches HTML from another web page, extracts useful data and displays it in a JSP. As the source page contains characters in Big5 encoding, I put the following tags in the JSP:
- <meta http-equiv="Content-Type" content="text/html; charset=Big5" />
- <%@page contentType="text/html;charset=big5"%>

I also tried using only the JSP page directive, but that doesn't work either.
Is it due to the Java compiler's encoding, or something else?
Please help, thank you.
[ Last edited by GraphiteCube at 2009-6-25 22:56 ]
Author: xeon0541 Time: 2009-6-25 17:36
First, make sure the data itself is in the correct encoding.
Second, try this:
<%@ page contentType="text/html; charset=big5"%>
or use UTF-8 encoding.
[ Last edited by xeon0541 at 2009-6-25 17:40 ]
Author: GraphiteCube Time: 2009-6-25 18:06 Title: Reply to post #2
I added it to the JSP, but it doesn't help.
Character encoding is annoying...
Author: xeon0541 Time: 2009-6-25 20:11
It seems the data is in the wrong encoding, so you need to convert it. For example:
// Here rs stands for Big5 text that was mistakenly decoded as ISO-8859-1
String rs = "你我";
byte[] col_1_b = rs.getBytes("8859_1");
String a = new String(col_1_b, "big5");
[ Last edited by xeon0541 at 2009-6-25 20:12 ]
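The re-decoding trick above only works when the String was produced by decoding Big5 bytes as ISO-8859-1: every byte maps to exactly one char in that charset, so getBytes with ISO-8859-1 recovers the original bytes, which can then be decoded as Big5. A minimal self-contained sketch of that scenario follows; the class and method names are hypothetical, and the Chinese text is written as Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Hypothetical helper: undoes a wrong ISO-8859-1 decode of Big5 bytes.
    static String repairLatin1AsBig5(String misdecoded) {
        // ISO-8859-1 maps each char U+0000..U+00FF back to exactly one byte,
        // so this recovers the original Big5 bytes without loss.
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, Charset.forName("Big5"));
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        String text = "\u4f60\u6211"; // 你我
        byte[] big5Bytes = text.getBytes(big5);
        // Simulate the bug: Big5 bytes decoded with the wrong charset
        String garbled = new String(big5Bytes, StandardCharsets.ISO_8859_1);
        String repaired = repairLatin1AsBig5(garbled);
        System.out.println(repaired.equals(text)); // prints true
    }
}
```

If the text was decoded with some other wrong charset (for example a platform default that cannot map every byte), the original bytes may already be lost and this trick cannot recover them.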
Author: patrickit Time: 2009-6-25 21:11
Originally posted by xeon0541 at 2009-6-25 20:11:
It seems the data is in the wrong encoding, so you need to convert it. For example:
String rs = "你我";
byte[] col_1_b = rs.getBytes("8859_1");
String a = new String(col_1_b, "big ...
No need to do it like that....
Author: xeon0541 Time: 2009-6-25 21:45
It's just a simple example.
Author: Tin852 Time: 2009-6-25 21:48
You said the source page uses Big5. Since all Strings and chars in Java are Unicode, you need to either
- convert the source page content to Unicode first, by specifying the correct encoding when you read the text and store it as a String,
or
- use byte[] all the way.
Otherwise the character encoding will be a mess.
If you are using String, you can use either a Unicode encoding (e.g. UTF-8) or Big5 for the pageEncoding attribute in the <%@ page %> directive;
if you are using byte[], you may need to write the bytes out directly through the response.
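A minimal sketch of the "byte[] all the way" option, assuming the fetched page is available as an InputStream: the bytes are copied unchanged, so no charset conversion happens anywhere. The class and method names are hypothetical. In a servlet the destination would typically be response.getOutputStream(), and the response's Content-Type would still have to declare charset=Big5 so the browser decodes the bytes correctly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Arrays;

public class BytePassThrough {
    // Copies raw bytes unchanged; no charset is involved, so Big5 bytes stay Big5.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the remote page: the Big5 bytes of 你我
        byte[] big5 = "\u4f60\u6211".getBytes(Charset.forName("Big5"));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(big5), sink);
        System.out.println(Arrays.equals(big5, sink.toByteArray())); // prints true
    }
}
```

The catch is that you cannot easily search or extract data from raw bytes as text, which is why the String route is usually more convenient when the page needs processing.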
Author: GraphiteCube Time: 2009-6-25 22:02 Title: Reply to post #5
Could you suggest any solutions?
Author: GraphiteCube Time: 2009-6-25 22:05 Title: Reply to post #7
If I want to convert the fetched page content from Big5 to UTF-8, could the getBytes() method of String help?
Thanks.
Author: Tin852 Time: 2009-6-25 22:16 Title: Reply to post #9
You don't need getBytes(); just set the correct encoding when reading from the source, e.g. InputStreamReader(InputStream in, String charsetName)
will do.
Then you will get the content in Unicode (Java's internal encoding), and you only need to specify the output encoding with the pageEncoding attribute in <%@ page %>; JspWriter will do all the conversion for you.
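A sketch of that suggestion, assuming the fetched page arrives as an InputStream (in the real application it might come from URL.openStream(); here a byte array stands in for the remote page). The class and method names are hypothetical. Because the reader decodes with Big5, the resulting String already holds the correct characters, and no getBytes()/new String() round trip is needed afterwards.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class Big5Reader {
    // Reads the whole stream, decoding it as Big5 into a Java (UTF-16) String.
    static String readBig5(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, "Big5"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "\u4f60\u6211".getBytes("Big5"); // stand-in for the fetched page
        String text = readBig5(new ByteArrayInputStream(page));
        System.out.println(text.equals("\u4f60\u6211\n")); // prints true
    }
}
```

From there, the JSP only needs to declare the output encoding; the String is re-encoded by the writer when it is sent.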
Author: GraphiteCube Time: 2009-6-25 22:55
Thanks everyone, I have solved the problem. Below is part of my code; hopefully it's useful to others.
In a tag handler class (I don't like embedding business logic in JSPs, so I chose a tag library):
- String rawCode;
- try {
-     while ((rawCode = buffReader.readLine()) != null) {
-         // Only print the useful lines of HTML
-         if (rawCode.indexOf("someImportantText") >= 0) {
-             // Re-decode the fetched line as Big5
-             // (getBytes() with no argument uses the platform default charset)
-             String html = new String(rawCode.getBytes(), "Big5");
-             out.println(html);
-             out.println("<br />");
-         }
-     }
- } catch (java.io.IOException e) {
-     // The exception is silently ignored here; consider logging or rethrowing it
- }
And in the JSP:
- <%@page pageEncoding="Big5"%>
- <html xmlns="http://www.w3.org/1999/xhtml">
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=Big5" />
Author: henrywho Time: 2009-6-26 23:20
Does rawCode itself contain the right characters, i.e. the Chinese characters as UTF-16? Try printing the hex values of its bytes.
Also, does this solution work for the Hong Kong supplementary characters (HKSCS)?
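One way to do the check henrywho suggests: print the Unicode code points of the String, and separately the bytes it encodes to in a given charset. Using codePoints() rather than iterating over chars means a supplementary character (stored as two UTF-16 chars) is printed as a single value. The class and method names are hypothetical. If the source page actually uses HKSCS characters, note that the JDK ships a separate "Big5-HKSCS" charset, which may be the one to decode with.

```java
import java.nio.charset.Charset;

public class HexDump {
    // Hex value of each Unicode code point; supplementary characters
    // (two UTF-16 chars) come out as one entry.
    static String codePointsHex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Hex values of the bytes produced by encoding s with the given charset.
    static String bytesHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u4f60\u6211"; // 你我
        System.out.println(codePointsHex(s)); // prints U+4F60 U+6211
        System.out.println(bytesHex(s, Charset.forName("Big5")));
    }
}
```

If the code points printed for rawCode are not the expected CJK values (for example, a run of values below U+0100), the text was decoded with the wrong charset.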
Author: GraphiteCube Time: 2009-6-27 00:41 Title: Reply to post #12
rawCode contains characters encoded in Big5.
I haven't tried it with HKSCS characters.
Author: jzu Time: 2009-6-27 02:05
It's better to convert the raw content to UTF-8
so that your web page can display it correctly.
[ Last edited by jzu at 2009-6-27 02:06 ]
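A sketch of that conversion for raw Big5 bytes: decode with Big5 into a String, then encode the String with UTF-8. The class and method names are hypothetical. In the JSP setting the second step is normally done by the page's writer once the output charset is UTF-8, so explicit re-encoding like this is only needed when handling bytes yourself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Decode with the source charset, then encode with the target charset.
    static byte[] big5ToUtf8(byte[] big5Bytes) {
        String text = new String(big5Bytes, Charset.forName("Big5"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] big5 = "\u4f60\u6211".getBytes(Charset.forName("Big5"));
        byte[] utf8 = big5ToUtf8(big5);
        // Each of these two characters takes 2 bytes in Big5 and 3 in UTF-8
        System.out.println(big5.length + " Big5 bytes -> " + utf8.length + " UTF-8 bytes");
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u4f60\u6211")); // prints true
    }
}
```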
Author: GraphiteCube Time: 2009-6-27 02:54
Argh! I think I know what happened.
Before printing the HTML, I should first decode the fetched content with the Big5 character set.
- // Extract useful data from HTML codes
- String html = this.extractData(new StringBuffer(rawCode));
- // Decode the fetched HTML with the Big5 character set
- // (getBytes() with no argument uses the platform default charset)
- out.println(new String(html.getBytes(), "Big5"));
So now I can put the following in the JSP:
- <%@page pageEncoding="UTF-8"%>
- ...
- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
P.S. I hope this approach also solves another encoding problem I have with an XML file.
[ Last edited by GraphiteCube at 2009-6-27 02:56 ]

