
Title: [JEE] JSP Encoding (Solved)

Author: GraphiteCube    Time: 2009-6-25 16:38    Title: [JEE] JSP Encoding (Solved)

I am writing a web application that fetches the HTML of another web page, extracts the useful data, and displays it in a JSP. Since the source page contains Big5-encoded characters, I put the following tag in the JSP:
  <meta http-equiv="Content-Type" content="text/html; charset=Big5" />
However, the browser still displays the JSP in ISO-8859-1 (Western). I then added the following directive to the JSP:
  <%@page contentType="text/html;charset=big5"%>
This time the browser tries to display the JSP in Big5, but the characters are still garbled.



I also tried using only the JSP page directive, but it doesn't work either (same garbled result as above).

Is it due to the Java compiler's encoding, or something else?

Please help and thank you.

[Last edited by GraphiteCube at 2009-6-25 22:56]
Author: xeon0541    Time: 2009-6-25 17:36

First, make sure the data itself is in the correct encoding.
Second, try this:
<%@ page contentType="text/html; charset=big5"%>

Or use UTF-8 encoding.
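One way to act on the first suggestion (checking that the data really is Big5) is a strict decode: a `CharsetDecoder` configured to REPORT malformed or unmappable input instead of silently substituting U+FFFD. This is a minimal sketch that does not appear in the thread; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

final class EncodingCheck {

    /** Returns true if every byte sequence in data is valid in the named charset. */
    static boolean isValid(byte[] data, String charsetName) {
        // A fresh decoder per call: CharsetDecoder instances are not thread-safe
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                // Throw instead of replacing bad input with U+FFFD
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\u4F60\u6211" is "你我"; encode it as Big5 to get sample input
        byte[] big5 = "\u4F60\u6211".getBytes(Charset.forName("Big5"));
        System.out.println(isValid(big5, "Big5"));  // true
        System.out.println(isValid(big5, "UTF-8")); // false
    }
}
```

A check like this only tells you the bytes are *plausible* in a charset; pure ASCII, for example, is valid in both Big5 and UTF-8.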

[Last edited by xeon0541 at 2009-6-25 17:40]
Author: GraphiteCube    Time: 2009-6-25 18:06    Title: Re: post #2

I added it to the JSP, but it doesn't help.

Character encoding is annoying...
Author: xeon0541    Time: 2009-6-25 20:11

It seems the data is in the wrong encoding, so you need to convert it, for example:
String rs="你我";
byte[] col_1_b=rs.getBytes("8859_1");
String a=new String(col_1_b,"big5");
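This kind of round trip only makes sense for a string that was *already* decoded with the wrong charset. A minimal sketch, assuming the Big5 bytes were mistakenly decoded as ISO-8859-1 (class and method names are hypothetical): because ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, re-encoding with it recovers the original bytes, which can then be decoded as Big5. Applied to a correctly decoded literal such as "你我", the same round trip would just return "??", since ISO-8859-1 cannot encode those characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {

    private static final Charset BIG5 = Charset.forName("Big5");

    /** Undoes a wrong ISO-8859-1 decode of Big5 data. */
    static String repair(String decodedAsLatin1) {
        // ISO-8859-1 is byte-preserving, so this yields the original Big5 bytes
        byte[] original = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, BIG5);
    }

    public static void main(String[] args) {
        byte[] big5Bytes = "\u4F60\u6211".getBytes(BIG5);
        // Simulate the bug: Big5 bytes decoded as if they were ISO-8859-1
        String garbled = new String(big5Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(repair(garbled).equals("\u4F60\u6211")); // true
    }
}
```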

[Last edited by xeon0541 at 2009-6-25 20:12]
Author: patrickit    Time: 2009-6-25 21:11

Quote: originally posted by xeon0541 at 2009-6-25 20:11
It seems the data is in the wrong encoding, so you need to convert it, for example:
String rs="你我";
byte[] col_1_b=rs.getBytes("8859_1");
String a=new String(col_1_b,"big ...

There's no need to do it that way....
Author: xeon0541    Time: 2009-6-25 21:45

It's just a simple example.
Author: Tin852    Time: 2009-6-25 21:48

You said the source page uses Big5. Since all Strings and chars in Java are Unicode, you need to either
- convert the source page content to Unicode first, by specifying the correct encoding when you read the text and store it as a String,
or
- use byte[] all the way.
Otherwise the character encoding will be a mess.
If you use String, you can set the pageEncoding attribute of the <%@ page %> directive to either UTF-8 or Big5;
if you use byte[], you may need to write the bytes directly to the response.
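The second option (byte[] all the way) amounts to copying the fetched bytes to the output without ever decoding them. A minimal sketch with a hypothetical class name: in a servlet, `out` could be `response.getOutputStream()` after calling `response.setContentType("text/html; charset=Big5")`. A response can use either `getWriter()` or `getOutputStream()`, not both, which is why this route fits a servlet more naturally than a JSP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BytePassthrough {

    /** Copies raw bytes from in to out unchanged; no charset is involved. */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Sample two-byte Big5 sequences
        byte[] big5 = {(byte) 0xA7, 0x41, (byte) 0xA7, (byte) 0xDA};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(big5), out);
        System.out.println(out.size()); // 4
    }
}
```

The catch is that you can no longer search or edit the content as text, so this suits forwarding a page, not extracting data from it.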


Author: GraphiteCube    Time: 2009-6-25 22:02    Title: Re: post #5

Could you suggest any solutions?
Author: GraphiteCube    Time: 2009-6-25 22:05    Title: Re: post #7

If I want to convert the fetched page content from Big5 to UTF-8, could the getBytes() method of class String help?

Thanks.
Author: Tin852    Time: 2009-6-25 22:16    Title: Re: post #9

You don't need getBytes(). Just set the correct encoding when reading from the source, e.g. InputStreamReader(InputStream in, String charsetName) will do.
Then you will get the content in Unicode (Java's internal encoding), and you just need to specify the output encoding with the pageEncoding attribute of <%@ page %>; the JspWriter will do all the conversion for you.
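A minimal sketch of this advice, combined with the line-filtering loop from the tag handler later in the thread. The class and method names are hypothetical; in real code `in` might come from something like `new java.net.URL(...).openStream()` (an assumption, the thread never shows how the page is fetched). The marker "someImportantText" is taken from the thread.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

final class Big5LineReader {

    /**
     * Decodes the stream with the given charset while reading and keeps only the
     * lines containing marker. The results are already proper Unicode Strings,
     * so no getBytes()/new String(...) round trip is needed afterwards.
     */
    static List<String> readMatchingLines(InputStream in, String charsetName, String marker)
            throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(marker)) {
                    result.add(line);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String page = "<p>skip</p>\n<td>someImportantText \u4F60\u6211</td>\n";
        InputStream in = new ByteArrayInputStream(page.getBytes(Charset.forName("Big5")));
        System.out.println(readMatchingLines(in, "Big5", "someImportantText").size()); // 1
    }
}
```

Lines obtained this way can be printed through the JspWriter as-is; the bytes sent to the browser are then determined by the page's declared charset (the contentType charset, or pageEncoding when contentType gives none).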
Author: GraphiteCube    Time: 2009-6-25 22:55

Thanks everyone, I have solved the problem. Below is part of my code; hopefully it's useful to others.

In a tag library handler class (I don't like embedding business logic in JSPs, so I chose a tag library):
  String rawCode;
  try {
      while ((rawCode = buffReader.readLine()) != null) {
          // Only print useful HTML code
          if (rawCode.indexOf("someImportantText") >= 0) {
              // Re-decode the line as Big5 (getBytes() with no argument uses the platform default charset)
              StringBuffer html = new StringBuffer(new String(rawCode.getBytes(), "Big5"));
              out.println(html);
              out.println("<br />");
          }
      }
  } catch (java.io.IOException e) {
      // Note: read errors are silently ignored here
  }
In JSP:
  <%@page pageEncoding="Big5"%>

  <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
          <meta http-equiv="Content-Type" content="text/html; charset=Big5" />

Author: henrywho    Time: 2009-6-26 23:20

Does rawCode itself contain the right characters, i.e. the Chinese characters as UTF-16? Try printing the hex values of its characters.
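A minimal sketch of that debugging step (class and method names are hypothetical). If rawCode was decoded correctly, its code points are CJK values such as U+4F60; if it was mistakenly decoded as ISO-8859-1, every value will be U+00FF or below. The second helper dumps the bytes a String encodes to in a given charset.

```java
import java.nio.charset.Charset;

final class HexDump {

    /** Returns each Unicode code point of s as "U+XXXX", separated by spaces. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** Returns each byte as two uppercase hex digits, separated by spaces. */
    static String hexBytes(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String rawCode = "\u4F60\u6211"; // stand-in for one fetched line
        System.out.println(codePoints(rawCode)); // U+4F60 U+6211
        System.out.println(hexBytes(rawCode.getBytes(Charset.forName("Big5"))));
    }
}
```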

Also, does this solution work for Hong Kong supplementary characters (HKSCS)?
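This matters because HKSCS includes characters that Unicode places outside the Basic Multilingual Plane (many in CJK Extension B, Plane 2). Java stores such a character as a surrogate pair of two chars, and the plain "Big5" charset has no mapping for it; the JDK also provides a "Big5-HKSCS" charset, whose availability can be checked with `Charset.isSupported`. In this minimal sketch (hypothetical class name), U+20000 is used only as a generic supplementary code point; it is not claimed to be an HKSCS character.

```java
import java.nio.charset.Charset;

final class SupplementaryCheck {

    /** True if every character in s can be represented in the named charset. */
    static boolean canEncode(String s, String charsetName) {
        // A fresh encoder per call: CharsetEncoder instances are not thread-safe
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // U+20000: an example code point outside the BMP
        String s = new String(Character.toChars(0x20000));
        System.out.println(s.length());                      // 2 (one surrogate pair)
        System.out.println(s.codePointCount(0, s.length())); // 1
        System.out.println(canEncode(s, "Big5"));            // false
        System.out.println(canEncode(s, "UTF-8"));           // true
        // Whether this JDK ships the extended "Big5-HKSCS" charset
        System.out.println(Charset.isSupported("Big5-HKSCS"));
    }
}
```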
Author: GraphiteCube    Time: 2009-6-27 00:41    Title: Re: post #12

rawCode contains characters encoded in Big5.

I haven't tried it with HKSCS characters.
Author: jzu    Time: 2009-6-27 02:05

It's better to convert the raw content to UTF-8
so that your web page can display it correctly.
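In Java, "converting to UTF-8" means two steps: decode the bytes into a String using the source charset, then encode that String as UTF-8 at output time. A minimal sketch with hypothetical names; note that `new String(byte[], Charset)` silently replaces malformed input with U+FFFD rather than failing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcode {

    /** Decodes data with one charset into a Unicode String, then encodes it with another. */
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        return new String(data, from).getBytes(to);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        byte[] big5Bytes = "\u4F60\u6211".getBytes(big5);
        byte[] utf8Bytes = transcode(big5Bytes, big5, StandardCharsets.UTF_8);
        System.out.println(big5Bytes.length); // 4 (2 bytes per character)
        System.out.println(utf8Bytes.length); // 6 (3 bytes per character)
    }
}
```

In a JSP you normally only need the first step; the JspWriter performs the second according to the page's declared charset.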

[Last edited by jzu at 2009-6-27 02:06]
Author: GraphiteCube    Time: 2009-6-27 02:54

Argh! It seems I know what happened.

Before printing the HTML, I should first decode the fetched content with the Big5 character set.
  // Extract useful data from the HTML code
  String html = this.extractData(new StringBuffer(rawCode));
  // Decode the fetched HTML with the Big5 character set
  out.println(new String(html.getBytes(), "Big5"));
This decodes the fetched HTML into a Unicode String, which the JSP then writes out as UTF-8.

So now I can put the following in JSP:
  <%@page pageEncoding="UTF-8"%>
  ...
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Now the browser can display the JSP correctly in UTF-8.

P.S. I hope this method can also solve another encoding problem I have with an XML file.

[Last edited by GraphiteCube at 2009-6-27 02:56]





Welcome to 電腦領域 HKEPC Hardware (https://www.hkepc.com/forum/) Powered by Discuz! 7.2