문자 세트 인코딩 표시 표시


import java.nio.ByteBuffer;
import java.nio.charset.Charset;

/**
 * Charset encoding test.  Run the same input string, which contains
 * some non-ascii characters, through several Charset encoders and dump out
 * the hex values of the resulting byte sequences.
 */
public class DecodeTest {
    public static void main(String[] args) {
        // This is the character sequence to encode 
        String input = "\u00bfMa\u00f1ana?";
        String [] charsetNames = {
                "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE",
                "UTF-16LE", "UTF-16"
        };
        for (int i = 0; i < charsetNames.length; i++) {
            doEncode (Charset.forName(charsetNames[i]), input);
        }
    }

    private static void doEncode(Charset cs, String input) {
        ByteBuffer bb = cs.encode(input);
        System.out.println("Charset: " + cs.name());
        System.out.println("  input :" + input);
        System.out.println("Encoded: " );
        for (int i = 0; bb.hasRemaining(); i++) {
            int b = bb.get();
            int ival = ((int) b) & 0xff;
            char c = (char) ival;
            // Keep tabular alignment pretty
            if (i < 10) System.out.print(" ");
            // Print index number
            System.out.print("  " + i + ": ");
            // Better formatted output is coming someday...
            if (ival < 16) System.out.print("0");
            // Print the hex value of  the byte
            System.out.print(Integer.toHexString(ival));
            // If the byte seems to be the value of a
            // printable character, print it.  No guarantee
            // it will be.
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                System.out.println("");
            } else {
                System.out.println(" (" + c + ")");
            }
        }
        System.out.println("");
    }
}

결과 내보내기


Charset: US-ASCII
  input :¿Mañana?
Encoded: 
   0: 3f (?)
   1: 4d (M)
   2: 61 (a)
   3: 3f (?)
   4: 61 (a)
   5: 6e (n)
   6: 61 (a)
   7: 3f (?)

Charset: ISO-8859-1
  input :¿Mañana?
Encoded: 
   0: bf (¿)
   1: 4d (M)
   2: 61 (a)
   3: f1 (ñ)
   4: 61 (a)
   5: 6e (n)
   6: 61 (a)
   7: 3f (?)

Charset: UTF-8
  input :¿Mañana?
Encoded: 
   0: c2 (Â)
   1: bf (¿)
   2: 4d (M)
   3: 61 (a)
   4: c3 (Ã)
   5: b1 (±)
   6: 61 (a)
   7: 6e (n)
   8: 61 (a)
   9: 3f (?)

Charset: UTF-16BE
  input :¿Mañana?
Encoded: 
   0: 00
   1: bf (¿)
   2: 00
   3: 4d (M)
   4: 00
   5: 61 (a)
   6: 00
   7: f1 (ñ)
   8: 00
   9: 61 (a)
  10: 00
  11: 6e (n)
  12: 00
  13: 61 (a)
  14: 00
  15: 3f (?)

Charset: UTF-16LE
  input :¿Mañana?
Encoded: 
   0: bf (¿)
   1: 00
   2: 4d (M)
   3: 00
   4: 61 (a)
   5: 00
   6: f1 (ñ)
   7: 00
   8: 61 (a)
   9: 00
  10: 6e (n)
  11: 00
  12: 61 (a)
  13: 00
  14: 3f (?)
  15: 00

Charset: UTF-16
  input :¿Mañana?
Encoded: 
   0: fe (þ)
   1: ff (ÿ)
   2: 00
   3: bf (¿)
   4: 00
   5: 4d (M)
   6: 00
   7: 61 (a)
   8: 00
   9: f1 (ñ)
  10: 00
  11: 61 (a)
  12: 00
  13: 6e (n)
  14: 00
  15: 61 (a)
  16: 00
  17: 3f (?)

UTF-16BE 및 UTF-16LE는 각 문자를 2바이트 값으로 인코딩합니다.따라서 이런 인코딩의 디코더는 반드시
데이터가 어떻게 인코딩되는지 미리 알고 인코딩 데이터 흐름 자체에 따라 바이트 순서를 정하는 방식을 알아야 한다.UTF -16
인코딩은 바이트 순서 태그를 인정합니다: Unicode 문자\uFF.인코딩 흐름이 시작될 때만 바이트 순서
표기야말로 그 특수한 의미로 나타난다.나중에 값이 발생하면 Unicode 값(0 너비,
띄엄필렛 공백 없음) 이 매핑됩니다.외부 바이트 순서 시스템은 우선적으로\uFEF를 고려하고 흐름을
UTF -16LE.UTF-16 인코딩 우선 순위 및 승인 바이트 순서 표시를 사용하여 시스템이 서로 다른 내부 바이트 순서를 가지도록 합니다.
순서, 유니코드 데이터와 교류
UTF-16BE
바이트 태그 없음, 인코딩 하이퍼텍스트
UTF-16LE
바이트 태그 없음, 인코딩 하위 순서
더 많은 정보는 orelly가 출판한 자바니오 6장을 참고하십시오.

문자 세트 인코딩 표시 표시

좋은 웹페이지 즐겨찾기