Java Tutorial - Java Script : Character Sets

Character Sets

Character sets, provided by the java.nio.charset package, are classes used to convert data between byte buffers and character buffers. The three main classes are:
·         Charset: A named mapping between sequences of Unicode characters and sequences of bytes
·         CharsetDecoder: A class that transforms a series of bytes into a series of characters
·         CharsetEncoder: A class that transforms a series of characters into a series of bytes
Before you can perform any transformations between byte and character buffers, you must create a Charset object that maps characters to their corresponding byte values. To create a character set, call the forName(String) static method of the Charset class, specifying the name of the set’s character encoding. Every implementation of the Java platform is required to support the following six character encodings, and a particular implementation may offer more:
·         US-ASCII: The 128-character, seven-bit ASCII set that makes up the Basic Latin block of Unicode (also called ISO646-US)
·         ISO-8859-1: The 256-character ISO Latin Alphabet No. 1 character set (also called ISO-LATIN-1)
·         UTF-8: An eight-bit encoding of the Universal Character Set (also called Unicode), a set comprising thousands of characters used in the world’s languages; US-ASCII characters keep their single-byte values
·         UTF-16BE: The Universal Character Set represented as 16-bit code units with bytes stored in big-endian byte order
·         UTF-16LE: The Universal Character Set represented as 16-bit code units with bytes stored in little-endian byte order
·         UTF-16: The Universal Character Set represented as 16-bit code units with the order of bytes indicated by an optional byte-order mark
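To make the differences between these encodings concrete, the following sketch encodes the same text with several of them and prints the resulting byte values. The class name ByteOrderDemo and its encode helper are hypothetical names chosen for this illustration. The sketch also assumes the StandardCharsets constants from java.nio.charset (available since Java 7), which the text above does not cover.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the Java API.
class ByteOrderDemo {
    // Encodes text with the given character set and returns the raw bytes.
    static byte[] encode(String text, Charset set) {
        return text.getBytes(set);
    }

    public static void main(String[] args) {
        String a = "A";           // U+0041, part of US-ASCII
        String eAcute = "\u00e9"; // U+00E9, in ISO-8859-1 but not in US-ASCII

        System.out.println(Arrays.toString(encode(a, StandardCharsets.US_ASCII))); // [65]
        System.out.println(Arrays.toString(encode(a, StandardCharsets.UTF_16BE))); // [0, 65]
        System.out.println(Arrays.toString(encode(a, StandardCharsets.UTF_16LE))); // [65, 0]
        // When encoding, Java's UTF-16 writes a big-endian byte-order mark (FE FF) first.
        System.out.println(Arrays.toString(encode(a, StandardCharsets.UTF_16)));   // [-2, -1, 0, 65]

        System.out.println(Arrays.toString(encode(eAcute, StandardCharsets.ISO_8859_1))); // [-23]
        System.out.println(Arrays.toString(encode(eAcute, StandardCharsets.UTF_8)));      // [-61, -87]
    }
}
```

Java bytes are signed, so values above 127 print as negative numbers: -2 and -1 are the bytes FE and FF of the byte-order mark.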
The following statement creates a Charset object for the ISO-8859-1 character set:
Charset isoset = Charset.forName("ISO-8859-1");
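Because forName(String) throws an unchecked exception when the name is illegal or the encoding is not available, a program that accepts charset names from outside may want a fallback. The following is a minimal sketch of one way to do that. The class CharsetLookup and the method lookupOrDefault are hypothetical names, and the comparison with StandardCharsets.ISO_8859_1 assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class, not part of the Java API.
class CharsetLookup {
    // Returns the named character set, or the fallback if the name is
    // syntactically illegal or not supported by this Java implementation.
    static Charset lookupOrDefault(String name, Charset fallback) {
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset isoset = Charset.forName("ISO-8859-1");
        // The looked-up object is equal to the predefined constant.
        System.out.println(isoset.equals(StandardCharsets.ISO_8859_1)); // true

        // "no-such-charset" is a made-up name, so the fallback is returned.
        Charset fallback = lookupOrDefault("no-such-charset", StandardCharsets.UTF_8);
        System.out.println(fallback.name()); // UTF-8
    }
}
```

When the encoding is one of the six required ones and known at compile time, the StandardCharsets constants avoid the lookup entirely.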
After you have a character set object, you can use it to create encoders and decoders. Call the object’s newDecoder() method to create a CharsetDecoder and its newEncoder() method to create a CharsetEncoder. To transform a byte buffer into a character buffer, call the decoder’s decode(ByteBuffer) method, which returns a CharBuffer containing the bytes transformed into characters. To transform a character buffer into a byte buffer, call the encoder’s encode(CharBuffer) method, which returns a ByteBuffer containing the byte values of the characters. The following statements convert a byte buffer called netBuffer into a character buffer using the ISO-8859-1 character set:
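Charset set = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = set.newDecoder();
// Rewind so that decoding starts at the first byte of the buffer
netBuffer.position(0);
// decode() declares the checked CharacterCodingException
CharBuffer netText = decoder.decode(netBuffer);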
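The encoder and decoder can also be combined into a round trip. The sketch below encodes a string into a byte buffer and decodes it back with the same character set. The class RoundTrip and its two helper methods are hypothetical names used for this illustration, and the sample text is made up.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

// Hypothetical demo class, not part of the Java API.
class RoundTrip {
    // Transforms characters into bytes with a fresh encoder for the set.
    static ByteBuffer encode(String text, Charset set) throws CharacterCodingException {
        CharsetEncoder encoder = set.newEncoder();
        return encoder.encode(CharBuffer.wrap(text));
    }

    // Transforms the remaining bytes of the buffer back into a string.
    static String decode(ByteBuffer bytes, Charset set) throws CharacterCodingException {
        CharsetDecoder decoder = set.newDecoder();
        return decoder.decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        Charset set = Charset.forName("ISO-8859-1");
        String original = "caf\u00e9"; // four characters, all present in ISO-8859-1

        ByteBuffer netBuffer = encode(original, set);
        System.out.println(netBuffer.remaining()); // 4 (one byte per character)

        String netText = decode(netBuffer, set);
        System.out.println(netText.equals(original)); // true

        try {
            // \u20ac is the euro sign, which ISO-8859-1 cannot represent.
            encode("\u20ac", set);
        } catch (CharacterCodingException e) {
            System.out.println("Cannot encode: " + e);
        }
    }
}
```

By default, the encode(CharBuffer) convenience method reports unmappable characters by throwing an exception instead of substituting a replacement byte, which is why the last call fails.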