Basics of Japanese multi-byte encodings
Japanese characters can only be represented with multibyte encodings, and several encoding standards are in use depending on the platform and the purpose of the text. To make matters worse, these standards differ slightly from one another. To build a web application that is usable in a Japanese environment, a developer has to keep these complexities in mind and make sure that the proper character encodings are used.
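The paragraph above can be made concrete with a short sketch. It uses Python's standard codecs purely for illustration (in PHP the conversion itself would go through mbstring, e.g. `mb_convert_encoding()`). It encodes the same three characters in several Japanese encodings and then shows one well-known "slight difference": the same two bytes decode to different code points under JIS-based Shift_JIS and Microsoft's cp932 variant. The sample string is an arbitrary choice.

```python
# Sketch: the same text in several Japanese encodings (Python codecs used
# only as an illustration of the encodings themselves).
text = "日本語"

encodings = ["shift_jis", "euc_jp", "iso2022_jp", "utf-8"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # Each encoding yields a different byte sequence and length.
    print(f"{name:10} {len(data):2} bytes  {data!r}")
    # Decoding with the same codec restores the original string.
    assert data.decode(name) == text

# "Differ slightly": bytes 0x81 0x60 decode to WAVE DASH (U+301C) with the
# JIS-based shift_jis codec, but to FULLWIDTH TILDE (U+FF5E) with cp932.
raw = b"\x81\x60"
sjis_char = raw.decode("shift_jis")
cp932_char = raw.decode("cp932")
print(f"shift_jis -> U+{ord(sjis_char):04X}, cp932 -> U+{ord(cp932_char):04X}")
```

Shift_JIS and EUC-JP each need 6 bytes for these three characters, UTF-8 needs 9, and ISO-2022-JP needs 12 because of the escape sequences around the kanji. Text that round-trips through one variant may therefore come back with different characters when read with another.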
- Storage for a character can be up to six bytes.
- Most Japanese multibyte characters appear twice as wide as single-byte characters. These characters are called "zen-kaku" in Japanese, which means "full width". Other, narrower characters are called "han-kaku", which means "half width". The graphical properties of the characters, however, depend upon the typefaces used to display them.
- Some character encodings use shift (escape) sequences defined in ISO-2022 to switch the code map of a specific code area (00h to 7Fh).
- ISO-2022-JP should be used for SMTP/NNTP, and headers and entities should be re-encoded as the RFCs require (for example, RFC 2047 encoded-words for non-ASCII header text). Although these are not strict requirements, following them is still a good idea because several popular user agents cannot recognize any other encoding.
- Web pages created for mobile phone services such as i-mode, Vodafone live!, or EZweb are supposed to use Shift_JIS.
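The width and escape-sequence points in the list above can be checked with a small Python sketch (again using Python's codecs and `unicodedata` only as an illustration). The three sample characters are hypothetical choices: an ASCII letter, a half-width (han-kaku) katakana and a full-width (zen-kaku) katakana. Note that `east_asian_width()` reports Unicode's width property, not how a given font actually renders the glyph.

```python
import unicodedata

# Hypothetical samples: "A", half-width katakana "ｱ", full-width katakana "ア".
samples = ["A", "\uff71", "\u30a2"]

table = {}
for ch in samples:
    # Unicode width property: "Na" = narrow, "H" = half-width, "W" = wide.
    width = unicodedata.east_asian_width(ch)
    # Bytes needed for this single character in each encoding.
    sizes = {enc: len(ch.encode(enc)) for enc in ("shift_jis", "euc_jp", "utf-8")}
    table[ch] = (width, sizes)
    print(f"U+{ord(ch):04X} width={width} bytes={sizes}")

# ISO-2022-JP stays within 7-bit bytes (00h-7Fh) and changes the meaning of
# that range with escape sequences: ESC $ B selects JIS X 0208 and
# ESC ( B switches back to ASCII.
jis = "A\u30a2".encode("iso2022_jp")
print(jis)
assert all(byte < 0x80 for byte in jis)
```

In Shift_JIS the half-width katakana takes one byte and the full-width one takes two, while EUC-JP spends two bytes on the half-width katakana (a 0x8E prefix plus the character byte). The ISO-2022-JP output shows the escape sequences wrapped around the two bytes of the full-width character.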
Comments
For ISO-2022-JP: if you convert data into this encoding, it is highly recommended that you use ISO-2022-JP-MS to cover the extended character set, e.g. the circled digit one (①).
For mail headers, on the other hand, you have to use ISO-2022-JP *without* the -MS extension.
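The note above can be illustrated with a Python sketch under one loudly stated assumption: Python's standard library has no ISO-2022-JP-MS codec, so cp932 (Microsoft's Shift_JIS variant) stands in here for the Microsoft extended character set. The sketch shows that plain ISO-2022-JP cannot encode ①, that cp932 can, and how a mail header is built as an RFC 2047 encoded-word on plain ISO-2022-JP with `email.header.Header`. The subject text "件名" is an arbitrary example.

```python
from email.header import Header, decode_header

CIRCLED_ONE = "\u2460"  # ①

# Plain ISO-2022-JP covers ASCII, JIS X 0201 Roman and JIS X 0208;
# ① is not in JIS X 0208, so encoding fails.
try:
    CIRCLED_ONE.encode("iso2022_jp")
    plain_ok = True
except UnicodeEncodeError:
    plain_ok = False

# Assumption: no ISO-2022-JP-MS codec exists in Python, so cp932 stands in
# for the Microsoft extended character set that contains ①.
ms_bytes = CIRCLED_ONE.encode("cp932")

# Mail header: an RFC 2047 encoded-word using plain ISO-2022-JP.
subject = Header("件名", "iso-2022-jp").encode()
print(plain_ok, ms_bytes, subject)

# Round trip: decode_header() yields (bytes, charset) pairs.
[(raw, charset)] = decode_header(subject)
assert raw.decode(charset) == "件名"
```

Keeping the header on plain ISO-2022-JP means any text placed there must stay within JIS X 0208, so characters such as ① would have to be avoided or replaced in headers.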