mb_decode_numericentity
(PHP 4 >= 4.0.6, PHP 5)
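mb_decode_numericentity - Decodes HTML numeric character references into characters
Description
string mb_decode_numericentity ( string $str , array $convmap , string $encoding )
Converts the numeric character references in the string str that fall within the specified block into characters.
Parameters
str: the string being decoded.
convmap: an array that specifies the code area to convert.
encoding: the character encoding of the string.
Return Values
The converted string.
Examples
Example #1 convmap example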
<?php
$convmap = array (
int start_code1, int end_code1, int offset1, int mask1,
int start_code2, int end_code2, int offset2, int mask2,
........
int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode code points for start_codeN and end_codeN.
// offsetN is added to the value and the result is bitwise ANDed with maskN,
// then the result is converted to a numeric string reference.
?>
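The layout above is schematic, so a concrete map may help. The following is a minimal sketch, assuming a UTF-8 source file; the input string and the chosen range are illustrative and not taken from the manual.

```php
<?php
// One group of four integers: start code, end code, offset, mask.
// This group covers every code point from U+0080 to U+10FFFF, with no offset.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

// Illustrative input (an assumption for this sketch).
$input = 'caf&#233; &#8364;5 &#60;b&#62;';

$decoded = mb_decode_numericentity($input, $convmap, 'UTF-8');

// &#233; and &#8364; fall inside the mapped range and are decoded.
// &#60; and &#62; are below 0x80, outside the map, so they stay as entities.
echo $decoded, "\n";
```

Starting the range at 0x80 keeps ASCII entities such as &#60; untouched, which is usually what you want when the result is still going to be output as HTML.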
Comments
Just two great functions for daily use:
<?php
/* Converts numeric HTML entities into characters */
function my_numeric2character($t)
{
    // The mask must be wide enough to keep code points above 0xFFFF intact
    $convmap = array(0x0, 0x2FFFF, 0, 0x1FFFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts any characters into numeric HTML entities */
function my_character2numeric($t)
{
    // A 0xFFFF mask would truncate encoded code points above 0xFFFF
    $convmap = array(0x0, 0x2FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ');
?>
Here are functions to convert hankaku (half-width) to zenkaku (full-width) characters, and vice versa, in Japanese text.
<?php
// Supported characters:
// (space)
// !#$%&()*+,./0123456789:;<=>?@
// ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
// abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)
function f_han2zen ($string, $encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x20, 0x20, 0x3000 - 0x20, 0xffff, // Space
        0x21, 0x7e, 0xff01 - 0x21, 0xffff);
    // Encode with an offset, then decode with an identity map
    $temp = mb_encode_numericentity($string, $convmap, $encoding);
    $convmap = array(0, 0xffff, 0, 0xffff);
    return mb_decode_numericentity($temp, $convmap, $encoding);
}
function f_zen2han ($string, $encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x3000, 0x3000, -(0x3000 - 0x20), 0xffff, // Space
        0xff01, 0xff5e, -(0xff01 - 0x21), 0xffff);
    // Encode with a negative offset, then decode with an identity map
    $temp = mb_encode_numericentity($string, $convmap, $encoding);
    $convmap = array(0, 0xffff, 0, 0xffff);
    return mb_decode_numericentity($temp, $convmap, $encoding);
}
// Sample usage:
f_han2zen("test", "shift_jis");
f_han2zen("test", "utf-8");
?>
Many web browsers tend to upload high-order characters as numeric HTML entities.
Here is some simple code to convert numeric HTML entities within a block of text into proper UTF-8 characters:
<?php
// Decode one decimal numeric entity into a UTF-8 character
function utf8_entity_decode($entity) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}
// Callbacks for the regexes (the /e modifier was removed in PHP 7)
function utf8_decimal_entity_callback($matches) {
    return utf8_entity_decode($matches[0]);
}
function utf8_hex_entity_callback($matches) {
    return utf8_entity_decode('&#' . hexdec($matches[1]) . ';');
}
// Decode decimal HTML entities added by the web browser
$body = preg_replace_callback('/&#\d{2,5};/u', 'utf8_decimal_entity_callback', $body);
// Decode hex HTML entities added by the web browser
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', 'utf8_hex_entity_callback', $body);
?>
If you use utf8_decode, you will run into problems with all extended characters outside the ISO-8859-1 charset. You can solve this by calling mb_encode_numericentity first:
<?php
// convert $text from UTF-8 to ISO-8859-1
$convmap = array(0x100, 0x2FFFF, 0, 0x1FFFFF); // start at 0x100 so that 0xFF is left to utf8_decode
$text = mb_encode_numericentity($text, $convmap, "UTF-8");
$text = utf8_decode($text);
?>
The second line encodes all extended characters above 0xFF as numeric entities; the third line converts the rest (0x80 - 0xFF) to ISO-8859-1.
Note that at this time it seems that mb_decode_numericentity() only works with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging.
If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
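The hex-to-decimal step described above can be sketched roughly as follows. This is only an illustration: hex_entities_to_decimal is a hypothetical helper name, the sample string is made up, and preg_replace_callback() with a closure (PHP 5.3 or later) is used here in place of plain preg_replace().

```php
<?php
// Hypothetical helper (not part of mbstring): rewrite &#xHHHH; as &#DDDD;
function hex_entities_to_decimal($text)
{
    return preg_replace_callback(
        '/&#x([0-9a-fA-F]{1,6});/',
        function ($m) {
            // hexdec() turns the captured hex digits into an integer
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

// Illustrative input (an assumption for this sketch).
$text = 'Price: &#x20AC;10, caf&#xE9;';
$decimal = hex_entities_to_decimal($text);

// Every entity is now decimal, so the decimal-only decoder can handle it.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
echo mb_decode_numericentity($decimal, $convmap, 'UTF-8'), "\n";
```

Limiting the capture to six hex digits keeps hexdec() within integer range for any valid code point.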
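Manual entity => UTF-8 conversion:
<?php
// parse entities
$raw = preg_replace_callback
(
    "/&#(\\d+);/u",
    "_pcreEntityToUtf",
    $raw
);
function _pcreEntityToUtf($matches)
{
    $char = intval(is_array($matches) ? $matches[1] : $matches);
    if ($char < 0x80)
    {
        // to prevent insertion of control characters
        if ($char >= 0x20) return htmlspecialchars(chr($char));
        else return "&#$char;";
    }
    else if ($char < 0x800)
    {
        // two-byte sequence: U+0080 to U+07FF
        return chr(0xc0 | (0x1f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
    else if ($char < 0x10000)
    {
        // three-byte sequence: U+0800 to U+FFFF
        return chr(0xe0 | (0x0f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
    else if ($char <= 0x10FFFF)
    {
        // four-byte sequence: U+10000 to U+10FFFF
        return chr(0xf0 | (0x07 & ($char >> 18))) . chr(0x80 | (0x3f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
    // outside the Unicode range: leave the entity unchanged
    return "&#$char;";
}
?>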