mb_convert_encoding

(PHP 4 >= 4.0.6, PHP 5)

mb_convert_encoding — Convert character encoding

Description

string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding ] )

Converts the character encoding of the string str from from_encoding (optional) to to_encoding.

Parameters

str

The string to convert.

to_encoding

The encoding that str is converted to.

from_encoding

The encoding of str before conversion, given by character code name. It is either an array or a comma-separated list of candidate encodings. If from_encoding is not specified, the internal encoding is used.

"auto" may be used, which expands to "ASCII,JIS,UTF-8,EUC-JP,SJIS".

Return Values

The converted string.

Examples

Example #1 mb_convert_encoding() example


<?php
/* Convert internal character encoding to SJIS */
$str = mb_convert_encoding($str, "SJIS");

/* Convert EUC-JP to UTF-7 */
$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");

/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
?>
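
As a quick sanity check of what a conversion does at the byte level, the sketch below round-trips a single character between UTF-8 and ISO-8859-1. The variable names are illustrative only, and the bytes are written as escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "é" is the two bytes C3 A9 in UTF-8 and the single byte E9 in ISO-8859-1.
$utf8 = "\xC3\xA9";

// UTF-8 -> ISO-8859-1
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n"; // e9

// ISO-8859-1 -> UTF-8 gives back the original bytes
$back = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($back), "\n"; // c3a9
```

Passing from_encoding explicitly, as here, avoids any dependence on the configured internal encoding.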

See Also

mb_detect_order()

Comments

Feb 07

Author: lanka at eurocom dot od dot ua


Another example of recoding without the mbstring extension enabled.

(Russian KOI8-R to Windows-1251; if the input is already in the Windows encoding, recode() returns the string unchanged.)



<?php

  // 0 - win

  // 1 - koi

  function detect_encoding($str) {

    $win = 0;

    $koi = 0;



    for($i=0; $i<strlen($str); $i++) {

      if( ord($str[$i]) >224 && ord($str[$i]) < 255) $win++;

      if( ord($str[$i]) >192 && ord($str[$i]) < 223) $koi++;

    }



    if( $win < $koi ) {

      return 1;

    } else return 0;



  }



  // recodes koi to win

  function koi_to_win($string) {



    $kw = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 254, 224, 225, 246, 228, 229, 244, 227, 245, 232, 233, 234, 235, 236, 237, 238, 239, 255, 240, 241, 242, 243, 230, 226, 252, 251, 231, 248, 253, 249, 247, 250, 222, 192, 193, 214, 196, 197, 212, 195, 213, 200, 201, 202, 203, 204, 205, 206, 207, 223, 208, 209, 210, 211, 198, 194, 220, 219, 199, 216, 221, 217, 215, 218);

    $wk = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 225, 226, 247, 231, 228, 229, 246, 250, 233, 234, 235, 236, 237, 238, 239, 240, 242,  243, 244, 245, 230, 232, 227, 254, 251, 253, 255, 249, 248, 252, 224, 241, 193, 194, 215, 199, 196, 197, 214, 218, 201, 202, 203, 204, 205, 206, 207, 208, 210, 211, 212, 213, 198, 200, 195, 222, 219, 221, 223, 217, 216, 220, 192, 209);



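    $end = strlen($string);

    // A for loop (rather than do/while) avoids reading offset 0 of an empty string.
    for ($pos = 0; $pos < $end; $pos++) {

      $c = ord($string[$pos]);

      // Only bytes in the upper half (128..255) are remapped through the table.
      if ($c >= 128) {

        $string[$pos] = chr($kw[$c-128]);

      }

    }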



    return $string;

  }



  function recode($str) {



    $enc = detect_encoding($str);

    if ($enc==1) {

      $str = koi_to_win($str);

    }



    return $str;

  }

?>

2003-02-07 10:03:56

http://php5.kiev.ua/manual/ru/function.mb-convert-encoding.html

Feb 01

Author: jamespilcher1 - hotmail


be careful when converting from iso-8859-1 to utf-8.



even if you explicitly specify the character encoding of a page as iso-8859-1(via headers and strict xml defs), windows 2000 will ignore that and interpret it as whatever character set it has natively installed. 



for example, i wrote char #128 into a page, with char encoding iso-8859-1, and it displayed in internet explorer (& mozilla) as a euro symbol.



it should have displayed a box, denoting that char #128 is undefined in iso-8859-1. The problem was it was displaying in "Windows: western europe" (my native character set).



this led to confusion when i tried to convert this euro to UTF-8 via mb_convert_encoding()  



IE displays UTF-8 correctly- and because PHP correctly converted #128 into a box in UTF-8, IE would show a box.



so all i saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.
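
The effect described above can be reproduced without a browser. A minimal sketch, assuming the mbstring build knows the Windows-1252 encoding: byte 0x80 read as Windows-1252 is the euro sign, while the same byte read as ISO-8859-1 is the control character U+0080, which browsers typically draw as a box.

```php
<?php
$byte = "\x80";

// Interpreted as Windows-1252, 0x80 is the euro sign U+20AC.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
echo bin2hex($asCp1252), "\n"; // e282ac

// Interpreted as ISO-8859-1, 0x80 is the C1 control character U+0080.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
echo bin2hex($asLatin1), "\n"; // c280
```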

2004-02-01 21:55:57


Sep 09

Author: Stephan van der Feest


Here's a tip for anyone using Flash and PHP for storing HTML output submitted from a Flash text field in a database or whatever.



Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entity characters:



function utf8html($utf8str)

{

  return htmlentities(mb_convert_encoding($utf8str,"ISO-8859-1","UTF-8"));

}

2005-09-09 06:50:54


Sep 09

Author: Stephan van der Feest


To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:



function htmltoflash($htmlstr)

{

  return str_replace("&lt;br /&gt;","\n",

    str_replace("<","&lt;",

      str_replace(">","&gt;",

        mb_convert_encoding(html_entity_decode($htmlstr),

        "UTF-8","ISO-8859-1"))));

}

2005-09-09 07:47:41


Nov 11

Author: Tom Class


Why did you use the PHP HTML encode functions? mbstring has its own encoding which is (as far as I tested it) much more useful:



HTML-ENTITIES



Example:



$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");

2005-11-11 09:35:53


Feb 20

Author: eion at bigfoot dot com


many people below talk about using

<?php

    mb_convert_encoding($s,'HTML-ENTITIES','UTF-8');

?>

to convert non-ASCII code into html-readable stuff.  Due to my webserver being out of my control, I was unable to set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would convert it to nasty latin1 automatically and not leave it in its beautiful UTF-8 glory.



So [insert korean characters here] turned into ?????.



I found myself needing to pass by reference (which of course is deprecated/nonexistent in recent versions of PHP)

so instead of

<?php

    mb_convert_encoding(&$s,'HTML-ENTITIES','UTF-8');

?>

which worked perfectly until I upgraded, so I had to use

<?php

    call_user_func_array('mb_convert_encoding', array(&$s,'HTML-ENTITIES','UTF-8'));

?>



Hope it helps someone else out

2006-02-20 18:54:52


Jul 08

Author: mac.com@nemo


For those wanting to convert from $set to MacRoman, use iconv():



<?php



$string = iconv('UTF-8', 'macintosh', $string);



?>



('macintosh' is the IANA name for the MacRoman character set.)

2006-07-08 10:38:47


Sep 05

Author: phpdoc at jeudi dot de


I'd like to share some code to convert Latin diacritics to their
traditional 7bit representation, like, for example,

- à,ç,é,î,... to a,c,e,i,...
- ß to ss
- ä,Ä,... to ae,Ae,...
- ë,... to e,...

(mb_convert "7bit" would simply delete any offending characters).

I might have missed your country's typographic
conventions--correct me then.

<?php
/**
 * @args string $text line of encoded text
 *       string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)
 *
 * @returns 7bit representation
 */
function to7bit($text,$from_enc) {
    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);
    $text = preg_replace(
        array('/&szlig;/','/&(..)lig;/',
             '/&([aouAOU])uml;/','/&(.)[^;]*;/'),
        array('ss',"$1","$1".'e',"$1"),
        $text);
    return $text;
}
?>



Enjoy :-)

Johannes



==

[EDIT BY danbrown AT php DOT net: Author provided the following update on 27-FEB-2012.]

==



An addendum to my "to7bit" function referenced below in the notes.

The function is supposed to solve the problem that some languages require a different 7bit rendering of special (umlauted) characters for sorting or other applications. For example, the German ß ligature is usually written "ss" in 7bit context. Dutch ÿ is typically rendered "ij" (not "y").

The original function works well with word (alphabet) character entities and I've seen it used in many places. But non-word entities cause funny results:

E.g., "&copy;" is rendered as "c", "&shy;" as "s" and "&rquo;" as "r".

The following version fixes this by converting non-alphanumeric characters (also chains thereof) to '_'.



<?php
/**
 * @args string $text line of encoded text
 *       string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)
 *
 * @returns 7bit representation
 */
function to7bit($text,$from_enc) {
    $text = preg_replace('/\W+/','_',$text);
    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);
    $text = preg_replace(
        array('/&szlig;/','/&(..)lig;/',
             '/&([aouAOU])uml;/','/&yuml;/','/&(.)[^;]*;/'),
        array('ss',"$1","$1".'e','ij',"$1"),
        $text);
    return $text;
}
?>



Enjoy again,

Johannes

2006-09-05 09:46:41


Dec 20

Author: David Hull


As an alternative to Johannes's suggestion for converting strings from other character sets to a 7bit representation while not just deleting latin diacritics, you might try this:



<?php

$text = iconv($from_enc, 'US-ASCII//TRANSLIT', $text);

?>



The only disadvantage is that it does not convert "ä" to "ae", but it handles punctuation and other special characters better.

-- 

David
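
If the "ae"-style transliteration matters, one option is to rewrite those characters explicitly before handing the rest to iconv(). A rough sketch, assuming UTF-8 input; the function name and the mapping table are illustrative and not part of either extension:

```php
<?php
// Hypothetical helper: expands German umlauts and sharp s the "ae" way.
// Assumes $text is UTF-8; byte escapes keep the script's own encoding irrelevant.
function expand_umlauts(string $text): string
{
    $map = [
        "\xC3\xA4" => 'ae', // ä
        "\xC3\xB6" => 'oe', // ö
        "\xC3\xBC" => 'ue', // ü
        "\xC3\x84" => 'Ae', // Ä
        "\xC3\x96" => 'Oe', // Ö
        "\xC3\x9C" => 'Ue', // Ü
        "\xC3\x9F" => 'ss', // ß
    ];

    // strtr() with an array replaces each key by its value in a single pass.
    return strtr($text, $map);
}

echo expand_umlauts("M\xC3\xBCller, Stra\xC3\x9Fe"), "\n"; // Mueller, Strasse
```

The result can then be passed through iconv() with //TRANSLIT for everything else; what TRANSLIT produces for the remaining characters depends on the iconv implementation.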

2006-12-20 12:52:40


Aug 21

Author: aofg


When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead.

Kishu-Izon (platform-dependent) characters are converted correctly with this encoding, just as with eucJP-win or SJIS-win.

2007-08-21 21:49:55


Sep 25

Author: volker at machon dot biz


Hey guys. For everybody who's looking for a function that converts an ISO string to UTF-8 or a UTF-8 string to ISO, here's your solution:



public function encodeToUtf8($string) {

     return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));

}



public function encodeToIso($string) {

     return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));

}



For me these functions are working fine. Give it a try
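
Detection between UTF-8 and the ISO-8859 family is a guess, since any byte string is acceptable as ISO-8859-1. A slightly more defensive variant checks for well-formed UTF-8 with mb_check_encoding() first and only converts otherwise. This is a sketch with a hypothetical function name, and it assumes that anything that is not valid UTF-8 is ISO-8859-1:

```php
<?php
// Hypothetical helper: returns $string as UTF-8.
// Assumption: input that is not well-formed UTF-8 is ISO-8859-1.
function ensure_utf8(string $string): string
{
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string; // already valid UTF-8, leave untouched
    }

    return mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\xE9")), "\n";     // c3a9 (converted from ISO-8859-1)
echo bin2hex(ensure_utf8("\xC3\xA9")), "\n"; // c3a9 (was already UTF-8)
```

Checking first avoids double-encoding text that is already UTF-8.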

2007-09-25 00:05:34


Jan 15

Author: rodrigo at bb2 dot co dot jp


For those who can't use mb_convert_encoding() to convert from one charset to another because of an older PHP version, try iconv().



I had this problem converting to japanese charset:



$txt=mb_convert_encoding($txt,'SJIS',$this->encode);



And I could fix it by using this:



$txt = iconv('UTF-8', 'SJIS', $txt);



Maybe it's helpful for someone else! ;)

2008-01-15 05:47:52


Jan 25

Author: katzlbtjunk at hotmail dot com


Clean a string for use as a filename by simply replacing all unwanted characters with underscores (converting to ASCII reduces it to 7 bit). It removes slightly more characters than necessary. Hope it's useful.



$fileName = 'Test:!"$%&/()=ÖÄÜöäü<<';

echo strtr(mb_convert_encoding($fileName,'ASCII'), 

    ' ,;:?*#!§$%&/(){}<>=`´|\\\'"', 

    '____________________________');

2008-01-25 06:36:30


May 15

Author: nospam at nihonbunka dot com


rodrigo at bb2 dot co dot jp wrote that iconv works better than mb_convert_encoding. I find that when converting from UTF-8 to Shift_JIS,

$conv_str = mb_convert_encoding($str,$toCS,$fromCS); 

works while

$conv_str = iconv($fromCS,$toCS.'//IGNORE',$str); 

removes tildes from $str.

2008-05-15 21:51:34


Aug 13

Author: StigC


For the php-noobs (like me) - working with flash and php.



Here's a simple snippet of code that worked great for me, getting php to show special Danish characters, from a Flash email form:



<?php

// Name Escape

$escName = mb_convert_encoding($_POST["Name"], "ISO-8859-1", "UTF-8");



// message escape

$escMessage = mb_convert_encoding($_POST["Message"], "ISO-8859-1", "UTF-8");



// Headers.. and so on...

?>

2008-08-13 18:38:47


Nov 07

Author: aaron at aarongough dot com


My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)



Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding. 



<?php

function convert_to ( $source, $target_encoding )

    {

    // detect the character encoding of the incoming file

    $encoding = mb_detect_encoding( $source, "auto" );

       

    // escape all of the question marks so we can remove artifacts from

    // the unicode conversion process

    $target = str_replace( "?", "[question_mark]", $source );

       

    // convert the string to the target encoding

    $target = mb_convert_encoding( $target, $target_encoding, $encoding);

       

    // remove any question marks that have been introduced because of illegal characters

    $target = str_replace( "?", "", $target );

       

    // replace the token string "[question_mark]" with the symbol "?"

    $target = str_replace( "[question_mark]", "?", $target );

   

    return $target;

    }

?>



Hope this helps someone! (Admins should feel free to delete my previous, incorrect, post for clarity)

-A

2008-11-07 10:24:46


Nov 10

Author: francois at bonzon point com


aaron, to discard unsupported characters instead of printing a ?, you might as well simply set the configuration directive:



mbstring.substitute_character = "none"



in your php.ini. Be sure to include the quotes around none. Or at run-time with



<?php

ini_set('mbstring.substitute_character', "none");

?>

2008-11-10 19:05:38


Jan 05

Author: chzhang at gmail dot com


instead of ini_set(), you can try this



mb_substitute_character("none");
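
Because mb_substitute_character() changes a script-wide setting, it may be worth restoring the previous value after the conversion. A small sketch with a made-up wrapper name, assuming the goal is to silently drop characters the target encoding cannot represent:

```php
<?php
// Hypothetical wrapper: converts $text and drops unmappable characters,
// then puts the substitute-character setting back as it was.
function convert_dropping_unmappable(string $text, string $to, string $from): string
{
    $previous = mb_substitute_character(); // current setting (code point or keyword)
    mb_substitute_character('none');

    $converted = mb_convert_encoding($text, $to, $from);

    mb_substitute_character($previous);

    return $converted;
}

// "a€b" in UTF-8; the euro sign has no ISO-8859-1 equivalent and is dropped.
echo convert_dropping_unmappable("a\xE2\x82\xACb", 'ISO-8859-1', 'UTF-8'), "\n"; // ab
```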

2009-01-05 02:34:31


Jun 18

Author: me at gsnedders dot com


It appears that when dealing with an unknown "from encoding" the function will both throw an E_WARNING and proceed to convert the string from ISO-8859-1 to the "to encoding".

2009-06-18 18:06:42


Jul 23

Author: Daniel Trebbien


Note that `mb_convert_encoding($val, 'HTML-ENTITIES')` does not escape '\'', '"', '<', '>', or '&'.

2009-07-23 14:25:38


May 14

Author: regrunge at hotmail dot it


I've been trying to find the charset of a Norwegian (with a lot of ø, æ, å) txt file written on a Mac; I found it this way:



<?php

$text = "A strange string to pass, maybe with some ø, æ, å characters.";



foreach(mb_list_encodings() as $chr){

        echo mb_convert_encoding($text, 'UTF-8', $chr)." : ".$chr."<br>";    

 } 

?>



The line that looks right tells you the encoding the file was written in.



Hope this can help someone.

2010-05-14 11:00:29


Aug 25

Author: gullevek at gullevek dot org


If you want to convert Japanese to ISO-2022-JP, it is highly recommended to use ISO-2022-JP-MS as the target encoding instead. This includes the extended character set and avoids ? in the text. For example, the often used "1 in a circle" ① will then be converted correctly.

2010-08-25 03:27:44


Sep 11

Author: urko at wegetit dot eu


If you are trying to generate a CSV (with extended chars) to be opened in Excel for Mac, the only thing that worked for me was:

<?php mb_convert_encoding( $CSV, 'Windows-1252', 'UTF-8'); ?>



I also tried this:



<?php

// Field separation OK, chars wrong

iconv('MACINTOSH', 'UTF8', $CSV);

// Field separation wrong, chars OK

chr(255).chr(254).mb_convert_encoding( $CSV, 'UCS-2LE', 'UTF-8');

?>



But the first one didn't show extended chars correctly, and the second one didn't separate fields correctly.

2012-09-11 21:17:29


Jun 28

Author: josip at cubrad dot com


For my last project I needed to convert several CSV files from Windows-1250 to UTF-8, and after several days of searching around I found a function that partially solved my problem, but it still did not transform all the characters. So I made this:



function w1250_to_utf8($text) {

    // map based on:

    // http://konfiguracja.c0.pl/iso02vscp1250en.html

    // http://konfiguracja.c0.pl/webpl/index_en.html#examp

    // http://www.htmlentities.com/html/entities/

    $map = array(

        chr(0x8A) => chr(0xA9),

        chr(0x8C) => chr(0xA6),

        chr(0x8D) => chr(0xAB),

        chr(0x8E) => chr(0xAE),

        chr(0x8F) => chr(0xAC),

        chr(0x9C) => chr(0xB6),

        chr(0x9D) => chr(0xBB),

        chr(0xA1) => chr(0xB7),

        chr(0xA5) => chr(0xA1),

        chr(0xBC) => chr(0xA5),

        chr(0x9F) => chr(0xBC),

        chr(0xB9) => chr(0xB1),

        chr(0x9A) => chr(0xB9),

        chr(0xBE) => chr(0xB5),

        chr(0x9E) => chr(0xBE),

        chr(0x80) => '&euro;',

        chr(0x82) => '&sbquo;',

        chr(0x84) => '&bdquo;',

        chr(0x85) => '&hellip;',

        chr(0x86) => '&dagger;',

        chr(0x87) => '&Dagger;',

        chr(0x89) => '&permil;',

        chr(0x8B) => '&lsaquo;',

        chr(0x91) => '&lsquo;',

        chr(0x92) => '&rsquo;',

        chr(0x93) => '&ldquo;',

        chr(0x94) => '&rdquo;',

        chr(0x95) => '&bull;',

        chr(0x96) => '&ndash;',

        chr(0x97) => '&mdash;',

        chr(0x99) => '&trade;',

        chr(0x9B) => '&rsquo;',

        chr(0xA6) => '&brvbar;',

        chr(0xA9) => '&copy;',

        chr(0xAB) => '&laquo;',

        chr(0xAE) => '&reg;',

        chr(0xB1) => '&plusmn;',

        chr(0xB5) => '&micro;',

        chr(0xB6) => '&para;',

        chr(0xB7) => '&middot;',

        chr(0xBB) => '&raquo;',

    );

    return html_entity_decode(mb_convert_encoding(strtr($text, $map), 'UTF-8', 'ISO-8859-2'), ENT_QUOTES, 'UTF-8');

}
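
Where the iconv extension is available, it may be simpler to convert from Windows-1250 directly instead of remapping to ISO-8859-2 first. A sketch, assuming the underlying iconv library recognises the name 'WINDOWS-1250' (glibc and GNU libiconv do; other implementations may differ):

```php
<?php
// 0x8A is "Š" (U+0160) in Windows-1250; in ISO-8859-2 the same byte is a control code.
$w1250 = "\x8A";

// iconv() returns false on failure, so real code should check the result.
$utf8 = iconv('WINDOWS-1250', 'UTF-8', $w1250);

echo bin2hex($utf8), "\n"; // c5a0
```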

2013-06-28 02:24:17


Nov 17

Author: Daniel


If you are attempting to convert "UTF-8" text to "ISO-8859-1" and the result is always returning in "ASCII", place the following line of code before the mb_convert_encoding:



mb_detect_order(array('UTF-8', 'ISO-8859-1'));



It is necessary to force a specific search order for the conversion to work

2015-11-17 18:25:39


Dec 28

Author: nicole


// convert UTF8 to DOS = CP850 

//

// $utf8_text=UTF8-Formatted text;

// $dos=CP850-Formatted text;



// have fun



$dos = mb_convert_encoding($utf8_text, "CP850", mb_detect_encoding($utf8_text, "UTF-8, CP850, ISO-8859-15", true));

2015-12-28 14:37:06


Jul 17

Author: vasiliauskas dot agnius at gmail dot com


When you need to convert from HTML-ENTITIES but your UTF-8 string is partially broken (not all chars are valid UTF-8), passing the string to mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES'); corrupts the chars in the string even more. In this case you need to replace the HTML entities one by one to preserve the characters that are correctly encoded. I wrote this closure for the job:

<?php

$decode_entities = function($string) {

        preg_match_all("/&#?\w+;/", $string, $entities, PREG_SET_ORDER);

        $entities = array_unique(array_column($entities, 0));

        foreach ($entities as $entity) {

            $decoded = mb_convert_encoding($entity, 'UTF-8', 'HTML-ENTITIES');

            $string = str_replace($entity, $decoded, $string);

        }

        return $string;

    };

?>
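
Since the HTML-ENTITIES pseudo-encoding is deprecated as of PHP 8.2, html_entity_decode() is another way to decode entities in place. The sketch below covers only well-formed input; how it treats the already-broken bytes of a partially corrupted string is an assumption that should be checked against real data:

```php
<?php
// Named and numeric entities mixed with plain ASCII text.
$input = "caf&eacute; &#233; &amp; more";

// ENT_HTML5 selects the HTML5 entity table; the result is produced as UTF-8.
$decoded = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Bytes 3 and 4 are the UTF-8 encoding of "é".
echo bin2hex(substr($decoded, 3, 2)), "\n"; // c3a9
```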

2018-07-17 11:16:24


Feb 15

Author: bmxmale at qwerty dot re


/**

 * Convert Windows-1250 to UTF-8

 * Based on https://www.php.net/manual/en/function.mb-convert-encoding.php#112547

 */

class TextConverter

{

    private const ENCODING_TO = 'UTF-8';

    private const ENCODING_FROM = 'ISO-8859-2';



    private array $mapChrChr = [

        0x8A => 0xA9,

        0x8C => 0xA6,

        0x8D => 0xAB,

        0x8E => 0xAE,

        0x8F => 0xAC,

        0x9C => 0xB6,

        0x9D => 0xBB,

        0xA1 => 0xB7,

        0xA5 => 0xA1,

        0xBC => 0xA5,

        0x9F => 0xBC,

        0xB9 => 0xB1,

        0x9A => 0xB9,

        0xBE => 0xB5,

        0x9E => 0xBE

    ];



    private array $mapChrString = [

        0x80 => '&euro;',

        0x82 => '&sbquo;',

        0x84 => '&bdquo;',

        0x85 => '&hellip;',

        0x86 => '&dagger;',

        0x87 => '&Dagger;',

        0x89 => '&permil;',

        0x8B => '&lsaquo;',

        0x91 => '&lsquo;',

        0x92 => '&rsquo;',

        0x93 => '&ldquo;',

        0x94 => '&rdquo;',

        0x95 => '&bull;',

        0x96 => '&ndash;',

        0x97 => '&mdash;',

        0x99 => '&trade;',

        0x9B => '&rsquo;',

        0xA6 => '&brvbar;',

        0xA9 => '&copy;',

        0xAB => '&laquo;',

        0xAE => '&reg;',

        0xB1 => '&plusmn;',

        0xB5 => '&micro;',

        0xB6 => '&para;',

        0xB7 => '&middot;',

        0xBB => '&raquo;'

    ];



    /**

     * @param $text

     * @return string

     */

    public function execute($text): string

    {

        $map = $this->prepareMap();



        return html_entity_decode(

            mb_convert_encoding(strtr($text, $map), self::ENCODING_TO, self::ENCODING_FROM),

            ENT_QUOTES,

            self::ENCODING_TO

        );

    }



    /**

     * @return array

     */

    private function prepareMap(): array

    {

        $maps[] = $this->arrayMapAssoc(function ($k, $v) {

            return [chr($k), chr($v)];

        }, $this->mapChrChr);



        $maps[] = $this->arrayMapAssoc(function ($k, $v) {

            return [chr($k), $v];

        }, $this->mapChrString);



        return array_merge([], ...$maps);

    }



    /**

     * @param callable $function

     * @param array $array

     * @return array

     */

    private function arrayMapAssoc(callable $function, array $array): array

    {

        return array_column(

            array_map(

                $function,

                array_keys($array),

                $array

            ),

            1,

            0

        );

    }

}

2022-02-15 12:00:24


Aug 24

Author: Rainer Perske


Text-encoding HTML-ENTITIES will be deprecated as of PHP 8.2.



To convert all non-ASCII characters into entities (to produce pure 7-bit HTML output), I was using:



<?php

echo mb_convert_encoding( htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' ), 'HTML-ENTITIES', 'UTF-8' );

?>



I can get the identical result with:



<?php

echo mb_encode_numericentity( htmlentities( $text, ENT_QUOTES, 'UTF-8' ), [0x80, 0x10FFFF, 0, ~0], 'UTF-8' );

?>



The output contains well-known named entities for some often used characters and numeric entities for the rest.
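
A small check of the replacement approach, using byte escapes so the expected output is unambiguous. The wrapper name is made up for illustration, and it uses htmlspecialchars() rather than htmlentities(), so every non-ASCII character ends up as a numeric entity. The conversion map [start, end, offset, mask] covers every code point from U+0080 up.

```php
<?php
// Hypothetical wrapper producing pure 7-bit HTML from UTF-8 text.
function to_ascii_html(string $text): string
{
    // Escape markup characters first ...
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // ... then turn every non-ASCII character into a decimal numeric entity.
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo to_ascii_html("<caf\xC3\xA9>"), "\n"; // &lt;caf&#233;&gt;
```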

2022-08-24 16:35:29


Nov 29

Author: Julian Egelstaff


If you have what looks like ISO-8859-1, but it includes "smart quotes" courtesy of Microsoft software, or people cutting and pasting content from Microsoft software, then what you're actually dealing with is probably Windows-1252. Try this:



<?php

$cleanText = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');

?>



The annoying part is that the auto detection (ie: the mb_detect_encoding function) will often think Windows-1252 is ISO-8859-1. Close, but no cigar. This is critical if you're then trying to do unserialize on the resulting text, because the byte count of the string needs to be perfect.
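
The byte-count point can be seen directly: a Windows-1252 curly quote (0x93 or 0x94) becomes three bytes in UTF-8, but only two if the input is wrongly treated as ISO-8859-1. A minimal sketch with illustrative variable names:

```php
<?php
// The word "quoted" wrapped in Windows-1252 curly double quotes.
$raw = "\x93quoted\x94";

$right = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252'); // U+201C ... U+201D
$wrong = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');   // U+0093 ... U+0094

echo strlen($right), "\n"; // 12 (3 + 6 + 3 bytes)
echo strlen($wrong), "\n"; // 10 (2 + 6 + 2 bytes)
```

A length prefix computed for one of these strings will not match the other, which is why unserialize() fails when the wrong source encoding is assumed.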

2022-11-29 09:15:21
