Detecting The Encoding Of A Text File

Nitsan Bin-Nun <nitsanbn@xxxxxxxxx> · Thu, 26 Nov 2009 06:55:31 +0200

Hi,

I have been trying for the last couple of hours to determine the
encoding of a text file (a .txt file created on Windows).

I have this code:

        $contents = file_get_contents($config['txt_dir'] . $file);
        $encoding = mb_detect_encoding($contents, "UTF-8,ISO-8859-1,WINDOWS-1252"); //,Windows-1255

        echo "||encoding:".$encoding."||";

        if ($encoding == 'UTF-8')
        {
            $utfcontents = $contents;
        }
        else if ($encoding == 'ISO-8859-1')
        {
            $utfcontents = utf8_encode($contents);
        }

        var_dump($utfcontents);

$encoding comes back as ISO-8859-1. The text file contains Hebrew characters,
so I then convert it to UTF-8 with utf8_encode().

The code above outputs gibberish: it seems to have converted the text in
some way, but not in the way it should have.
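
To narrow it down, here is a minimal sketch of what might be going on. It
assumes (I have not confirmed this) that the file is really Windows-1255; the
sample bytes and variable names are made up for illustration. It treats the
same bytes first as ISO-8859-1, which is the mapping utf8_encode() applies,
and then as Windows-1255 via iconv():

```php
<?php
// Hypothetical sample: four Hebrew letters (shin, lamed, vav, final mem)
// as Windows-1255 bytes. That the real file uses this encoding is an assumption.
$bytes = "\xF9\xEC\xE5\xED";

// Same byte-to-character mapping that utf8_encode() applies:
// every byte is treated as an ISO-8859-1 character.
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

// Convert from the assumed source encoding instead.
$asHebrew = iconv('WINDOWS-1255', 'UTF-8', $bytes);

echo bin2hex($asLatin1), "\n"; // UTF-8 for Latin-1 accented letters
echo bin2hex($asHebrew), "\n"; // UTF-8 for Hebrew letters, if the assumption holds
```

If the assumption is right, the first result is the kind of gibberish I am
seeing, since every byte in a single-byte encoding is also a valid ISO-8859-1
character and the detection cannot tell the two apart.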

My page is UTF-8 encoded, without a BOM. I send UTF-8 headers to the browser
and include the HTML content encoding meta tag as well.

I have no idea what I am doing wrong.

I would highly appreciate it if someone could point me in the right
direction.

Thanks in Advance,

Nitsan