Re: Problems converting strings with 0 to integer

Richard Quadling <rquadling@xxxxxxxxx> · Thu, 4 Nov 2010 15:40:51 +0000

On 4 November 2010 15:31, robert mena <robert.mena@xxxxxxxxx> wrote:
> Hi Richard,
> I am not top posting. I am just explaining other symptoms that may point to
> the cause since they may be the same and this is happening with the same
> file. I'll try to get approval to release the file.
> Meanwhile, in your opinion what would be the safest way to read and explode
> (using \t) a text file encoded in UTF-8?
>
> On Thu, Nov 4, 2010 at 11:22 AM, Richard Quadling <rquadling@xxxxxxxxx>
> wrote:
>>
>> On 4 November 2010 15:11, robert mena <robert.mena@xxxxxxxxx> wrote:
>> > Hi,
>> > The core of the code is simply
>> > $fp = fopen('file.tab', 'rb');
>> > while(!feof($fp))
>> > {
>> >    $line = fgets($fp);
>> >    $data = explode("\t", $line);
>> >    ...
>> > }
>> > So I try to manipulate the $data[X]. For example $data[0] is supposed
>> > to be
>> > numeric so I do $n = (int) $data[0]
>> > One other thing: the second column should contain a string. If I
>> > check
>> > the string visually it is correct but an if ($data[1] == 'stringX') is
>> > false
>> > even if in the file I can see this (and print those two).
>> > I even did an md5 of both and they are different.
>> > It seems to be an encoding issue. Is it safe to use explode with utf8
>> > strings?
>> > I even tried this code but no match was found (just to replace the explode)
>> > $str = "abc æååã Â Âefg";
>> > $results = array();
>> > preg_match_all("/\t/u", $str, $results);
>> > var_dump($results[0]);
>> > On Thu, Nov 4, 2010 at 6:33 AM, Richard Quadling <rquadling@xxxxxxxxx>
>> > wrote:
>> >>
>> >> On 3 November 2010 21:42, Alexander Holodny
>> >> <alexander.holodny@xxxxxxxxx>
>> >> wrote:
>> >> > To exclude unexpected behavior in case of wrongly formatted input
>> >> > data,
>> >> > it would be much better to use a type-casting method such as:
>> >> > intval(ltrim(trim($inStr), '0'))
>> >> >
>> >> > 2010/11/3, Nicholas Kell <nick@xxxxxxxxxxxxxxxx>:
>> >> >>
>> >> >> On Nov 3, 2010, at 4:22 PM, robert mena wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> I have a text file (utf-8 encoded) which contains lines with
>> >> >>> numbers
>> >> >>> and
>> >> >>> text separated by \t. I need to convert the numbers that contain
>> >> >>> 0
>> >> >>> (at
>> >> >>> left) to integers.
>> >> >>>
>> >> >>> For some reason one line that contains 00000002 is cast to 0
>> >> >>> instead
>> >> >>> of
>> >> >>> 2.
>> >> >>> Below is the output of the cast (int) $field[0], where I get this
>> >> >>> from
>> >> >>> exploding each line.
>> >> >>>
>> >> >>> 0 ï00000002
>> >> >>> 4 00000004
>> >> >>
>> >> >>
>> >> >>
>> >> >> My first guess is wondering how you are grabbing the strings from
>> >> >> the
>> >> >> file.
>> >> >> Seems to me like it would just drop the zeros on the left by
>> >> >> default.
>> >> >> Are
>> >> >> you including the \t in the string by accident? If so, that may be
>> >> >> hosing
>> >> >> it. Otherwise, have you tried ltrim on it?
>> >> >>
>> >> >> Ex:
>> >> >>
>> >> >> $_castableString = ltrim($_yourString, '0');
>> >> >>
>> >> >> // Now cast
>> >>
>> >> <?php
>> >> // Create test file.
>> >> $s_TabbedFilename = './test.tab';
>> >> file_put_contents($s_TabbedFilename, "0\t00000002" . PHP_EOL .
>> >> "4\t00000004" . PHP_EOL);
>> >>
>> >> // Open test file.
>> >> $fp_TabbedFile = fopen($s_TabbedFilename, 'rt') or die("Could not open
>> >> {$s_TabbedFilename}\n");
>> >>
>> >> // Iterate file.
>> >> while(True)
>> >>         {
>> >>         if (False !== ($a_Line = fgetcsv($fp_TabbedFile, 0, "\t")))
>> >>                 {
>> >>                 var_dump($a_Line);
>> >>                 foreach($a_Line as $i_Index => $m_Value)
>> >>                         {
>> >>                         $a_Line[$i_Index] = intval($m_Value);
>> >>                         }
>> >>                 var_dump($a_Line);
>> >>                 }
>> >>         else
>> >>                 {
>> >>                 break;
>> >>                 }
>> >>         }
>> >>
>> >> // Close the file.
>> >> fclose($fp_TabbedFile);
>> >>
>> >> // Delete the file.
>> >> unlink($s_TabbedFilename);
>> >>
>> >>
>> >> outputs ...
>> >>
>> >> array(2) {
>> >>   [0]=>
>> >>   string(1) "0"
>> >>   [1]=>
>> >>   string(8) "00000002"
>> >> }
>> >> array(2) {
>> >>   [0]=>
>> >>   int(0)
>> >>   [1]=>
>> >>   int(2)
>> >> }
>> >> array(2) {
>> >>   [0]=>
>> >>   string(1) "4"
>> >>   [1]=>
>> >>   string(8) "00000004"
>> >> }
>> >> array(2) {
>> >>   [0]=>
>> >>   int(4)
>> >>   [1]=>
>> >>   int(4)
>> >> }
>> >>
>> >> intval() operates as standard on base 10, so no need to worry about
>> >> leading zeros being thought of as base8/octal.
>> >>
>> >> What is your code? Can you reduce it to something as small as the
>> >> above to see if you can repeat the issue?
>>
>> Please don't top post.
>>
>>
>> With regard to UTF-8 data, no, PHP is not Unicode-aware.
>>
>> If a multi-byte character contains a 0x09 byte, then it will be
>> broken.
>>
>> Can you supply the file you are working on?
>>
>> b64encode it and drop it into a pastebin.
>>
>>
>> --
>> Richard Quadling
>> Twitter : EE : Zend
>> @RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
>
>

I've not used it, but the mbstring extension has mb_split(), which splits a
multibyte string using a regular expression.

Whilst it probably isn't as performant as explode() or fgetcsv(), it
should work.
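
As a rough sketch of how the splitting approaches could be compared (the
sample line and the variable names are made up for illustration, and I'm
assuming the data really is UTF-8): in UTF-8 every byte of a multi-byte
character is 0x80 or above, so a 0x09 tab byte cannot appear inside one.
That would make a plain byte-wise explode() return the same fields as the
regex-based splits.

```php
<?php
// Hypothetical sample line: three tab-separated fields, the middle one
// holding multi-byte UTF-8 characters (written as byte escapes so the
// script does not depend on the editor's encoding).
$line = "00000002\t\xC3\xA6\xC3\xB8\xC3\xA5\tefg";

// Byte-oriented split.
$viaExplode = explode("\t", $line);

// PCRE split with the /u modifier, which treats subject and pattern as UTF-8.
$viaPreg = preg_split('/\t/u', $line);

// mb_split(), only if the mbstring extension provides it.
if (function_exists('mb_split')) {
    mb_regex_encoding('UTF-8');
    $viaMb = mb_split("\t", $line);
} else {
    // Fall back so the comparison below still runs without mbstring.
    $viaMb = $viaExplode;
}

var_dump($viaExplode === $viaPreg);
var_dump($viaExplode === $viaMb);
var_dump($viaExplode);
```

If all three agree on the real file as well, the splitting itself is
probably not the culprit.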

But I'm not a Unicode expert, and having the file I could test this
mechanism easily enough.

I'd be interested in knowing what output the code I produced outputs
when used in conjunction with your data.
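
One thing that may be worth ruling out, judging by the stray character
printed in front of 00000002 in the original output: a UTF-8 byte order mark
(the bytes EF BB BF) at the very start of the file. That is only a guess
until the file is available. The sketch below builds its own small test file
with a BOM and shows one way to drop it before casting. stripUtf8Bom() is a
hypothetical helper, not a PHP built-in, and the file contents are invented.

```php
<?php
// Hypothetical helper (not part of PHP): drop a leading UTF-8 BOM if present.
function stripUtf8Bom($s)
{
    return strncmp($s, "\xEF\xBB\xBF", 3) === 0 ? substr($s, 3) : $s;
}

// Invented test data: the first line starts with a BOM, mimicking the
// suspected input file.
$file = tempnam(sys_get_temp_dir(), 'tab');
file_put_contents($file, "\xEF\xBB\xBF00000002\tstringX\n00000004\tstringY\n");

$rows  = array();
$first = true;
$fp    = fopen($file, 'rb') or die("Could not open {$file}\n");
while (($line = fgets($fp)) !== false) {
    if ($first) {
        // Only the very first line of a file can carry the BOM.
        $line  = stripUtf8Bom($line);
        $first = false;
    }
    // fgets() keeps the line ending, so trim it before splitting; otherwise
    // the last field would not compare equal to a plain string.
    $data   = explode("\t", rtrim($line, "\r\n"));
    $rows[] = array((int) $data[0], $data[1]);
}
fclose($fp);
unlink($file);

// The cast with the BOM still attached, for comparison.
var_dump((int) "\xEF\xBB\xBF00000002");
var_dump($rows);
```

If the real file has no BOM, the helper returns the line unchanged, so it
should be harmless to leave in while testing.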

-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
