Re: removing text from a string

Thodoris wrote:
> 
>> Thodoris wrote:
>>  
>>>> Thodoris wrote:
>>>>  
>>>>      
>>>>>> Boyd, Todd M. wrote:
>>>>>>  
>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ashley Sheridan [mailto:ash@xxxxxxxxxxxxxxxxxxxx]
>>>>>>>> Sent: Tuesday, November 04, 2008 1:40 PM
>>>>>>>> To: Adam Williams
>>>>>>>> Cc: PHP General list
>>>>>>>> Subject: Re:  removing text from a string
>>>>>>>>
>>>>>>>> On Tue, 2008-11-04 at 08:04 -0600, Adam Williams wrote:
>>>>>>>>                            
>>>>>>>>> I have a file that looks like:
>>>>>>>>>
>>>>>>>>> 1. Some Text here
>>>>>>>>> 2. Another Line of Text
>>>>>>>>> 3. Yet another line of text
>>>>>>>>> 340. All the way to number 340
>>>>>>>>>
>>>>>>>>> And I want to remove the number, period, and blank space at the
>>>>>>>>> beginning of each line.  How can I accomplish this?
>>>>>>>>>
>>>>>>>>> Opening the file to modify it is easy, I'm just lost at how to
>>>>>>>>> remove the text:
>>>>>>>>>
>>>>>>>>> <?php
>>>>>>>>> $filename = "results.txt";
>>>>>>>>>
>>>>>>>>> $fp = fopen($filename, "r") or die ("Couldn't open $filename");
>>>>>>>>> if ($fp)
>>>>>>>>> {
>>>>>>>>> while (!feof($fp))
>>>>>>>>>         {
>>>>>>>>>         $thedata = fgets($fp);
>>>>>>>>>         //Do something to remove the "1. "
>>>>>>>>>         //print the modified line and \n
>>>>>>>>>         }
>>>>>>>>> fclose($fp);
>>>>>>>>> }
>>>>>>>>> ?>
>>>>>>>>>
>>>>>>>>>                                     
>>>>>>>> I'd go with a regular expression any day for something like this.
>>>>>>>> Something like:
>>>>>>>>
>>>>>>>> "/$[0-9]{1,3}\.\ .*^/g"
>>>>>>>>
>>>>>>>> should do what you need. Note the space before the last period.
>>>>>>>>                               
>>>>>>> That would only work for files with 1-999 lines, and it will wind up
>>>>>>> matching the entire line rather than just the "line number" part,
>>>>>>> since you used $ and ^ with a greedy .* in between. (Also, $ is
>>>>>>> "end-of-line" and ^ is "beginning-of-line". :)) I would stick with my
>>>>>>> originally-posted regex ("/^\d+\.\s/"), but I would modify yours like
>>>>>>> this if I were to use it instead:
>>>>>>>
>>>>>>> "/^[0-9]+\.\ (.*)$/" (What was the "g" modifier for, anyway?)
>>>>>>>
>>>>>>> Then, you could grab the capture group made with (.*) and use it as the
>>>>>>> "clean" data. (It would be group 1 in the match results and "$1" in a
>>>>>>> preg_replace() call, I believe. Group 0 should be the entire match.)
>>>>>>>
>>>>>>>
>>>>>>> Todd Boyd
>>>>>>> Web Programmer
>>>>>>>
>>>>>>>                         
>>>>>> Personally, I would go this route if you wanted to stick with a
>>>>>> regex.
>>>>>>
>>>>>> <?php
>>>>>>
>>>>>> $lines[] = '01. asdf';
>>>>>> $lines[] = '02. 323 asdf';
>>>>>> $lines[] = '03.2323 asdf';
>>>>>> $lines[] = '04. asdf 23';
>>>>>> $lines[] = '05.        asdf'; /* tabs used here */
>>>>>> $lines[] = '06. asdf';
>>>>>>
>>>>>> foreach ( $lines AS $line ) {
>>>>>>     echo preg_replace('/^[0-9]+\.\s*/', '', $line), "\n";
>>>>>> }
>>>>>>
>>>>>> ?>
>>>>>>
>>>>>> This takes care of all possible issues related to the char after the
>>>>>> first period.  Maybe it is there maybe not.
>>>>>>
>>>>>> Could be that it is a tab and not a space.  Could even be multiple
>>>>>> tabs or spaces.
>>>>>>
>>>>>>                   
>>>>> There it goes again.
>>>>>
>>>>> Every time someone asks a simple question (the kind that is solved
>>>>> with a simple trim, ltrim or rtrim), the discussion about which is the
>>>>> best regular expression for the problem makes the thread grow
>>>>> "elephant"-sized :-)
>>>>>
>>>>> I love this list!!
>>>>>
>>>>>             
>>>> You're not going to be able to get it with any of the xtrim()
>>>> functions.  You would end up with various nested ltrim() calls that,
>>>> IMO, would be a nightmare to manage.
>>>>
>>>> So, a top to bottom comparison here
>>>>
>>>> If $line is this:
>>>> $line = '01. asdf';
>>>>
>>>> And you use either one of these:
>>>> A) ltrim(ltrim(ltrim($line, '0123456789'), '.'));
>>>> B) preg_replace('/^[0-9]+\.\s*/', '', $line);
>>>>
>>>> Which do you prefer?
>>>>
>>>> A's Pros:
>>>>     Not a regex
>>>> A's Cons:
>>>>     A little slower than B
>>>>     multiple function calls
>>>>
>>>> B's Pros:
>>>>     Slightly faster than A
>>>>     Single Function call
>>>> B's Cons:
>>>>     Regex
>>>>
>>>>
>>>>
>>>>         
>>> You should really check the manual again, Jim. AFAIK ltrim doesn't remove
>>> just a single character: as long as the leading characters belong to the
>>> list, they are all removed. You just do:
>>>
>>> ltrim($line, '0123456789')
>>>
>>>
>>> and this does the job perfectly. So perhaps you need to reconsider your
>>> thoughts on this.
>>>
>>>     
>>
>> Maybe instead of saying "AFAIK", you should go and check it yourself.
>> But obviously, since you didn't care to do it the first time around, I will
>> supply the relevant parts for you and the list archive.  And I quote:
>>
>> Reference: http://us3.php.net/ltrim
>>
>> Under the "Return Values" section...
>>
>> Return Values
>>
>> This function returns a string with whitespace stripped from the
>> beginning of str . Without the second parameter, ltrim() will strip
>> these characters:
>>
>>     * " " (ASCII 32 (0x20)), an ordinary space.
>>     * "\t" (ASCII 9 (0x09)), a tab.
>>     * "\n" (ASCII 10 (0x0A)), a new line (line feed).
>>     * "\r" (ASCII 13 (0x0D)), a carriage return.
>>     * "\0" (ASCII 0 (0x00)), the NUL-byte.
>>     * "\x0B" (ASCII 11 (0x0B)), a vertical tab.
>>
>> Notice the second sentence of the first line?
>>     "Without the second parameter, ltrim() will strip these characters"
>>
>> That means there is a default set of chars that it uses IF you do not
>> supply a list of chars.
>>
>> On a side note.  If you notice, under the ChangeLog section, the
>> second parameter wasn't added until PHP 4.1.0.
>>
>> What you said was "if they belong to the list they are all removed."
>> Then what exactly did this function do before 4.1?
>>
>> I did my requested homework, but I think you need to go back and study
>> that chapter now.
>>
>>   
> 
> This :
> 
> ltrim($line, '0123456789 .');
> 

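This doesn't always work, because of the situations that were pointed out earlier in this thread.

The following shows why your example above will not work in all cases. Yes/No marks whether the result is correct, i.e. whether only the leading number, the period, and the whitespace after it are removed: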

$str = '01. asdf';		Yes
$str = '02.2323 asdf';		No
$str = '03. 2323 asdf';		No
$str = '04.        asdf';	No /* tabs used here */

Success only 25% of the time is not good enough for me, nor should it be for you.

But if you were to use the regex version I described, preg_replace('/^[0-9]+\.\s*/', '', $str), it would work for all of the above examples.
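
To make the comparison concrete, here is a minimal sketch that runs both approaches over the four sample strings. The helper names strip_with_ltrim() and strip_with_regex() are made up for this illustration; the character list and the pattern are the ones quoted in this thread.

```php
<?php
// Hypothetical helpers for illustration; not part of any library.
function strip_with_ltrim($line)
{
    // Single call with the character list suggested above.
    return ltrim($line, '0123456789 .');
}

function strip_with_regex($line)
{
    // Leading digits, one period, then any amount of whitespace.
    return preg_replace('/^[0-9]+\.\s*/', '', $line);
}

$samples = array(
    '01. asdf',
    '02.2323 asdf',
    '03. 2323 asdf',
    "04.\t\tasdf", // tabs after the period
);

foreach ($samples as $str) {
    echo var_export($str, true), "\n";
    echo '  ltrim: ', var_export(strip_with_ltrim($str), true), "\n";
    echo '  regex: ', var_export(strip_with_regex($str), true), "\n";
}
```

The single ltrim() call eats the "2323" in the second and third samples and leaves the tabs in the fourth, while the regex stops after the first period and the whitespace that follows it.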

As for calling ltrim() without the second param, let me explain a little more...

ltrim(ltrim(ltrim($line, '0123456789'), '.'));

expanded into separate steps, looks like the following:

$line = ltrim($line, '0123456789');   // Removes all leading digits, if present
$line = ltrim($line, '.');            // Removes all leading periods, if present (small problem described below)
$line = ltrim($line);                 // Removes all leading whitespace, if present (spaces, tabs, etc.)
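
As a rough sketch, the three steps can be wrapped in one function. The name strip_prefix_stepwise() is hypothetical, and the sample inputs are taken from earlier in this thread.

```php
<?php
// Hypothetical wrapper around the three ltrim() steps.
function strip_prefix_stepwise($line)
{
    $line = ltrim($line, '0123456789'); // leading digits
    $line = ltrim($line, '.');          // leading periods (every one of them)
    return ltrim($line);                // default whitespace set: space, tab, newline, etc.
}

$samples = array('01. asdf', '03.2323 asdf', "05.\t\tasdf");

foreach ($samples as $str) {
    echo var_export($str, true), ' => ', var_export(strip_prefix_stepwise($str), true), "\n";
}
```

Because each call only strips its own character set, the digits in "2323" survive: the second call stops at the first character that is not a period.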

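There is still one problem with the above method.

What if you come across this:  $str = '04....etc'; or $str = '04.... etc';

If the intended result for both of those examples is just 'etc', then the nested ltrim() calls will work, because ltrim($line, '.') strips every leading period.

But what if the intended result is '...etc' or '... etc'?  The nested ltrim() calls cannot produce that. My regex can, since it removes only the first period, but then it cannot produce plain 'etc'.

So, in a case like this, you cannot program for 100% accuracy without knowing what the data is supposed to mean, but the closer to 100% I can get, the better the customer will feel.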
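
A quick way to see what each approach returns for the multi-period lines is to run them side by side. This is only a sketch: the helper names are hypothetical, and the third pattern (with \.+ in place of \.) is an assumed variant that was not posted in this thread.

```php
<?php
// Hypothetical helper names, for illustration only.
function strip_nested_ltrim($line)
{
    // The nested ltrim() calls from the comparison earlier in the thread.
    return ltrim(ltrim(ltrim($line, '0123456789'), '.'));
}

function strip_first_period($line)
{
    // The regex from this thread: digits, exactly one period, whitespace.
    return preg_replace('/^[0-9]+\.\s*/', '', $line);
}

function strip_all_periods($line)
{
    // Assumed variant: \.+ removes every leading period after the digits.
    return preg_replace('/^[0-9]+\.+\s*/', '', $line);
}

foreach (array('04....etc', '04.... etc') as $str) {
    echo var_export($str, true), "\n";
    echo '  nested ltrim: ', var_export(strip_nested_ltrim($str), true), "\n";
    echo '  first period: ', var_export(strip_first_period($str), true), "\n";
    echo '  all periods:  ', var_export(strip_all_periods($str), true), "\n";
}
```

Which output is "right" depends on what the extra periods mean in the data, so the choice between the two patterns has to come from whoever owns the file.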

> 
> does remove all those characters, doesn't it (as the OP asked and Richard
> suggested in a previous thread)? Without calling it more than once, as
> far as I tested. That was my point in the first place, and sorry if I
> didn't make that clear. On the other hand, who ever suggested calling
> ltrim without the second parameter?
> 
> You suggested before something like that:
> 
> ltrim(ltrim(ltrim($line, '0123456789'), '.'))
> 
> 
> when you made a comparison didn't you?
> 
> Sorry if I got that wrong; I meant no offense and I still don't.
> 


-- 
Jim Lucas

   "Some men are born to greatness, some achieve greatness,
       and some have greatness thrust upon them."

Twelfth Night, Act II, Scene V
    by William Shakespeare

