Re: removing text from a string

Thodoris <tgol@xxxxxxxxxx> · Thu, 06 Nov 2008 20:33:33 +0200

Thodoris wrote:

Boyd, Todd M. wrote:

-----Original Message-----
From: Ashley Sheridan [mailto:ash@xxxxxxxxxxxxxxxxxxxx]
Sent: Tuesday, November 04, 2008 1:40 PM
To: Adam Williams
Cc: PHP General list
Subject: Re:  removing text from a string

On Tue, 2008-11-04 at 08:04 -0600, Adam Williams wrote:

I have a file that looks like:

1. Some Text here
2. Another Line of Text
3. Yet another line of text
340. All the way to number 340

And I want to remove the number, period, and blank space at the
beginning of each line.  How can I accomplish this?

Opening the file to modify it is easy; I'm just lost as to how to remove
the text:

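<?php
$filename = "results.txt";

$fp = fopen($filename, "r") or die ("Couldn't open $filename");
if ($fp)
{
    // fgets() returns false at end of file; testing it directly avoids
    // the extra false value a while (!feof($fp)) loop can hand back
    // after the last line.
    while (($thedata = fgets($fp)) !== false)
    {
        //Do something to remove the "1. "
        //print the modified line and \n
    }
    fclose($fp);
}
?>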

I'd go with a regular expression any day for something like this.
Something like:

"/$[0-9]{1,3}\.\ .*^/g"

should do what you need. Note the space before the last period.

That would only work for files with 1-999 lines, and it would wind up
matching the entire line (since you used $ and ^ with a greedy .* in
between... also... $ is "end-of-line" and ^ is "beginning-of-line" :))
rather than just the "line number" part. I would stick with my
originally posted regex ("/^\d+\.\s/"), but I would modify yours like
this if I were to use it instead:

"/^[0-9]+\.\ (.*)$/" (What was the "g" modifier for, anyway?)

Then, you could grab the capture group made with (.*) and use it as the
"clean" data. (It would be group 1 in the match results and "$1" in a
preg_replace() call, I believe. Group 0 should be the entire match.)

Todd Boyd
Web Programmer
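A minimal sketch of the capture-group approach Todd describes, assuming PHP 5 or later. preg_match() fills $matches[0] with the whole match and $matches[1] with the (.*) group, and preg_replace() can refer to that group as '$1'. The helper name strip_with_capture() is made up for this example. As for the "g": PHP's preg_* functions don't accept it (preg_replace() already replaces every match), so a pattern ending in /g is rejected with an "Unknown modifier" warning.

```php
<?php
// Hypothetical helper: returns the text after a leading "N. " prefix,
// or the line unchanged if the pattern does not match.
function strip_with_capture($line)
{
    // $matches[0] is the whole match; $matches[1] is the (.*) group.
    if (preg_match('/^[0-9]+\.\ (.*)$/', $line, $matches)) {
        return $matches[1];
    }
    return $line;
}

$line = '340. All the way to number 340';

// Both calls print "All the way to number 340".
echo strip_with_capture($line), "\n";
echo preg_replace('/^[0-9]+\.\ (.*)$/', '$1', $line), "\n";
```

Lines that don't start with "N. " come back unchanged either way: the helper returns them as-is, and preg_replace() has nothing to replace.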

Personally, I would go this route if you wanted to stick with a regex.

<?php

$lines[] = '01. asdf';
$lines[] = '02. 323 asdf';
$lines[] = '03.2323 asdf';
$lines[] = '04. asdf 23';
$lines[] = '05.        asdf'; /* tabs used here */
$lines[] = '06. asdf';

foreach ( $lines AS $line ) {
    echo preg_replace('/^[0-9]+\.\s*/', '', $line), "\n";
}

?>

This takes care of all possible issues related to the char after the
first period.  Maybe it is there, maybe not.

It could be a tab and not a space.  It could even be multiple
tabs or spaces.
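Dropping Jim's pattern into Adam's file loop could look something like the sketch below. The function name strip_line_numbers() is hypothetical, and it returns the cleaned lines as an array instead of printing them. It loops on fgets() until it returns false (end of file) and strips the line ending first, so each returned entry is just the text.

```php
<?php
// Hypothetical helper: reads $filename and returns each line with a
// leading "number, period, optional whitespace" prefix removed.
function strip_line_numbers($filename)
{
    $fp = fopen($filename, 'r');
    if ($fp === false) {
        throw new RuntimeException("Couldn't open $filename");
    }

    $clean = array();
    // fgets() returns false at end of file, which ends the loop.
    while (($line = fgets($fp)) !== false) {
        // Drop the line ending, then apply Jim's pattern.
        $line = rtrim($line, "\r\n");
        $clean[] = preg_replace('/^[0-9]+\.\s*/', '', $line);
    }
    fclose($fp);

    return $clean;
}
```

To print the result as Adam wanted: foreach (strip_line_numbers('results.txt') as $clean) { echo $clean, "\n"; }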

There it goes again.

Every time someone asks a simple question (the kind that can be solved
with a simple trim(), ltrim() or rtrim()), the discussion about which
regular expression is best for the problem makes the thread grow
"elephant" sized :-) .

I love this list!!

You're not going to be able to get it with any of the xtrim()
functions.  You would end up with various nested ltrim() calls that,
IMO, would be a nightmare to manage.

So, here is a top-to-bottom comparison.

If $line is this:
$line = '01. asdf';

And you use either one of these:
A) ltrim(ltrim(ltrim($line, '0123456789'), '.'));
B) preg_replace('/^[0-9]+\.\s*/', '', $line);

Which do you prefer?

A's Pros:
    Not a regex
A's Cons:
    A little slower than B
    multiple function calls

B's Pros:
    Slightly faster than A
    Single Function call
B's Cons:
    Regex
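Jim's two options can be checked on a few of his test strings with a quick script like this one (option_a() and option_b() are just made-up names for A and B). It only compares the results; it doesn't measure speed. On Jim's strings they agree, but the last, made-up sample shows one difference: A strips leading digits even when no period follows them, while B leaves such a line alone.

```php
<?php
// Option A: strip leading digits, then periods, then (default) whitespace.
function option_a($line)
{
    return ltrim(ltrim(ltrim($line, '0123456789'), '.'));
}

// Option B: one regex anchored at the start of the line.
function option_b($line)
{
    return preg_replace('/^[0-9]+\.\s*/', '', $line);
}

// The last sample is a hypothetical line with no "N." prefix.
$samples = array('01. asdf', '02. 323 asdf', '03.2323 asdf', '2008 was a good year');
foreach ($samples as $line) {
    printf("%-22s A: %-22s B: %s\n", $line, option_a($line), option_b($line));
}
```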

You should really check the manual again, Jim. AFAIK ltrim() doesn't remove
just a single character; as long as the characters belong to the list, they
are all removed, so you just do:

ltrim($line, '0123456789')

and this does the job perfectly. So perhaps you need to reconsider your
thoughts on this.

Maybe instead of saying "AFAIK", you should go and check it yourself.  But obviously, since you didn't care to do it the first time around, I will supply the relevant parts for you and the list archive.  And I quote:

Reference: http://us3.php.net/ltrim

Under the "Return Values" section...

Return Values

This function returns a string with whitespace stripped from the beginning of str. Without the second parameter, ltrim() will strip these characters:

    * " " (ASCII 32 (0x20)), an ordinary space.
    * "\t" (ASCII 9 (0x09)), a tab.
    * "\n" (ASCII 10 (0x0A)), a new line (line feed).
    * "\r" (ASCII 13 (0x0D)), a carriage return.
    * "\0" (ASCII 0 (0x00)), the NUL-byte.
    * "\x0B" (ASCII 11 (0x0B)), a vertical tab.

Notice the second sentence of the first line?
	"Without the second parameter, ltrim() will strip these characters"

That means there is a default set of chars that it uses IF you do not supply a list of chars.
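A two-line check makes the difference concrete. Without a list, only the default whitespace is stripped; with a list, ltrim() keeps stripping as long as the leading character is in the list, so it is not limited to one character. The second form needs PHP 4.1.0 or later.

```php
<?php
// Without a second parameter only the default whitespace set is stripped.
var_dump(ltrim(" \t01. asdf"));              // string(8) "01. asdf"

// With a character list, every leading character that is in the list is
// stripped, one after another, until one that isn't in the list is reached.
var_dump(ltrim('01. asdf', '0123456789'));   // string(6) ". asdf"
```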

On a side note: if you look under the ChangeLog section, you'll notice the second parameter wasn't added until PHP 4.1.0.

What you said was "if they belong to the list they are all removed."  Then what exactly did this function do before 4.1?

I did my requested homework, but I think you need to go back and study that chapter now.

This:

ltrim($line, '0123456789 .');

does remove all those characters, doesn't it (as the OP asked and Richard
suggested on a previous thread)? And it does so without being called more
than once, as far as I tested. That was my point in the first place, and
I'm sorry if I didn't make that clear. On the other hand, whoever
suggested calling ltrim() without the second parameter?
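A sketch comparing the single ltrim() call with Jim's regex on Jim's test strings (strip_with_ltrim() and strip_with_regex() are made-up names). One ltrim() call does strip the whole prefix, but because the list holds digits, space and period, it keeps going into the text whenever the text itself starts with those characters: "02. 323 asdf" comes back as "asdf" rather than "323 asdf". A tab after the period also stops it, since "\t" is not in the list.

```php
<?php
// One ltrim() call with digits, space and period in the character list.
function strip_with_ltrim($line)
{
    return ltrim($line, '0123456789 .');
}

// Jim's regex: only the "digits, period, optional whitespace" prefix.
function strip_with_regex($line)
{
    return preg_replace('/^[0-9]+\.\s*/', '', $line);
}

foreach (array('01. asdf', '02. 323 asdf', '03.2323 asdf') as $line) {
    echo strip_with_ltrim($line), ' | ', strip_with_regex($line), "\n";
}
```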

You suggested something like this earlier:

ltrim(ltrim(ltrim($line, '0123456789'), '.'))

when you made the comparison, didn't you?

Sorry if I got that wrong; I meant no offense, and I still don't.

--
Thodoris