Re: Catch line indentation

Narcis Garcia <informatica@xxxxxxxxx> · Sat, 29 Oct 2016 09:50:00 +0200



preg_replace('/^( *)/', '${0}', '   <table>...</table>') ==
'   <table>...</table>'
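
As a minimal sketch of the difference (the helper name `leadingSpaces` is hypothetical, not something from the thread): `preg_replace` hands back the whole subject unless the pattern also consumes the rest of the line, whereas `preg_match` gives the leading run of spaces directly.

```php
<?php
// Hypothetical helper, for illustration only.
function leadingSpaces(string $line): string
{
    // '/^ */' matches zero or more spaces at the start, so it always succeeds
    // and $m[0] is always set (possibly to the empty string).
    preg_match('/^ */', $line, $m);
    return $m[0];
}

var_dump(leadingSpaces('   <table>...</table>')); // string(3) "   "
var_dump(leadingSpaces('hello'));                 // string(0) ""
var_dump(leadingSpaces(' hello'));                // string(1) " "

// preg_replace only isolates the spaces if the pattern also consumes the rest:
var_dump(preg_replace('/^( *).*$/s', '$1', '   <table>...</table>')); // string(3) "   "
```

If tabs should count as indentation too, `'/^[ \t]*/'` should work the same way.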


On 29/10/16 at 05:17, German Geek wrote:
> OK, in the interest of science I implemented a test:
> 
> The difference in performance is minimal: 0.07233524322509943 seconds over
> a million iterations (0.795%), which is roughly 0.07 microseconds per
> iteration. The regex seemed to be winning with fewer iterations, which I
> didn't expect, even if regexes were faster.
> 
> *Conclusion: Just use regexes as string handling code can get complicated
> fast and I think more complex regexes outperform complicated string
> handling. Even just adding a character that is not trimmed by ltrim to this
> makes the string handling really hard.*
> 
> Note also, that the regex here might be compiled every time, in which case
> regexes are the clear winner. I think PHP might cache the compiled regex
> though.
> 
> Result:
> php regex.php
> time for string: 9.1010239124298
> time for regex: 9.1733591556549
> 
> Code:
> <?php
> 
> define('LINE_LENGTH', 80);
> define('ITERATIONS', 1000000);
> 
> // Build a random line: 0..LINE_LENGTH spaces followed by
> // 0..LINE_LENGTH characters in the ASCII range 'A'..'z'.
> function getSpaced() {
>     $amount = rand(0, LINE_LENGTH);
>     $nonSpaceAmount = rand(0, LINE_LENGTH);
>     $spaces = str_repeat(' ', $amount);
>     $rest = '';
>     for ($i = 0; $i < $nonSpaceAmount; ++$i) {
>         $rest .= chr(rand(ord('A'), ord('z')));
>     }
>     return $spaces . $rest;
> }
> 
> // String-function approach (getSpaced() is inside the timed loop).
> $start = microtime(true);
> for ($i = 0; $i < ITERATIONS; ++$i) {
>     $string = getSpaced();
>     substr($string, 0, strlen($string) - strlen(ltrim($string)));
> }
> echo "time for string: " . (microtime(true) - $start) . "\n";
> 
> // Regex approach (getSpaced() is inside the timed loop here as well).
> $start = microtime(true);
> for ($i = 0; $i < ITERATIONS; ++$i) {
>     $string = getSpaced();
>     preg_match('/^( *)/', $string, $matches);
>     // $matches[1] holds the leading spaces.
> }
> echo "time for regex: " . (microtime(true) - $start) . "\n";
> 
> 
> On Sat, 29 Oct 2016 at 15:37 German Geek <geek.de@xxxxxxxxx> wrote:
> 
>> String functions are very fast. Regexes have to be compiled under the hood
>> to take advantage of their speed. PHP does this behind the scenes. So, if
>> you are only looking for spaces, string functions are going to run faster,
>> in my humble opinion.
>>
>> However, I agree that regexes are probably better in any case, because
>> they are much more powerful and for someone who understands them, just as
>> easy to read if not easier, especially in this example.
>>
>> The difference in performance is probably not noticeable, especially not
>> nowadays. Saving developer time is more important and I would use regexes
>> as well.
>>
>> I could be wrong about regexes being slower. It's just what I read
>> somewhere. I guess one would have to do the test on a large input to verify
>> on a case by case basis. As far as I understand regexes have to perform
>> string functions also, which I think are probably more complicated than in
>> this example. Again, something to test.
>>
>> I would want to know, just out of interest though. :-)
>>
>> On Sat, 29 Oct 2016 at 12:40 Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>
>>
>> On 28 October 2016 23:33:00 BST, German Geek <geek.de@xxxxxxxxx> wrote:
>>> regex is nicer, because it is less code and you can detect any white space
>>> etc.
>>>
>>> However!
>>>
>>> substring etc will be faster and more understandable to others who do not
>>> know much about regexes.
>>>
>>> On Sat, 29 Oct 2016 at 02:21 Christoph M. Becker <cmbecker69@xxxxxx>
>>> wrote:
>>>
>>>> On 28.10.2016 at 14:51, Richard wrote:
>>>>
>>>>>> Date: Friday, October 28, 2016 12:09:31 +0100
>>>>>> From: Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx>
>>>>>>
>>>>>> On 28 October 2016 12:01:16 BST, Narcis Garcia
>>>>>> <informatica@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hello, I have a string (I quote here only) as:
>>>>>>>
>>>>>>> '   <table>...</table>'
>>>>>>>
>>>>>>> As you can see there are 3 spaces at the beginning, but it could be
>>>>>>> 0 or 4 or any number of spaces.
>>>>>>> How can I get a string with only the initial spaces part?
>>>>>>>
>>>>>>> '   <table>...</table>' -> '   '
>>>>>>> 'hello' -> ''
>>>>>>> ' hello' -> ' '
>>>>>>>
>>>>>>> Thanks.
>>>>>>
>>>>>> Have you tried regular expressions? Something like:
>>>>>>
>>>>>> ^( )*[^ ]
>>>>>>
>>>>>> The first captured match is the number of spaces, from 0 to any
>>>>>> amount. Note the space between the brackets and before the closing
>>>>>> square bracket.
>>>>>
>>>>> You need to take into consideration that "whitespace" can be created
>>>>> by more than the simple "space" (ascii 32) character. A "[horizontal]
>>>>> tab" (ascii 9) is common, but also look at the top of the php trim
>>>>> function documentation:
>>>>>
>>>>>   <http://php.net/manual/en/function.trim.php>
>>>>>
>>>>> to see the characters that it handles as "whitespace".
>>>>
>>>> If general whitespace should be detected with a regexp, \s could be used.
>>>>
>>>>> While "trim"
>>>>> does the opposite of what you're after, […]
>>>>
>>>> Indeed, so one could do something like
>>>>
>>>>   substr($string, 0, strlen($string) - strlen(ltrim($string)))
>>>>
>>>> I'd prefer a regexp solution, though.
>>>>
>>>> --
>>>> Christoph M. Becker
>>>>
>>>>
>>>> --
>>>> PHP General Mailing List (http://www.php.net/)
>>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>>
>>>>
>>
>> I really don't think performing two strlen() calls, a substr(), & an
>> ltrim() is going to be faster than a regular expression.
>>
>> I don't think you should avoid regexes because some people don't
>> understand them. It's a very simple regular expression. You wouldn't tell
>> someone to avoid PDO and use mysql_* functions because PDO is too
>> complicated for some people, would you?
>>
>>
> 
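
One caveat with the timings quoted above: `getSpaced()` runs inside both timed loops, so generating the random lines is part of what gets measured. A rough sketch that builds the inputs first and times only the extraction could look like this. The sample size and the names `makeLine`, `viaString` and `viaRegex` are my own placeholders, and I make no claim about which side wins.

```php
<?php
// Sketch only: sizes and helper names are illustrative, not from the thread.
const LINE_LENGTH = 80;
const SAMPLES = 10000;

// Build one random line: 0..LINE_LENGTH spaces, then 0..LINE_LENGTH characters
// in the ASCII range 'A'..'z' (none of which are whitespace).
function makeLine(): string
{
    $rest = '';
    $n = mt_rand(0, LINE_LENGTH);
    for ($i = 0; $i < $n; ++$i) {
        $rest .= chr(mt_rand(ord('A'), ord('z')));
    }
    return str_repeat(' ', mt_rand(0, LINE_LENGTH)) . $rest;
}

// String-function approach; ltrim() is limited to ' ' so that both
// approaches strip exactly the same character.
function viaString(string $s): string
{
    return substr($s, 0, strlen($s) - strlen(ltrim($s, ' ')));
}

// Regex approach; '/^ */' always matches, so $m[0] is always set.
function viaRegex(string $s): string
{
    preg_match('/^ */', $s, $m);
    return $m[0];
}

// Generate all inputs before any timing starts.
$lines = [];
for ($i = 0; $i < SAMPLES; ++$i) {
    $lines[] = makeLine();
}

// Time each approach over the same pre-built inputs.
foreach (['viaString', 'viaRegex'] as $fn) {
    $start = microtime(true);
    foreach ($lines as $line) {
        $fn($line);
    }
    printf("%s: %.6f s\n", $fn, microtime(true) - $start);
}
```

Both functions see identical inputs, so whatever difference remains should come from the extraction itself rather than from `rand()` and string building.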
