RE: Parsing

Thanks, Peter, for the explanation, but what I need is to fix my problem.
I'm getting this notice:

Notice: iconv(): Detected an illegal character in input string in
/var/www/html/rssfeed/sahafah.php on line 35

It happens when I convert from UTF-8 to CP1256:
$rss2 = iconv("UTF-8", 'CP1256//TRANSLIT', $rss);
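
One way to narrow this down, as a minimal sketch only: the notice usually
means that either the input is not well-formed UTF-8, or it contains a
character with no CP1256 equivalent. The helper names is_valid_utf8() and
utf8_to_cp1256() below are hypothetical (they are not from this thread), and
whether //IGNORE actually suppresses the notice depends on the iconv
implementation PHP was built against, so treat that part as an assumption.

```php
<?php
// Hypothetical helpers for illustration; not part of the original script.

// Returns true when $text is well-formed UTF-8. An empty pattern with the
// /u modifier makes PCRE validate the subject string; preg_match() returns
// false (not 0) when the subject is malformed.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

// Converts UTF-8 text to CP1256, or returns false when the input is not
// UTF-8 in the first place (in that case the source encoding is the real
// problem and should be fixed before converting).
function utf8_to_cp1256($text)
{
    if (!is_valid_utf8($text)) {
        return false;
    }
    // //TRANSLIT asks for an approximation where one exists; //IGNORE asks
    // iconv to drop characters it cannot represent. Assumption: support
    // for combining the two varies between iconv implementations.
    return iconv('UTF-8', 'CP1256//TRANSLIT//IGNORE', $text);
}
```

If utf8_to_cp1256($rss) returns false, the feed is probably not UTF-8 to
begin with (or the conversion itself failed), which would explain the notice
on line 35.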


> -----Original Message-----
> From: Peter West [mailto:lists@xxxxxxxxx]
> Sent: Wednesday, February 25, 2015 2:56 PM
> To: Maciek Sokolewicz; PHP General
> Subject: Re:  Parsing
> 
> > So, what does (.*?) mean? Well, simply said "any character, occurring 0
> > or more times" occurring 0 or 1 times.
> 
> I don't think so.  ((.*)?) would mean that, but in (.*?), the '?' means
> "make the preceding quantifier non-greedy"; that is, make it match the
> minimum number of times. And as the minimum number of matches of .* is
> zero, on its own it ends up meaning "match no characters at all". So it
> will always succeed, wherever it occurs in a pattern.
> 
> For instance,
> 
> $ php -a
> Interactive shell
> 
> php > $test = "aabbcc";
> php > $re = '/.+?(bb?).*/';
> php > preg_match($re, $test, $match);
> php > print_r($match);
> Array
> (
>     [0] => aabbcc
>     [1] => bb
> )
>   Note here that the initial pattern piece '.+?' is limited to the minimum
>   match. The minimum match is a single character, but that is overruled by
>   the attempt to match the capturing sub-expression '(bb?)', so it in fact
>   matches 'aa'.  Note that in this regexp, (bb?) means "...a 'b' char
>   followed by zero or one 'b' chars."
>   Now change that initial sub-expression.
> php > $re = '/.*?(bb?).*/';
> php > preg_match($re, $test, $match);
> php > print_r($match);
> Array
> (
>     [0] => aabbcc
>     [1] => bb
> )
>   The minimum match for '.*?' here is no characters at all. But again,
>   the minimum match must be extended to accommodate '(bb?)'.
>   Now remove the minimising constraint.
> php > $re = '/.*(bb?).*/';
> php > preg_match($re, $test, $match);
> php > print_r($match);
> Array
> (
>     [0] => aabbcc
>     [1] => b
> )
>   Only one 'b'!  Which 'b' is matched?  It's the second 'b'.  The minimum
>   match for (bb?) is a single 'b' followed by zero 'b's; so the second 'b'
>   satisfies the capturing expression, and the now-greedy initial
>   subexpression can gobble up all of the characters up to that second 'b'.
> 
>   Don't believe me?
> php > $re = '/(.*)(bb?).*/';
> php > preg_match($re, $test, $match);
> php > print_r($match);
> Array
> (
>     [0] => aabbcc
>     [1] => aab
>     [2] => b
> )
>   Let's back up.
> php > $re = '/(.*?)(bb?).*/';
> php > preg_match($re, $test, $match);
> php > print_r($match);
> Array
> (
>     [0] => aabbcc
>     [1] => aa
>     [2] => bb
> )
>   As before, with a non-greedy initial sub-expression,
>   except that we now capture that initial sub-expression.
>   (bb?) means "...a 'b' followed by zero or one 'b's", greedily.
>   Can we force that to be non-greedy?
> php > $re = '/(.*?)(bb??)(.*)/';
> php > preg_match($re, $test, $match);
> php > print_r($match);
> Array
> (
>     [0] => aabbcc
>     [1] => aa
>     [2] => b
>     [3] => bcc
> )
>   Yes we can, by appending a moderating '?' which curbs the
>   appetite of the capturing sub-expression: (bb??)
> 
> Peter West
> "...and behold, something greater than Jonah is here."
> 
> > On 23 Feb 2015, at 10:48 pm, Maciek Sokolewicz
> > <maciek.sokolewicz@xxxxxxxxx> wrote:
> >
> > Secondly, the above two regexp rules are slightly bloated. What they
> > actually mean is:
> > ( = start new catchable pattern
> > . = any character
> > * = 0 or more of the previous pattern
> > ? = 0 or 1 of the previous pattern
> > ) = end catchable pattern
> > \s = any whitespace character
> >
> > So, what does (.*?) mean? Well, simply said "any character, occurring 0
> > or more times" occurring 0 or 1 times. But since the any-character
> > pattern already occurs 0 or more times, the pattern as a whole will
> > either be matched (1 time) or not (0 times), making the ? metacharacter
> > useless. Now if it were (.+?) then it would state "one or more of any
> > character, with the entire pattern optional."
> > So in practice, the following patterns are equal in what they
> > represent: (.*?), (.*), (.+?)
> >
> 
> 
> --
> PHP General Mailing List (http://www.php.net/) To unsubscribe, visit:
> http://www.php.net/unsub.php







