Re: [PATCH v4 6/6] regex.3: Destandardeseify Match offsets

Hi,

On 4/20/23 17:05, наб wrote:
> Hi!
> 
> On Thu, Apr 20, 2023 at 04:10:04PM +0200, Alejandro Colomar wrote:
>> On 4/20/23 15:02, наб wrote:
>>> --- a/man3/regex.3
>>> +++ b/man3/regex.3
>>> @@ -188,37 +188,34 @@ This flag is a BSD extension, not present in POSIX.
>>>  .SS Match offsets
>>>  Unless
>>>  .B REG_NOSUB
>>> -was set for the compilation of the pattern buffer, it is possible to
>>> -obtain match addressing information.
>>> -.I pmatch
>>> -must be dimensioned to have at least
>>> -.I nmatch
>>> -elements.
>>> -These are filled in by
>>> +was passed to
>>> +.BR regcomp (),
>>> +it is possible to
>>> +obtain the locations of matches within
>>> +.IR string :
>>>  .BR regexec ()
>>> -with substring match addresses.
>>> -The offsets of the subexpression starting at the
>>> -.IR i th
>>> -open parenthesis are stored in
>>> -.IR pmatch[i] .
>>> -The entire regular expression's match addresses are stored in
>>> -.IR pmatch[0] .
>>> -(Note that to return the offsets of
>>> -.I N
>>> -subexpression matches,
>>> +fills
>>>  .I nmatch
>>> -must be at least
>>> -.IR N+1 .)
>>> -Any unused structure elements will contain the value \-1.
>>> +elements of
>>> +.I pmatch
>>> +with results:
>>> +.I pmatch[0]
>>> +corresponds to the entire match,
>> I still don't understand this.  Does REG_NOSUB also affect pmatch[0]?
>> I would have expected that it would only affect *sub*matches, that is, [>0].
> 
> Let's consult the manual:
>   REG_NOSUB  Do not report position of matches. [...]
>   REG_NOSUB  Compile for matching that need only report success or
>              failure, not what was matched.                    (4.4BSD)
> and POSIX:
>   REG_NOSUB  Report only success or fail in regexec().
>   REG_NOSUB  Report only success/fail in regexec( ).
> (yes; the two times it describes it, it's written differently).
> 
> POSIX says it better I think.
> 
> And, indeed:
> 	$ cat a.c
> 	#include <regex.h>
> 	#include <stdio.h>
> 	int main(int c, char ** v) {
> 		regex_t r;
> 		regcomp(&r, v[1], 0);
> 		regmatch_t dt = {0, 3};
> 		printf("%d\n", regexec(&r, v[2], 1, &dt, REG_STARTEND));
> 		printf("%d, %d\n", (int)dt.rm_so, (int)dt.rm_eo);
> 	}
> 
> 	$ cc a.c -oac
> 	$ ./ac 'c$' 'abcdef'
> 	0
> 	2, 3
> 
> 	$ sed 's/0)/REG_NOSUB)/' a.c | cc -xc - -oac
> 	$ ./ac 'c$' 'abcdef'
> 	0
> 	0, 3
> 

I like this example, and the quotes from POSIX.  I'll link to your
message in the commit log.

> 
> ...and I've just realised why you're asking ‒ I think you're reading too
> much (and ahistorically) into the "SUB" bit;

[...]

> Actually, let's consult POSIX.2 (Draft 11.2):

[...]

>   609  If the REG_NOSUB flag was not set in cflags, then regcomp() shall set re_nsub to
>   610  the number of parenthesized subexpressions [delimited by \( \) in basic regular
>   611  expressions or ( ) in extended regular expressions] found in pattern.
> both as present-day.

[...]

> It also allows an application to request an arbitrary number of sub-
>   810  strings from a regular expression. (Previous versions reported only ten sub-
>   811  strings.) The number of subexpressions in the regular expression is reported in
>   812  re_nsub in preg.

[...]

> 
> So: yes, there was a substitution interface that got cut.
> The name is actually a hold-over from
> "don't allocate for ten subexpressions in regex_t".

So, the name indeed seems to come from "subexpressions", which confirms
that it's just confusing as hell.

> 
> I think changing our description to
>   REG_NOSUB  Only report overall success. regexec() will only use pmatch
>              for REG_STARTEND, and ignore nmatch.
> may make that more obvious.

Yeah, this, and further the version in v8, makes the behavior clear, even
if the name is brain-damaged (but there's nothing we can do about it :/).

Cheers,
Alex

> 
> Best,
> наб

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
