Re: [PATCH 2/2] regex.3: improve REG_STARTEND

Hi!

On 4/19/23 23:20, наб wrote:
> Hi!
> 
> On Wed, Apr 19, 2023 at 10:23:29PM +0200, Alejandro Colomar wrote:
>> On 4/19/23 19:48, наб wrote:
>>> diff --git a/man3/regex.3 b/man3/regex.3
>>> index d54d6024c..2c8b87aca 100644
>>> --- a/man3/regex.3
>>> +++ b/man3/regex.3
>>> @@ -141,23 +141,20 @@ compilation flag
>>>  above).
>>>  .TP
>>>  .B REG_STARTEND
>>> -Use
>>> -.I pmatch[0]
>>> -on the input string, starting at byte
>>> -.I pmatch[0].rm_so
>>> -and ending before byte
>>> -.IR pmatch[0].rm_eo .
>>> +Match
>>> +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo )
>>> +instead of
>>> +.RI [ string ", " string " + \fBstrlen\fP(" string )).
>> Hmmm, I like this!
>>
>> Let's see if I understand it.  pmatch[] is normally
>> [[gnu::access(write_only, 4, 3)]]
>> but if ((.eflags & REG_STARTEND) != 0) it's [1] and
>> [[gnu::access(read_write, 4)]]?
> I fucked the ternary in my previous mail I think, soz;
> I don't know if it's gnu::anything, but you could model it as
> {
> 	if(eflags & REG_STARTEND)
> 		read(pmatch, 1);
> 
> 	if(!(preg->flags & REG_NOSUB))  // as "set" in regcomp()
> 		write(pmatch, nmatch);
> }
> 
> I.e. pmatch[nmatch] must be a writable array, unless REG_NOSUB,
> and also, additively, *pmatch must be readable if REG_STARTEND.

Ahh, now it's clear to me (I think).  :)
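
To check my understanding, here's a minimal sketch of the "read only" half
of that model: with REG_NOSUB set at regcomp() time, nothing is written to
pmatch, but REG_STARTEND still makes regexec() read pmatch[0].  The helper
name range_has_match() is made up for illustration, and this assumes a libc
that implements REG_STARTEND (glibc and the BSDs do; it isn't in POSIX).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether `pattern` matches anywhere in
 * buf[so, eo).  Returns 0 on match, REG_NOMATCH on no match,
 * -1 if the pattern does not compile.
 */
static int
range_has_match(const char *pattern, const char *buf, size_t so, size_t eo)
{
	regex_t     re;
	regmatch_t  range;
	int         ret;

	assert(so <= eo);

	/* REG_NOSUB: regexec() will not write any match offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	/* REG_STARTEND: regexec() reads exactly one element, pmatch[0]. */
	range.rm_so = (regoff_t) so;
	range.rm_eo = (regoff_t) eo;

	/* nmatch is 0 here; it is not consulted for the read. */
	ret = regexec(&re, buf, 0, &range, REG_STARTEND);
	regfree(&re);

	return ret;
}
```

So range_has_match("b+", "aabbbcc", 0, 2) only ever looks at "aa" and
should report no match, while widening the range to 7 finds the "bbb".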

> 
>>>  This allows matching embedded NUL bytes
>>>  and avoids a
>>>  .BR strlen (3)
>>> -on large strings.
>>> -It does not use
>>> +on known-length strings.
>>>  .I nmatch
>>> -on input, and does not change
>>> -.B REG_NOTBOL
>>> -or
>>> -.B REG_NEWLINE
>>> -processing.
>>> +is not consulted for this purpose.
>>> +If any matches are returned, they're relative to
>>> +.IR string ,
>>> +not
>>> +.IR string " + " pmatch->rm_so .
>> How are such matches returned?  In pmatch[>0]?  Or how?
> In the usual way in pmatch[0..nmatch].
> 
> I guess the "nmatch isn't taken into account" thing is confusing,
> because REG_STARTEND just adds a read. regexec() can be modelled as
> {
> 	const char * start, * end;
> 	if(eflags & REG_STARTEND) {
> 		start = string + pmatch->rm_so;
> 		end   = string + pmatch->rm_eo;
> 	} else {
> 		start = string;
> 		end   = string + strlen(string);
> 	}
> 	
> 	// match stuff in [start, end)
> }
> 
> And that's the /only/ effect REG_STARTEND has
> (+ matches are returned relative to string, not to start,
>    but that's consistent, and they just got decoupled;
>    it bears noting it there since it's not what I expected to happen).
> 
> I'll sleep on this and post something I hate less tomorrow.

Sure; good night!
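
In the meantime, here's a rough sketch of how I read that model from the
caller's side, in case it helps with the wording.  It assumes glibc or a BSD
libc (REG_STARTEND isn't in POSIX), and match_range() is just a name I made
up.  The same pmatch[0] carries the range in and the match out, and the
offsets that come back are relative to the start of the buffer.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search buf[so, eo) for `pattern`.
 * On a match, returns 0 and stores the whole-match offsets in *out.
 * Returns REG_NOMATCH on no match, -1 if the pattern does not compile.
 */
static int
match_range(const char *pattern, const char *buf, size_t so, size_t eo,
            regmatch_t *out)
{
	regex_t     re;
	regmatch_t  pm[1];
	int         ret;

	assert(so <= eo);

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* Input: the byte range to search; no strlen(buf) is performed. */
	pm[0].rm_so = (regoff_t) so;
	pm[0].rm_eo = (regoff_t) eo;

	ret = regexec(&re, buf, 1, pm, REG_STARTEND);
	regfree(&re);

	/* Output: offsets are relative to buf, not to buf + so. */
	if (ret == 0)
		*out = pm[0];

	return ret;
}
```

For example, searching "aabbbcc" for "b+" in the range [3, 7) should give
rm_so = 3 and rm_eo = 5, not 0 and 2.  And since the end comes from rm_eo
rather than strlen(), a buffer like "foo\0bar" with range [0, 7) can match
"bar" past the embedded NUL.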

Best,
Alex

> 
> Best,

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

