Re: [PATCH 2/2] regex.3: improve REG_STARTEND

Hi!

On Wed, Apr 19, 2023 at 10:23:29PM +0200, Alejandro Colomar wrote:
> On 4/19/23 19:48, наб wrote:
> > diff --git a/man3/regex.3 b/man3/regex.3
> > index d54d6024c..2c8b87aca 100644
> > --- a/man3/regex.3
> > +++ b/man3/regex.3
> > @@ -141,23 +141,20 @@ compilation flag
> >  above).
> >  .TP
> >  .B REG_STARTEND
> > -Use
> > -.I pmatch[0]
> > -on the input string, starting at byte
> > -.I pmatch[0].rm_so
> > -and ending before byte
> > -.IR pmatch[0].rm_eo .
> > +Match
> > +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo )
> > +instead of
> > +.RI [ string ", " string " + \fBstrlen\fP(" string )).
> Hmmm, I like this!
> 
> Let's see if I understand it.  pmatch[] is normally
> [[gnu::access(write_only, 4, 3)]]
> but if ((.eflags & REG_STARTEND) != 0) it's [1] and
> [[gnu::access(read_write, 4)]]?
I think I got the ternary wrong in my previous mail, sorry;
I don't know if it's gnu::anything, but you could model it as
{
	if (eflags & REG_STARTEND)
		read(pmatch, 1);        // pmatch[0].rm_so and .rm_eo are inputs

	if (!(cflags & REG_NOSUB))  // cflags as passed to regcomp()
		write(pmatch, nmatch);  // pmatch[0] .. pmatch[nmatch - 1] are outputs
}

I.e., pmatch must point to a writable array of nmatch elements
unless REG_NOSUB was given to regcomp(); in addition, *pmatch must be
readable if REG_STARTEND is set.
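A minimal sketch of that access pattern, assuming a libc that provides the non-POSIX REG_STARTEND flag (glibc and the BSDs do). count_matches() is a hypothetical helper, not an existing API: it counts non-overlapping matches in a known-length buffer. Because pmatch[0] is read as the input range and then overwritten with the match, it has to be reset before every call.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of re in buf[0..len).
 * Assumes re was compiled WITHOUT REG_NOSUB, so regexec() writes pmatch[0]. */
size_t count_matches(const regex_t *re, const char *buf, size_t len)
{
	size_t     count = 0;
	size_t     pos = 0;
	regmatch_t m;

	while (pos <= len) {
		/* Input: the range to search, [buf + pos, buf + len). */
		m.rm_so = (regoff_t) pos;
		m.rm_eo = (regoff_t) len;
		if (regexec(re, buf, 1, &m, REG_STARTEND) != 0)
			break;  /* REG_NOMATCH (or an error): stop */

		/* Output: the match, as offsets relative to buf, not buf + pos. */
		count++;
		assert(m.rm_so >= (regoff_t) pos && m.rm_eo <= (regoff_t) len);

		/* Continue after the match; step one byte past an empty match. */
		pos = (m.rm_eo > m.rm_so) ? (size_t) m.rm_eo : (size_t) m.rm_eo + 1;
	}
	return count;
}
```

Since the caller passes the length, the buffer may contain NUL bytes and needs no terminator.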

> >  This allows matching embedded NUL bytes
> >  and avoids a
> >  .BR strlen (3)
> > -on large strings.
> > -It does not use
> > +on known-length strings.
> >  .I nmatch
> > -on input, and does not change
> > -.B REG_NOTBOL
> > -or
> > -.B REG_NEWLINE
> > -processing.
> > +is not consulted for this purpose.
> > +If any matches are returned, they're relative to
> > +.IR string ,
> > +not
> > +.IR string " + " pmatch->rm_so .
> How are such matches returned?  In pmatch[>0]?  Or how?
In the usual way, in pmatch[0] through pmatch[nmatch - 1].
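To illustrate, here is a sketch assuming glibc or BSD REG_STARTEND semantics; match_buffer() is a hypothetical name. It matches a pattern with two subexpressions against a buffer containing an embedded NUL, so the output lands in pmatch[0..2] exactly as it would without the flag, with offsets counted from the start of the buffer.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match re against the len-byte buffer buf, which may
 * contain NUL bytes, filling pmatch[0..nmatch-1] the usual way.
 * Returns regexec()'s result: 0 on a match, REG_NOMATCH otherwise. */
int match_buffer(const regex_t *re, const char *buf, size_t len,
                 size_t nmatch, regmatch_t pmatch[])
{
	assert(nmatch >= 1);  /* this helper passes the input range in pmatch[0] */
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t) len;
	return regexec(re, buf, nmatch, pmatch, REG_STARTEND);
}
```

Without REG_STARTEND, regexec() stops at the first NUL and never sees "bar"; with it, the whole buffer is searched.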

I guess the "nmatch isn't taken into account" bit is confusing,
because REG_STARTEND just adds a read of pmatch[0]; nmatch still
bounds the output as usual. regexec() can be modelled as
{
	const char *start, *end;
	if (eflags & REG_STARTEND) {
		start = string + pmatch->rm_so;
		end   = string + pmatch->rm_eo;
	} else {
		start = string;
		end   = string + strlen(string);
	}

	// match stuff in [start, end)
}

And that's the /only/ effect REG_STARTEND has
(plus: matches are returned relative to string, not to start;
   that's consistent, since the two just got decoupled,
   but it's worth noting there since it's not what I expected to happen).
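One way to see the "relative to string" rule: a REG_STARTEND search of [string + so, string + eo) should report the same offsets as copying that range into a NUL-terminated buffer, matching it normally, and adding so back. The sketch below assumes glibc or BSD REG_STARTEND, a range without NUL bytes, and an unanchored pattern (how ^ behaves at a non-zero rm_so is deliberately left out); both helper names are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: emulate a REG_STARTEND search of
 * [string + so, string + eo) with a NUL-terminated copy, then shift the
 * result so it is relative to string, as REG_STARTEND reports it. */
int match_via_copy(const regex_t *re, const char *string,
                   regoff_t so, regoff_t eo, regmatch_t *m)
{
	assert(so >= 0 && so <= eo);

	size_t len = (size_t) (eo - so);
	char  *copy = malloc(len + 1);
	int    rc;

	if (copy == NULL)
		return REG_ESPACE;
	memcpy(copy, string + so, len);
	copy[len] = '\0';

	rc = regexec(re, copy, 1, m, 0);  /* offsets relative to copy */
	if (rc == 0) {
		m->rm_so += so;           /* ...now relative to string */
		m->rm_eo += so;
	}
	free(copy);
	return rc;
}

/* The same search with REG_STARTEND: no copy and no adjustment needed. */
int match_range(const regex_t *re, const char *string,
                regoff_t so, regoff_t eo, regmatch_t *m)
{
	m->rm_so = so;
	m->rm_eo = eo;
	return regexec(re, string, 1, m, REG_STARTEND);
}
```

For "[0-9]+" over bytes 8..13 of "xx ab12 cd345 yy", both report 10..13; the "12" before the range is not seen.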

I'll sleep on this and post something I hate less tomorrow.

Best,
