On Tue, Sep 21, 2021 at 02:07:08PM +0200, Ævar Arnfjörð Bjarmason wrote: > This whole thing looks good to me. I only found a small whitespace nit > in one of the patches. Did you consider following-up by having this code > take const char*/const size_t pairs. E.g. starting with something like > the below. I do generally find ptr/len pairs to be easier to read, but they also make it really easy to introduce subtle bugs. E.g., if you consume part of a buffer, you have to tweak both the ptr and the len. So the current: while (word_char(bol[-1]) && bol < eol) bol++; has to become: while (word_char(bol[-1] && len > 0) { bol++; len--; } So I'd be hesitant to churn battle-tested code in such a way for what I consider to be a pretty minor benefit. I did notice the ugly use of "unsigned long" here in a few places (rather than size_t). I do think it is worth fixing, but it seemed a little too far to try to cram into this series (it's obviously touching the same lines, but it's quite orthogonal semantically). The other hesitation I had is that the source of this "unsigned long" pattern is almost certainly the object code (which is much more important to convert, as it blocks people from having >4GB objects on Windows). So we might want to just wait for a larger conversion there. OTOH, I don't think there is any downside to a partial conversion here in the meantime (because size_t will always be at least as long as "unsigned long" in practice). > -static int pcre2match(struct grep_pat *p, const char *line, const char *eol, > +static int pcre2match(struct grep_pat *p, const char *line, const size_t len, > regmatch_t *match, int eflags) > { > int ret, flags = 0; > @@ -448,11 +448,11 @@ static int pcre2match(struct grep_pat *p, const char *line, const char *eol, > > if (p->pcre2_jit_on) > ret = pcre2_jit_match(p->pcre2_pattern, (unsigned char *)line, > - eol - line, 0, flags, p->pcre2_match_data, > + len, 0, flags, p->pcre2_match_data, > NULL); > else > ret = pcre2_match(p->pcre2_pattern, (unsigned char *)line, > - eol - line, 0, flags, p->pcre2_match_data, > + len, 0, flags, p->pcre2_match_data, > NULL); Not related to your point, but these casts are funny now. They are meant to cast to "unsigned char" pointers to match pcre's signature, but now they are casting away const-ness, too. That might be worth fixing as part of this series. Though should they really be casting to PCRE2_SPTR? The types are opaque in their API because of the weird multi-width thing, though I find it hard to imagine us ever using the wider versions of the library. -Peff