Re: [PATCH 1/5] string-list: introduce `string_list_split_in_place_multi()`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 18, 2023 at 06:10:58AM -0400, Jeff King wrote:
> On Thu, Apr 13, 2023 at 07:31:43PM -0400, Taylor Blau wrote:
>
> > Instead of using `strchr(2)` to locate the first occurrence of the given
> > delimiter character, `string_list_split_in_place_multi()` uses
> > `strpbrk(2)` to find the first occurrence of *any* character in the given
> > delimiter string.
> >
> > Since the `_multi` variant is a generalization of the original
> > implementation, reimplement `string_list_split_in_place()` in terms of
> > the more general function by providing a single-character string for the
> > list of accepted delimiters.
>
> I'd imagine that strpbrk() is potentially a lot slower than strchr().
> But I kind of doubt that splitting is such a hot path that it will
> matter. If we want to care, I think we could do something like:
>
>   end = delim[1] ? strpbrk(p, delim) : strchr(p, delim[0]);
>
> It's entirely possible that a half-decent strpbrk() implementation does
> this already.

I did a little research here, and it looks like both glibc[1] and
musl[2] already do this. Those links are for their implementations of
strcspn(), but which both implement[^1] strpbrk() as "p += strcspn(p,
d)".

> So I mention it mostly in case we need to revisit this later. I think
> it's OK to ignore for now.

In addition to being OK from a performance standpoint I think solving
the semantics concern you noted lower down in the thread[3] is fairly
straightforward.

When in "_multi" mode, I think this boils down to moving past any number
of characters in `delim` before each iteration of the loop. And that's
doable with "p += strspn(p, delim)".

That avoids any behavior change, and more closely matches what strtok()
does, so I think it's a better path.

Thanks,
Taylor

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
[3]: https://lore.kernel.org/git/20230418102320.GB508219@xxxxxxxxxxxxxxxxxxxxxxx/

[^1]: Not entirely. If strcspn() takes you to the end of the string,
  strpbrk() will return NULL, instead of a pointer to the end of the
  string.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux