On Tue, Apr 18, 2023 at 06:10:58AM -0400, Jeff King wrote: > On Thu, Apr 13, 2023 at 07:31:43PM -0400, Taylor Blau wrote: > > > Instead of using `strchr(2)` to locate the first occurrence of the given > > delimiter character, `string_list_split_in_place_multi()` uses > > `strpbrk(2)` to find the first occurrence of *any* character in the given > > delimiter string. > > > > Since the `_multi` variant is a generalization of the original > > implementation, reimplement `string_list_split_in_place()` in terms of > > the more general function by providing a single-character string for the > > list of accepted delimiters. > > I'd imagine that strpbrk() is potentially a lot slower than strchr(). > But I kind of doubt that splitting is such a hot path that it will > matter. If we want to care, I think we could do something like: > > end = delim[1] ? strpbrk(p, delim) : strchr(p, delim[0]); > > It's entirely possible that a half-decent strpbrk() implementation does > this already. I did a little research here, and it looks like both glibc[1] and musl[2] already do this. Those links are for their implementations of strcspn(), but which both implement[^1] strpbrk() as "p += strcspn(p, d)". > So I mention it mostly in case we need to revisit this later. I think > it's OK to ignore for now. In addition to being OK from a performance standpoint I think solving the semantics concern you noted lower down in the thread[3] is fairly straightforward. When in "_multi" mode, I think this boils down to moving past any number of characters in `delim` before each iteration of the loop. And that's doable with "p += strspn(p, delim)". That avoids any behavior change, and more closely matches what strtok() does, so I think it's a better path. Thanks, Taylor [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35 [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11 [3]: https://lore.kernel.org/git/20230418102320.GB508219@xxxxxxxxxxxxxxxxxxxxxxx/ [^1]: Not entirely. If strcspn() takes you to the end of the string, strpbrk() will return NULL, instead of a pointer to the end of the string.