Re: [PATCH 5/5] banned.h: mark `strtok()`, `strtok_r()` as banned

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 13, 2023 at 06:39:18PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> >   - `strtok_r()` forces the caller to maintain an extra string pointer
> >     to pass as its `saveptr` value
> >
> >   - `strtok_r()` also requires that its `saveptr` value be unmodified
> >     between calls.
> >
> >   - `strtok()` (and by extension, `strtok_r()`) is confusing when used
> >     across multiple functions, since the caller is supposed to pass NULL
> >     as its first argument after the first call. This makes it difficult
> >     to determine what string is actually being tokenized without clear
> >     dataflow.
>
> It seems that the only existing users of strtok() are all in
> t/helper/ directory, so I personally do not think it is a huge loss
> if these two are forbidden.  While I do not see why we should use
> strtok(), none of the above sound like sensible reasons to ban
> strtok_r().  At best, they may point out awkwardness of the function
> to make you try finding an alternative that is easier-to-use before
> choosing strtok_r() for your application on a case-by-case basis.

For what it's worth, I could certainly live if we accomplished getting
strtok(2) on the banned list, but left strtok_r(2) un-banned.

TBH, I think that leaving the reenterant version of a banned function as
un-banned is a little awkward, I don't mind it if you don't feel like
the above are sufficient reasons to ban it.

> If your application wants to chomp a string into tokens from left to
> right, inspecting the resulting token one-by-one as it goes until it
> hits a token that satisfies some condition and then terminate
> without wasting cycles on the rest, string_list_split_in_place() is
> a poor choice.  In such a use case, you do not know upfront where in
> the string the sought-after token would be, so you have to split the
> string in full without taking an early exit via maxsplit.  Also, you
> are restricted to a single byte value for the delimiter, and unlike
> strtok[_r](), string_list_split_in_place() does not squash a run of
> delimiter bytes into one inter-token delimiter.

I don't quite agree with this. In practice, you could repeatedly call
`string_list_split_in_place()` with maxsplit of "1", using the tail of
the string list you're splitting into as the string to split. That would
allow you to split tokens one at a time into the string list without
having to split the whole line up front.

That all said, I don't think that we have such a use case in the tree,
at least from my searching for strtok() and
string_list_split_in_place().

It may be that using strtok_r() for such a thing would be less awkward:
not having tried both (and having no examples to reference) I honestly
do not know for certain.

> One gripe I have against use of strtok() is actually not with
> threading but because people often misuse it when strcspn() is what
> they want (i.e. measure the length of the "first token", so that
> they can then xmemdupz() a copy out), forgetting that strtok[_r]()
> is destructive.

Heh, I'm happy to add that to this, if you want ;-).

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux