On Wed, Dec 06, 2023 at 10:02:04PM +0100, Jann Horn wrote:
> On Wed, Dec 6, 2023 at 9:42 PM Phil Sutter <phil@xxxxxx> wrote:
> >
> > On Wed, Dec 06, 2023 at 05:28:44PM +0100, Jann Horn wrote:
> > > On Tue, Dec 5, 2023 at 10:40 PM Phil Sutter <phil@xxxxxx> wrote:
> > > > On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > > > > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi!
> > > > > >
> > > > > > I think this code is racy, but testing that seems like a pain...
> > > > > >
> > > > > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > > > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > > > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > > > > accesses the ->f_cred of that file.
> > > > > >
> > > > > > I don't see anything that protects this against a concurrent
> > > > > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> > > > >
> > > > > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > > > > under the sk_callback_lock... so I guess the fix would be to add the
> > > > > same in owner_mt?
> > > >
> > > > Sounds reasonable, although I wonder how likely a socket is to
> > > > orphan while netfilter is processing a packet it just sent.
> > > >
> > > > How about the attached patch? Not sure what hash to put into a Fixes:
> > > > tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.
> > >
> > > Looks mostly reasonable to me; though I guess it's a bit weird to have
> > > two separate bailout paths for checking whether sk->sk_socket is NULL,
> > > where the first check can race, and the second check uses different
> > > logic for determining the return value; I don't know whether that
> > > actually matters semantically. But I'm not sure how to make it look
> > > nicer either.
> >
> > I find the code pretty confusing since it combines three matches (socket
> > UID, socket GID and socket existence) via binary ops. The second bail
> > disregards socket existence bits, I assumed it was deliberate and thus
> > decided to leave the first part as-is.
> >
> > > I guess you could add a READ_ONCE() around the first read to signal
> > > that that's a potentially racy read, but I don't feel strongly about
> > > that.
> >
> > Is this just annotation or do you see a practical effect of using
> > READ_ONCE() there?
>
> I mostly just meant that as an annotation. My understanding is that in
> theory, racy reads can cause the compiler to do some terrible things
> to your code (https://lore.kernel.org/all/CAG48ez2nFks+yN1Kp4TZisso+rjvv_4UW0FTo8iFUd4Qyq1qDw@xxxxxxxxxxxxxx/),

Thanks for the pointer, this was an educational read!

> but that's almost certainly not going to happen here.

At least it's not a switch on a value in user-controlled memory. ;)

> (Well, I guess doing a READ_ONCE() at one side without doing
> WRITE_ONCE() on the other side is also unclean...)

For the annotation aspect it won't matter. Though since it will merely
improve reliability of that check in the given corner-case which is an
unreliable situation in the first place, I'd just leave it alone and
hope for the code to be replaced by the one in nft_meta.c eventually.

Thanks, Phil
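
For readers following along, here is a rough userspace sketch of the
pattern discussed in this thread: an early, possibly racy NULL check,
followed by a re-check and dereference under the lock that the orphan
path also takes. It is not the attached patch and not kernel code. All
mock_* names are made up for illustration, a simple spinlock stands in
for sk_callback_lock, and relaxed C11 atomics stand in for
READ_ONCE()/WRITE_ONCE(); the real owner_mt() return-value logic (UID,
GID and socket existence bits) is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's cred/file/socket/sock structs. */
struct mock_cred   { unsigned int fsuid; };
struct mock_file   { const struct mock_cred *f_cred; };
struct mock_socket { struct mock_file *file; };

struct mock_sock {
	atomic_flag callback_lock;               /* models sk_callback_lock (exclusive only) */
	_Atomic(struct mock_socket *) sk_socket; /* NULLed by mock_sock_orphan() */
};

void mock_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

void mock_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

void mock_sock_init(struct mock_sock *sk, struct mock_socket *sock)
{
	atomic_flag_clear(&sk->callback_lock);
	atomic_init(&sk->sk_socket, sock);
}

/* Models the orphan path: clear the back-pointer while holding the lock. */
void mock_sock_orphan(struct mock_sock *sk)
{
	mock_lock(&sk->callback_lock);
	/* Relaxed store: assumed analogue of WRITE_ONCE(). */
	atomic_store_explicit(&sk->sk_socket, (struct mock_socket *)NULL,
			      memory_order_relaxed);
	mock_unlock(&sk->callback_lock);
}

/* Models only the UID part of the match, with both bailout paths. */
bool mock_owner_uid_match(struct mock_sock *sk,
			  unsigned int uid_min, unsigned int uid_max)
{
	struct mock_socket *sock;
	bool match = false;

	/* First bailout: unlocked, may race with mock_sock_orphan().
	 * Relaxed load: assumed analogue of READ_ONCE(). */
	if (!atomic_load_explicit(&sk->sk_socket, memory_order_relaxed))
		return false;

	/* Second bailout: re-check under the lock before dereferencing. */
	mock_lock(&sk->callback_lock);
	sock = atomic_load_explicit(&sk->sk_socket, memory_order_relaxed);
	if (sock && sock->file) {
		unsigned int uid = sock->file->f_cred->fsuid;

		match = uid >= uid_min && uid <= uid_max;
	}
	mock_unlock(&sk->callback_lock);
	return match;
}

/* Usage example: returns 0 if the model behaves as described above. */
int mock_demo(void)
{
	struct mock_cred cred = { .fsuid = 1000 };
	struct mock_file file = { .f_cred = &cred };
	struct mock_socket sock = { .file = &file };
	struct mock_sock sk;

	mock_sock_init(&sk, &sock);
	assert(mock_owner_uid_match(&sk, 1000, 1000));  /* socket present */
	mock_sock_orphan(&sk);
	assert(!mock_owner_uid_match(&sk, 1000, 1000)); /* orphaned: no match */
	return 0;
}
```

The unlocked first check only saves taking the lock in the common
no-socket case; correctness in this model rests on the second check,
since the pointer can become NULL between the two.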