On Thu, Feb 7, 2019 at 2:31 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Feb 07, 2019 at 10:22:53AM +0100, Miklos Szeredi wrote: > > On Thu, Feb 07, 2019 at 04:00:59AM +0000, Al Viro wrote: > > > > > So in theory it would be possible to have > > > * thread A: sendmsg() has SCM_RIGHTS created and populated, > > > complete with file refcount and ->inflight increments implied, > > > at which point it gets preempted and loses the timeslice. > > > * thread B: gets to run and removes all references > > > from descriptor table it shares with thread A. > > > * on another CPU we have garbage collector triggered; > > > it determines the set of potentially unreachable unix_sock and > > > everything in our SCM_RIGHTS _is_ in that set, now that no > > > other references remain. > > > * on the first CPU, thread A regains the timeslice > > > and inserts its SCM_RIGHTS into queue. And it does contain > > > references to sockets from the candidate set of running > > > garbage collector, confusing the hell out of it. > > > > Reminds me: long time ago there was a bug report, and based on that I found a > > bug in MSG_PEEK handling (not confirmed to have fixed the reported bug). This > > fix, although pretty simple, got lost somehow. While unix gc code is in your > > head, can you please review and I'll resend through davem? > > Umm... I think the bug is real (something that looks like an eviction > candidate, but actually is referenced from the reachable queue might > get peeked via that queue, then have _its_ queue modified via new > external reference, all between two passes over that queue, confusing the > fuck out of unix_gc()), but I think the fix is an overkill... > > Am I right assuming that this queue-modifying operation is accept(), removing > an embryo unix_sock from the queue of listener and thus hiding SCM_RIGHTS in > _its_ queue from scan_children()? Hmm... How about just receiving an SCM_RIGHTS socket (which was a candidate) from the queue of the peeked socket? > Let me think of it a bit, OK? While we are at it, some questions from digging > through the current net/unix/garbage.c: > 1) is there any need for ->inflight to be atomic? All accesses are under > unix_gc_lock, after all... Seems so. Probably historic. > 2) pumping unix_gc on each sodding reference in SCM_RIGHTS (within > unix_notinflight()/unix_inflight()) looks atrocious... wouldn't it be better to > hold it over that loop? Sure. I guess SCM_RIGHTS are not too performance sensitive, but that's a trivial cleanup... > 3) unix_get_socket() probably ought to be static nowadays... > 4) I wonder if in scan_inflight()/scan_children() we would be better > off with explicit switch (by enum argument) instead of an indirect call. Right. > 5) do we really need UNIX_GC_MAYBE_CYCLE? Looks like the only > real use is in inc_inflight_move_tail(), and AFAICS it could bloody well > have been > u->inflight++; > if (u->inflight == 1) // just got from zero to non-zero > list_move_tail(&u->link, &gc_candidates); > The logics there is "we'd found a reference to something that still was > a candidate for eviction in a reachable SCM_RIGHTS, so it's actually > reachable and needs to be scanned (and removed from the set of candidates); > move to the end of list, so that the main loop gets around to it". > If it *was* past the cursor in the list, there's no need to move it; if > we got past it, it must've had zero ->inflight (or we would've removed > it from the set back when we got past it). Note that it's only called if > UNIX_GC_CANDIDATE is set (i.e. if it's in the initial candidate set), > so for this one ->inflight is guaranteed to mean the number of SCM_RIGHTS > refs from outside of the current candidate set... Right, makes sense. > 6) unix_get_socket() looks like it might benefit from another > FMODE bit; not sure where it's best set, though - the obvious way would > be SOCK_GC_CARES in sock->flags, set by e.g. unix_create1(), with > sock_alloc_file() propagating it into file->f_mode. Then unix_get_socket() > would be able to bugger off with NULL for most of the references in > SCM_RIGHTS, without looking into inode... Yep, sounds good. Thanks, Miklos