On Thu, Feb 07, 2019 at 10:22:53AM +0100, Miklos Szeredi wrote: > On Thu, Feb 07, 2019 at 04:00:59AM +0000, Al Viro wrote: > > > So in theory it would be possible to have > > * thread A: sendmsg() has SCM_RIGHTS created and populated, > > complete with file refcount and ->inflight increments implied, > > at which point it gets preempted and loses the timeslice. > > * thread B: gets to run and removes all references > > from descriptor table it shares with thread A. > > * on another CPU we have garbage collector triggered; > > it determines the set of potentially unreachable unix_sock and > > everything in our SCM_RIGHTS _is_ in that set, now that no > > other references remain. > > * on the first CPU, thread A regains the timeslice > > and inserts its SCM_RIGHTS into queue. And it does contain > > references to sockets from the candidate set of running > > garbage collector, confusing the hell out of it. > > Reminds me: long time ago there was a bug report, and based on that I found a > bug in MSG_PEEK handling (not confirmed to have fixed the reported bug). This > fix, although pretty simple, got lost somehow. While unix gc code is in your > head, can you please review and I'll resend through davem? Umm... I think the bug is real (something that looks like an eviction candidate, but actually is referenced from the reachable queue might get peeked via that queue, then have _its_ queue modified via new external reference, all between two passes over that queue, confusing the fuck out of unix_gc()), but I think the fix is an overkill... Am I right assuming that this queue-modifying operation is accept(), removing an embryo unix_sock from the queue of listener and thus hiding SCM_RIGHTS in _its_ queue from scan_children()? Let me think of it a bit, OK? While we are at it, some questions from digging through the current net/unix/garbage.c: 1) is there any need for ->inflight to be atomic? All accesses are under unix_gc_lock, after all... 2) pumping unix_gc on each sodding reference in SCM_RIGHTS (within unix_notinflight()/unix_inflight()) looks atrocious... wouldn't it be better to hold it over that loop? 3) unix_get_socket() probably ought to be static nowadays... 4) I wonder if in scan_inflight()/scan_children() we would be better off with explicit switch (by enum argument) instead of an indirect call. 5) do we really need UNIX_GC_MAYBE_CYCLE? Looks like the only real use is in inc_inflight_move_tail(), and AFAICS it could bloody well have been u->inflight++; if (u->inflight == 1) // just got from zero to non-zero list_move_tail(&u->link, &gc_candidates); The logics there is "we'd found a reference to something that still was a candidate for eviction in a reachable SCM_RIGHTS, so it's actually reachable and needs to be scanned (and removed from the set of candidates); move to the end of list, so that the main loop gets around to it". If it *was* past the cursor in the list, there's no need to move it; if we got past it, it must've had zero ->inflight (or we would've removed it from the set back when we got past it). Note that it's only called if UNIX_GC_CANDIDATE is set (i.e. if it's in the initial candidate set), so for this one ->inflight is guaranteed to mean the number of SCM_RIGHTS refs from outside of the current candidate set... 6) unix_get_socket() looks like it might benefit from another FMODE bit; not sure where it's best set, though - the obvious way would be SOCK_GC_CARES in sock->flags, set by e.g. unix_create1(), with sock_alloc_file() propagating it into file->f_mode. Then unix_get_socket() would be able to bugger off with NULL for most of the references in SCM_RIGHTS, without looking into inode... Comments? IIRC, you'd done the last serious round of rewriting unix_gc() and friends...