Re: [PATCH 13/18] io_uring: add file set registration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 7, 2019 at 2:31 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Feb 07, 2019 at 10:22:53AM +0100, Miklos Szeredi wrote:
> > On Thu, Feb 07, 2019 at 04:00:59AM +0000, Al Viro wrote:
> >
> > > So in theory it would be possible to have
> > >     * thread A: sendmsg() has SCM_RIGHTS created and populated,
> > > complete with file refcount and ->inflight increments implied,
> > > at which point it gets preempted and loses the timeslice.
> > >     * thread B: gets to run and removes all references
> > > from descriptor table it shares with thread A.
> > >     * on another CPU we have garbage collector triggered;
> > > it determines the set of potentially unreachable unix_sock and
> > > everything in our SCM_RIGHTS _is_ in that set, now that no
> > > other references remain.
> > >     * on the first CPU, thread A regains the timeslice
> > > and inserts its SCM_RIGHTS into queue.  And it does contain
> > > references to sockets from the candidate set of running
> > > garbage collector, confusing the hell out of it.
> >
> > Reminds me: long time ago there was a bug report, and based on that I found a
> > bug in MSG_PEEK handling (not confirmed to have fixed the reported bug).  This
> > fix, although pretty simple, got lost somehow.  While unix gc code is in your
> > head, can you please review and I'll resend through davem?
>
> Umm...  I think the bug is real (something that looks like an eviction
> candidate, but actually is referenced from the reachable queue might
> get peeked via that queue, then have _its_ queue modified via new
> external reference, all between two passes over that queue, confusing the
> fuck out of unix_gc()), but I think the fix is an overkill...
>
> Am I right assuming that this queue-modifying operation is accept(), removing
> an embryo unix_sock from the queue of listener and thus hiding SCM_RIGHTS in
> _its_ queue from scan_children()?

Hmm... How about just receiving an SCM_RIGHTS socket (which was a
candidate) from the queue of the peeked socket?

> Let me think of it a bit, OK?  While we are at it, some questions from digging
> through the current net/unix/garbage.c:
>         1) is there any need for ->inflight to be atomic?  All accesses are under
> unix_gc_lock, after all...

Seems so.  Probably historic.

>         2) pumping unix_gc on each sodding reference in SCM_RIGHTS (within
> unix_notinflight()/unix_inflight()) looks atrocious... wouldn't it be better to
> hold it over that loop?

Sure.  I guess SCM_RIGHTS are not too performance sensitive, but
that's a trivial cleanup...


>         3) unix_get_socket() probably ought to be static nowadays...
>         4) I wonder if in scan_inflight()/scan_children() we would be better
> off with explicit switch (by enum argument) instead of an indirect call.

Right.

>         5) do we really need UNIX_GC_MAYBE_CYCLE?  Looks like the only
> real use is in inc_inflight_move_tail(), and AFAICS it could bloody well
> have been
>         u->inflight++;
>         if (u->inflight == 1)   // just got from zero to non-zero
>                 list_move_tail(&u->link, &gc_candidates);
> The logics there is "we'd found a reference to something that still was
> a candidate for eviction in a reachable SCM_RIGHTS, so it's actually
> reachable and needs to be scanned (and removed from the set of candidates);
> move to the end of list, so that the main loop gets around to it".
> If it *was* past the cursor in the list, there's no need to move it; if
> we got past it, it must've had zero ->inflight (or we would've removed
> it from the set back when we got past it).  Note that it's only called if
> UNIX_GC_CANDIDATE is set (i.e. if it's in the initial candidate set),
> so for this one ->inflight is guaranteed to mean the number of SCM_RIGHTS
> refs from outside of the current candidate set...

Right, makes sense.

>         6) unix_get_socket() looks like it might benefit from another
> FMODE bit; not sure where it's best set, though - the obvious way would
> be SOCK_GC_CARES in sock->flags, set by e.g. unix_create1(), with
> sock_alloc_file() propagating it into file->f_mode.  Then unix_get_socket()
> would be able to bugger off with NULL for most of the references in
> SCM_RIGHTS, without looking into inode...

Yep, sounds good.

Thanks,
Miklos



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux