Re: [V9fs-developer] 9pfs hangs since 4.7

Greg Kurz <groug@xxxxxxxx> · Tue, 10 Jan 2017 09:16:18 +0100

On Sun, 8 Jan 2017 05:46:39 +0000
Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:

> On Sat, Jan 07, 2017 at 06:15:23PM +0000, Al Viro wrote:
> > On Sat, Jan 07, 2017 at 05:19:10PM +0000, Al Viro wrote:
> >   
> > > released) simply trigger virtio_queue_notify_vq() again?  It *is* a bug
> > > (if we get a burst filling a previously empty queue all at once, there won't
> > > be any slots becoming freed  
> > 
> > Umm...  Nope, even in that scenario we'll have all requests except the last
> > one processed.  I'm trying to put together a simpler reproducer, but...
> > no luck so far.  
> 
> FWIW, here's another fun bug in qemu 9pfs: client may issue multiple
> Tflush on the same pending request.  Server must not reply to them
> out of order and it must reply to at least the last one.  If client has
> sent more than one Tflush, it must treat an arrival of Rflush to one of
> those as if it has received an Rflush to each of the earlier Tflush as
> well (IOW, *their* tags are released).  Client may not reuse the tag of
> victim request until it has received Rflush for the last Tflush it has
> sent.
> 
> Linux kernel-side 9p client doesn't issue such multiple Tflush, but the
> protocol allows that.
> 
> qemu server does not guarantee in-order replies to multiple Tflush; worse,
> while it is likely to reply only to one of those, it may very well be
> the _first_ one.  Subsequent ones will be lost; moreover, it'll leak
> both coroutines and ring slots.
> 

Yeah I'm aware about that, I had started to work on a fix but it's low
priority on my TODO list since linux guests don't do that... and I got
scheduled.

> AFAICS, a clean way to deal with that would be to handle Tflush synchronously,
> right in pdu_submit():
> 	if pdu->type == Tflush
> 		look the victim request up.
> 		if !victim || victim == pdu	// [*]
> 			send reply and free pdu immediately.
> 		if victim->type == Tflush	// [**]
> 			pdu->victim = victim->victim
> 		else
> 			pdu->victim = victim
> 			mark the victim cancelled
> 		add pdu to the tail of pdu->victim->flushes
> and let pdu_complete(pdu) send a reply to each request in pdu->flushes
> as well (freeing them as we go)
> 

I had kept the asynchronous way and it resulted in quite convoluted code.
I'll give a try as per your suggestion.

> Some locking might be needed to avoid the races between pdu_submit() and
> pdu_complete(), but it's not going to be a wide critical area.  I'm not
> familiar with qemu locking primitives, sorry; what's providing an exclusion
> between QLIST_INSERT_HEAD in pdu_alloc() (from handle_9p_output())
> and QLIST_REMOVE in pdu_free() (from pdu_complete())?  Looks like the same
> thing ought to suffice here...
> 

This code runs in coroutines (stack switching, see include/qemu/coroutine.h for
details): it is serialized.

> [*] a cute way to hurt the current qemu server, BTW - coroutine that waits for
> itself to complete...
> 

Indeed :)

> [**] Tflush to Tflush is another fun corner case - it does *not* do anything
> to the earlier Tflush, but reply to it must follow that to original Tflush
> to avoid tag reuse problems.  Note that this is not the "multiple Tflush"
> case mentioned above - those would all have oldtag pointing to the same
> request; they are not chained and unlike this one can happen with legitimate
> clients.  Tflush to Tflush, OTOH, is not something a sane client would do.

I agree I don't see when a client would want to do that but FWIW, it is
mentioned in the last paragraph of http://man.cat-v.org/plan_9/5/flush :)

Thanks for your valuable suggestions, Al!

Cheers.

--
Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html