Re: 9pfs hangs since 4.7

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 04, 2017 at 01:34:50AM +0200, Tuomas Tynkkynen wrote:

> I got these logs from the server & client with 9p tracepoints enabled:
> 
> https://gist.githubusercontent.com/dezgeg/02447100b3182167403099fe7de4d941/raw/3772e408ddf586fb662ac9148fc10943529a6b99/dmesg%2520with%25209p%2520trace
> https://gist.githubusercontent.com/dezgeg/e1e0c7f354042e1d9bdf7e9135934a65/raw/3a0e3b4f7a5229fd0be032c6839b578d47a21ce4/qemu.log
> 

Lovely...

27349:out 8 0001 		TSTATFS, tag 1
27350:out 12 0001		TLOPEN, tag 1
27351:complete 9 0001		RSTATFS, tag 1
27352:complete 13 0001		RLOPEN, tag 1

27677:out 8 0001		TSTATFS, tag 1
27678:out 12 0001		TLOPEN, tag 1
27679:complete 9 0001		RSTATFS, tag 1
27680:complete 13 0001		RLOPEN, tag 1

29667:out 8 0001		TSTATFS, tag 1
29668:out 110 0001		TWALK, tag 1
29669:complete 9 0001		RSTATFS, tag 1
29670:complete 7 0001		RLERROR, tag 1

42317:out 110 0001		TWALK, tag 1
42318:out 8 0001		TSTATFS, tag 1
42319:complete 9 0001		RSTATFS, tag 1
42320:complete 7 0001		RERROR, tag 1

Those are outright protocol violations: tag can be reused only after either
a reply bearing the same tag has arrived *or* TFLUSH for that tag had been
issued and successful (type == RFLUSH) reply bearing the tag of that TFLUSH
has arrived.  Your log doesn't contain any TFLUSH (108) at all, so it should've
been plain and simple "no reuse until server sends a reply with matching tag".

Otherwise the the dump looks sane.  In the end all issued requests had been
served, so it's not as if the client had been waiting for a free tag or for
a response to be produced by the server.

Interesting...  dmesg doesn't seem to contain the beginning of the 9P trace -
only 89858 out of 108818 in the dump, even though it claims to have lost
only 4445 events.  The last exchanges in the trace are
> P9_TLOPEN tag  90 err 0
> P9_TLOPEN tag  25 err 0
> P9_TLOPEN tag  15 err 0
> P9_TLOPEN tag  104 err 0
> P9_TLOPEN tag  102 err 0
> P9_TLOPEN tag  91 err 0
> P9_TLOPEN tag  12 err 0
< P9_TREADLINK tag  12

which should've been
complete 13 005a
complete 13 0019
complete 13 000f
complete 13 0068
complete 13 0066
complete 13 005b
out 22 000c

and the only plausible match for that is

108784:complete 13 005a
108798:complete 13 000f
108799:complete 13 0019
108800:complete 13 0068
108801:complete 13 0066
108802:complete 13 005b

but there's a bunch of replies in between that hadn't been logged *and* such
a TREADLINK request isn't seen after
108341:out 22 000c

... and sure enough, there's a whole lot of processes stuck in readlink.

Interesting...  Which version of qemu/what arguments are you using there?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux