On 2016/7/20 13:59, Al Viro wrote:
[387421.143529] CPU: 69 PID: 39485 Comm: rsync Tainted: G W
------------ 3.10.0-327.18.2.el7.ppc64le #1
Wait a sec, what is that doing on l-k? It's an RH kernel, and nowhere
near the current one, at that... Anyway,
a) can you reproduce it with the mainline?
b) can you reproduce it on more current RH kernel?
I am building the 4.6 kernel from mainline to try to reproduce it.
And also will try more current RH kernel-3.10.0-327.22.2.el7.
Will update later.
c) how hard it is to reproduce? You've mentioned "crashes", which
sounds like there had been more than one...
I have no test case to reproduce it. But it crashes regularly. And it
happens on 3 physical
machines (powerpc).
Another question: which filesystem type had that been? If you have that
crashdump, dentry->d_sb->s_type->name should give that information...
It is xfs. http://paste.ubuntu.com/20278867/
I don't see any likely candidates for that bug - not in mainline, not in
the kernel you'd been running there. Basically, once we have obtained
a pointer to dentry (which should happen only in __d_alloc()[1]), it should
never have NULL in its ->d_name.name. Any path through __d_alloc() that
doesn't end up returning NULL will pass through
dname[name->len] = 0;
/* Make sure we always see the terminating NUL character */
smp_wmb();
dentry->d_name.name = dname;
which obviously can't end up with dentry->d_name.name == NULL - dname would
have to be NULL as well, and that would oops in the first of the quoted lines.
Nothing should be doing wholesale assignments to ->d_name. Nothing that
had &dentry->d_name passed as an argument should modify its ->name field
(most of such parameters are declared const struct qstr *, and at least in
mainline all of them could be constified that way; I hadn't scanned the
kernel you'd been using yet - it's several hundred calls to RTFS through ;-/)
Nothing outside of fs/dcache.c should modify ->d_name directly either (and
that I'd verified both for mainline and for 3.10.0-327.el7).
And in fs/dcache.c we have very few places modifying that sucker. Namely,
swap_names() and copy_name() in mainline and switch_name() in 3.10.0-327.el7.
Assigned values are either ->d_name.name of another dentry or ->d_iname of
this dentry. The latter is never NULL (it's an array embedded into struct
dentry) and the former would have to have become NULL at some earlier point.
Buggered barriers might be a possibility, but those would probably not leak
all the way into crashdump; I rather doubt that they are buggered, anyway,
since we have store non-NULL into ->d_name.name/lwsync/store a mangled address
of dentry into hash chain vs. fetch a value from hash chain, demangle it into
dentry address, load dentry->d_name.name and observe NULL there. With
another lwsync thrown in between the last two steps, actually, but even
without that the lwsync in writer + address dependency in reader should've
been enough.
[1] struct dentry is never an object with static or automatic storage duration
or a member of any kind of compound object and the only place where a pointer
to any other kind of object is converted into pointer to struct dentry is
dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL); in __d_alloc(). IOW,
all instances of struct dentry must come from that function, period.
Thanks
Fiona
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html