RE: Issue found with kernel/net/sunrpc/xdr.c

Trond, I've just finished testing your patch -- it definitely fixes the corruption issue.

Also, your patch is cleaner and probably more efficient than ours.

Let Mark and me know if you need anything else from us on this issue -- thanks for your help!

-----Original Message-----
From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx] 
Sent: Wednesday, August 28, 2013 12:52 PM
To: Matt Craighead
Cc: J. Bruce Fields; Mark Young; linux-nfs@xxxxxxxxxxxxxxx
Subject: Re: Issue found with kernel/net/sunrpc/xdr.c

On Wed, 2013-08-28 at 08:18 -0700, Matt Craighead wrote:
> > I'm curious why we haven't seen it before
> 
> I agree, it's slightly mysterious.  We're hitting the bug on a 32-bit ARM system with 1GB or 2GB of memory; surely that's not very far off the beaten path.
> 
> 
> > Or maybe memmove is an architecture-specific implementation that happens to handle left-to-right overlapping copies correctly on common architectures?
> 
> It's architecture specific, e.g.: http://lxr.linux.no/linux+v3.8/arch/arm/lib/memmove.S
> 
> In this particular case, it decides that the memory is non-overlapping (by comparing the virtual addresses), so it incorrectly dispatches to memcpy().  At that point all bets are off in terms of the direction, chunk size, etc. of the copy.
> 
> I guess I'd typically expect memcpy() to stride through memory forwards rather than backwards though.  And since this function mandates "from < to", it seems like this would usually fail.
> 
> 
> For x86: http://lxr.linux.no/linux+v3.8/arch/x86/lib/memcpy_32.c
> 
> In this case, it looks like it doesn't dispatch to memcpy().  It just determines forward/backwards and copies accordingly.  So it would work as long as the "from < to" property was preserved.
> 
> If I'm reading the code correctly, kmap_atomic appears to grow downward on x86:
> http://lxr.linux.no/linux+v3.8/arch/x86/mm/highmem_32.c
> http://lxr.linux.no/linux+v3.8/arch/x86/include/asm/fixmap.h
> etc.
> 
> Since xdr.c calls kmap_atomic on *pgto first, then *pgfrom, the "from < to" property will therefore be preserved.
> 
> 
> Therefore, I suspect you might be able to reproduce the bug on (32-bit) x86 by doing any of the following:
> - swapping the order of those kmap_atomic/kunmap_atomic calls 
> - modifying kmap_atomic to grow in the opposite direction
> - modifying memmove() to dispatch to memcpy() when the virtual regions are non-overlapping
> 
> It's still possible that there is something else funny about how memory gets allocated in our setup that makes the bug more likely, but empirically, the bug didn't seem hard to hit.  We didn't need to copy very much data/very many files over NFS before we typically got corruption, and the bug happened on a variety of specific platforms.  It was (unsurprisingly) easier to reproduce with 2GB than with 1GB though.
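
A minimal user-space sketch of the failure mode described above, assuming a memcpy() that strides forward one byte at a time. The helpers forward_copy(), backward_copy(), sketch_memmove() and run_demo() are hypothetical names for illustration, not the kernel's implementations. In the kernel case the two pointers come from separate kmap_atomic mappings, so an address comparison cannot see the overlap; here the overlap is real and visible, and the sketch only shows what a forward copy does to the data when "from < to".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a memcpy() that walks forward byte by byte. */
static void forward_copy(char *to, const char *from, size_t n)
{
	for (size_t i = 0; i < n; i++)
		to[i] = from[i];
}

/* Hypothetical stand-in for the backward path of a memmove(). */
static void backward_copy(char *to, const char *from, size_t n)
{
	while (n--)
		to[n] = from[n];
}

/*
 * Sketch of a memmove() that picks its direction purely by comparing
 * addresses. If the addresses say "no overlap" (or "to" is below "from"),
 * it takes the forward path, as a plain memcpy() might.
 */
static void sketch_memmove(char *to, const char *from, size_t n)
{
	uintptr_t t = (uintptr_t)to, f = (uintptr_t)from;

	if (t <= f || t - f >= n)
		forward_copy(to, from, n);
	else
		backward_copy(to, from, n);
}

static void run_demo(void)
{
	char a[] = "ABCDEFGH";
	char b[] = "ABCDEFGH";

	/* Shift six bytes right by two: "from < to" and the ranges overlap. */
	forward_copy(a + 2, a, 6);	/* what a wrongly chosen memcpy() may do */
	sketch_memmove(b + 2, b, 6);	/* direction chosen from the addresses */

	/* Forward copy re-reads bytes it has already overwritten. */
	assert(memcmp(a, "ABABABAB", 8) == 0);
	/* Backward copy gives the intended shifted data. */
	assert(memcmp(b, "ABABCDEF", 8) == 0);
}
```

Calling run_demo() from a main() exercises both paths. The forward copy turns "ABCDEFGH" into "ABABABAB", which is the kind of repeated-data corruption a right shift would suffer if the copy ran forward over overlapping memory.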

Can you please test that the attached patch fixes the corruption?

Thanks
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
