Re: Fatal crash with NFS, AIO & tcp retransmit

On Mon, 2013-01-21 at 15:01 +0000, Alex Bligh wrote:
> Trond,
> 
> --On 21 January 2013 14:38:20 +0000 "Myklebust, Trond" 
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> 
> > The Oops would be due to a bug in the socket layer: the socket is
> > supposed to take a reference count on the page in order to ensure that
> > it can copy the contents.
> 
> Looking at the original linux-nfs link, you said here:
> http://marc.info/?l=linux-nfs&m=122424789508577&w=2
> 
> Trond:> I don't see how this could be an RPC bug. The networking
> Trond:> layer is supposed to either copy the data sent to the socket,
> Trond:> or take a reference to any pages that are pushed via
> Trond:> the ->sendpage() abi.
> 
> which sounds suspiciously like the same thing.
> 
> The conversation then went:
> http://marc.info/?l=linux-nfs&m=122424858109731&w=2
> Ian:> The pages are still referenced by the networking layer. The problem is
> Ian:> that the userspace app has been told that the write has completed so
> Ian:> it is free to write new data to those pages.
> 
> To which you replied:
> http://marc.info/?l=linux-nfs&m=122424984612130&w=2
> Trond:> OK, I see your point.

The original thread did not AFAICR involve an Oops. If you are seeing an
Oops, then that is something new and would be a socket level bug.

> Following the thread, it then seems that Ian's test case did fail on
> NFS4 on 2.6.18, but not on 2.6.27.
> 
> Note that Ian was seeing something slightly different from me. I think
> what he was seeing was alterations to the page after AIO completes
> being retransmitted when the page prior to the alteration should
> be transmitted. That could presumably be fixed by some COW device.
> 
> What I'm seeing is more subtle. Xen thinks (because QEMU tells it,
> because AIO tells it) that the memory is done with entirely, and
> simply unmaps it. I don't think that's Qemu's fault.
> 
> If it is a referencing issue, then it seems to me the problem is
> that Xen is releasing the grant structure (I don't quite understand
> how this bit works) and unmapping memory when the networking stack
> still holds a reference to the page concerned. However, even if it
> did not do that, wouldn't a retransmit after the write had completed
> risk writing the wrong data? I suppose it could mark the page
> COW before it released the grant or something.
> 
> > As for the O_DIRECT bug, the problem there is that we have no way of
> > knowing when the socket is done writing the page. Just because we got an
> > answer from the server doesn't mean that the socket is done
> > retransmitting the data. It is quite possible that the server is just
> > replying to the first transmission.
> 
> I don't think QEMU is actually using O_DIRECT unless I set cache=none
> on the drive. That causes a different interesting failure which isn't
> my focus just now!

Then your reference to Ian's bug is a red herring.

If the application is using buffered writes, then the data is
immediately copied from userspace to the page cache. Once the copy to
the page cache is done, userspace can do whatever it wants with the
original buffer, because only the page cache pages are used in the RPC
calls.
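
The buffered-write behaviour described above can be observed from userspace. The following is a minimal sketch, not NFS-specific: it assumes an ordinary temporary file on whatever filesystem the temp directory lives on, and the function name buffered_write_then_reuse is made up for illustration. It writes a buffer, scribbles over that buffer as soon as write() returns, and then reads the file back.

```python
import os
import tempfile


def buffered_write_then_reuse(payload: bytes) -> bytes:
    """Write payload with a plain buffered write(), overwrite the
    userspace buffer immediately afterwards, and return what the
    file actually contains."""
    buf = bytearray(payload)
    fd, path = tempfile.mkstemp()
    try:
        # No O_DIRECT here: write() returns once the kernel has copied
        # the data out of buf into its own page cache pages.
        written = os.write(fd, buf)

        # The application reuses its buffer straight away.
        buf[:] = b"X" * len(buf)

        # Force writeback, then read the file contents back.
        os.fsync(fd)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, written)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(buffered_write_then_reuse(b"original data"))
```

The file should hold the original payload rather than the overwritten buffer, because the later writeback works from the page cache copy and never looks at the userspace buffer again.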

aio doesn't change any of this...

> > I thought that Ian was working on a fix for this issue. At one point, he
> > had a bunch of patches to allow sendpage() to call you back when the
> > transmission was done. What happened to those patches?
> 
> No idea (I don't work with Ian but have taken the liberty of copying him).
> 
> However, what's happened in the intervening years is that Xen has changed
> its device model and it's now QEMU doing the writing (the qcow2 driver
> specifically). I'm not sure it's even using sendpage.

The kernel RPC layer uses sendpage to transmit pages as part of an RPC
call.
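
As a rough userspace analogy only (this is not kernel code, and ToySocket and demo are hypothetical names invented for this sketch), the difference between a socket that copies the data and one that merely holds a reference to the page can be modelled with a bytearray standing in for the page and a memoryview standing in for the page reference:

```python
class ToySocket:
    """Toy stand-in for a socket with a retransmit queue.

    copy_on_send=True  models a socket that copies the payload.
    copy_on_send=False models a sendpage-style socket that only keeps
    a reference to the caller's page.
    """

    def __init__(self, copy_on_send: bool):
        self.copy_on_send = copy_on_send
        self.retransmit_queue = []

    def send(self, page: bytearray) -> bytes:
        # Either snapshot the page or keep a zero-copy view of it.
        segment = bytes(page) if self.copy_on_send else memoryview(page)
        self.retransmit_queue.append(segment)
        return bytes(segment)  # what goes on the wire the first time

    def retransmit(self) -> list:
        # A retransmit re-reads whatever the queued segment points at.
        return [bytes(segment) for segment in self.retransmit_queue]


def demo(copy_on_send: bool):
    page = bytearray(b"old data")
    sock = ToySocket(copy_on_send)
    first = sock.send(page)
    # The writer is told the write completed and reuses the page.
    # (Same length, so the bytearray is modified in place.)
    page[:] = b"new data"
    return first, sock.retransmit()[0]


if __name__ == "__main__":
    print(demo(copy_on_send=True))
    print(demo(copy_on_send=False))
```

In the reference-holding case the retransmitted segment carries the modified contents, which is the data-corruption scenario from the older thread. The model says nothing about pages being unmapped underneath the socket, which is the case that would produce an Oops.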

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com