Re: Fatal crash with NFS, AIO & tcp retransmit

On Wed, 2013-01-23 at 19:37 +0000, Alex Bligh wrote:
> 
> --On 23 January 2013 18:13:34 +0000 "Myklebust, Trond" 
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> 
> >> They can't disappear until they have been successfully transmitted and a
> >> response received.  The problem here is that there were two requests
> >> sent or being sent and the page(s) can't be released until everyone,
> >> including TCP and such, are done with them.
> >>
> >> 		ps
> >
> > Right. The O_DIRECT write() system call will not return until it gets a
> > reply. Similarly, we don't mark an aio/dio request as complete until it
> > too gets a reply. So the data for those requests that need
> > retransmission is still available to be resent through the socket.
> 
> I apologise for my stupidity here as I think I must be missing something.
> 
> I thought we'd established that Xen's grant system doesn't release the page
> until QEMU says the block I/O is complete. QEMU only states that the block
> I/O is complete when AIO says it is.

That's correct. Xen and qemu maintain the mapping until the kernel says
the I/O is complete. To do otherwise would be a bug.

>  What's happening (as far as I can tell
> from the oops) is that the grant system is releasing the page AFTER the aio
> request is complete (and dio may be the same), but at that stage the page is
> still referenced by the tcp stack. That contradicts what you say about not
> marking the aio/dio request as complete until it gets a reply, unless it's
> the case that you can get a reply to a request when there is still data
> that the TCP stack can ask to retransmit (I suppose that's conceivable
> if the reply gets sent before the ACK of the data received).

This is exactly what can happen:

     1. send request (A)
     2. timeout waiting for ACK to (A)
     3. queue TCP retransmit of (A) as (B)
     4. receive ACK to original (A), sent at #1, and RPC reply to that
        request.
     5. return success to userspace
     6. userspace reuses (or unmaps under Xen) the buffer
     7. (B), queued at #3, reaches the head of the queue
     8. Try to transmit (B), bug has now happened.
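
As a rough illustration of the sequence above, here is a toy Python model. It is not kernel, Xen or NFS code: `Buffer`, `run` and `retransmit_queue` are hypothetical names invented for this sketch, and the only assumption carried over from the discussion is that the queued retransmit (B) refers to the caller's pages rather than holding its own copy.

```python
# Hypothetical toy model: none of these names are kernel, Xen or NFS APIs.

class Buffer:
    """Stands in for the user page(s) handed to an O_DIRECT/AIO write."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.mapped = True

    def read(self) -> bytes:
        if not self.mapped:
            # Models the Xen case: the grant mapping is gone, so the
            # access fails instead of returning stale or changed data.
            raise RuntimeError("page unmapped while still queued for transmit")
        return bytes(self.data)


def run(unmap: bool):
    wire = []                       # what actually goes out on the wire
    buf = Buffer(b"AAAA")
    wire.append(buf.read())         # 1. send request (A)
    retransmit_queue = [buf]        # 2-3. ACK timeout: queue (B), by reference
    # 4. the late ACK for (A) and the RPC reply arrive
    # 5. success is returned to userspace
    if unmap:                       # 6. userspace unmaps the buffer (Xen) ...
        buf.mapped = False
    else:                           #    ... or simply reuses it
        buf.data[:] = b"BBBB"
    for queued in retransmit_queue:  # 7-8. (B) reaches the head of the queue
        wire.append(queued.read())
    return wire
```

With `run(False)` the retransmit goes out carrying the new buffer contents, which is what an ordinary process would see. With `run(True)` the read at step 8 fails outright, which is the toy equivalent of the crash under Xen.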

You can also s/TCP/RPC/ and construct a similar issue at the next layer
of the stack, which only happens on NFSv3 AIUI.

> My understanding (which may well be completely wrong) is that the problem
> was that xen was unmapping the page even though it still had kernel
> references to it. This is why the problem does not happen in kvm (which
> does not as I understand it do a similar map/unmap operation). From Ian C I
> understand that just looking at the number of kernel references is not
> sufficient.

Under any userspace process (which includes KVM) you get retransmission
of data which may have changed, because userspace believes the kernel
when it has said it is done with it, and has reused the buffer. All that
is different under Xen is that "changed" can mean "unmapped" which makes
the symptom much worse.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

