Re: return layout on error, BUG/deadlock

On Fri, Aug 10, 2012 at 3:31 AM, Myklebust, Trond
<Trond.Myklebust@xxxxxxxxxx> wrote:
> On Fri, 2012-08-10 at 00:48 +0800, Peng Tao wrote:
>> On Fri, Aug 10, 2012 at 12:37 AM, Myklebust, Trond
>> <Trond.Myklebust@xxxxxxxxxx> wrote:
>> > On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote:
>> >> On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond
>> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
>> >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote:
>> >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond
>> >> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
>> >> >> >> -----Original Message-----
>> >> >> >> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-
>> >> >> >> owner@xxxxxxxxxxxxxxx] On Behalf Of Idan Kedar
>> >> >> >> Sent: Thursday, August 09, 2012 9:03 AM
>> >> >> >> To: Boaz Harrosh; NFS list
>> >> >> >> Cc: Benny Halevy
>> >> >> >> Subject: return layout on error, BUG/deadlock
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> As a result of some experiments, I wanted to see what happens when I
>> >> >> >> inject a hard-coded error into the object layout driver. The patch is at the
>> >> >> >> bottom of this mail. I did this because when I inject errors in my
>> >> >> >> modified version of the object layout driver, I get the same BUG Tigran
>> >> >> >> reported yesterday:
>> >> >> >> nfs4proc.c:6252 :   BUG_ON(!list_empty(&lo->plh_segs));
>> >> >> >>
>> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that
>> >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there
>> >> >> >> is in-flight I/O.
>> >> >> >
>> >> >> > That is not a bug. It is an intentional change in order to allow the
>> >> >> > MDS to fence off the outstanding writes (if it can do so) before we
>> >> >> > retransmit them as write-through-MDS. Otherwise, you risk races
>> >> >> > between the outstanding writes-to-DS and the new writes-through-MDS.
>> >> >>
>> >> >> To what change are you referring?
>> >> >
>> >> > As I stated in the changelog of the patch that I sent to the list
>> >> > yesterday, the behaviour is due to commit 0a57cdac3f.
>> >> >
>> >> >> >
>> >> >> > See the changelog in the patch that I sent to the list yesterday.
>> >> >> >
>> >> >>
>> >> >> I saw that, and if I'm not mistaken these races apply to object layout
>> >> >> as well, and in any case they apply in my case. However, it is not
>> >> >> easy to mess around with LAYOUTRETURN in object layout, and there have
>> >> >> been several discussions on the issue. In one of these discussions
>> >> >> Benny clarified that the object layout client must wait for all
>> >> >> in-flight I/O to end.
>> >> >
>> >> > If the problem is that the DS is failing to respond, how does the client
>> >> > know that the in-flight I/O has ended?
>> >> >
>> >> >> So for file layout it probably makes sense, but for object layout (and,
>> >> >> if I understand correctly, block layout as well) something else needs
>> >> >> to be done. I thought about a synchronous wait when returning the layout
>> >> >> on error, but according to Boaz it will cause deadlocks (Boaz - can you
>> >> >> please elaborate?).
>> >> >
>> >> > The object layoutreturn has the ability to pass a timeout error value to
>> >> > the MDS precisely in order to allow the latter to deal with this kind of
>> >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664.
>> >> >
>> >> > The block layout is adding the same ability to layoutreturn in NFSv4.2
>> >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct
>> >> > layoutreturn_device_error4, so presumably they too have a plan for
>> >> > dealing with this kind of issue.
>> >> It is one thing to tell the MDS that there is a DS access error by
>> >> sending layoutreturn, and it is another thing to return a layout even
>> >> if there is overlapping in-flight DS IO...
>> >>
>> >> I certainly agree that the client is entitled to return the layout to
>> >> inform the MDS about DS errors and also to avoid a possible
>> >> cb_layoutrecall. But it is just an optimization and should only be done
>> >> when there is no in-flight IO (at least for block layout) IMHO.
>> >
>> > HOW DO YOU GUARANTEE NO IN-FLIGHT IO?
>> >
>> I don't. That's why I don't return the layout in pnfs_ld_write_done().
>> And for layoutreturn upon cb_layoutrecall, the block layout client needs
>> to do timed-lease IO fencing per rfc5663, but it is not implemented in
>> the Linux client.
>
> The timed-lease IO fencing described in rfc5663 is about informing the
> server about how long the client expects a command to succeed or fail.
> It doesn't offer any advice for how the client is to deal with an
> unresponsive DS.
>
> What you need here is help from the underlying transport protocol. As I
> said in the email to Idan, when researching iSCSI and iFCP, I found what
> appear to be mechanisms for reliably timing out.
I just checked and found that the layoutreturn-on-error behavior only
affects the object and file layout drivers, so block layout is not
affected and stays safe. That's all I would ask for. Thanks for your
explanation.
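
To make the distinction concrete, here is a minimal sketch of the
per-driver policy described above. It is a toy model and not the actual
kernel code: the flag LD_RETURN_ON_ERROR, the struct layout_driver, the
enum write_action and the function write_done() are all hypothetical
names made up for illustration, and the assumption that a failed write
is always resent through the MDS is a simplification.

```c
#include <stdbool.h>

/* Hypothetical flag: driver wants LAYOUTRETURN sent after a DS I/O error. */
#define LD_RETURN_ON_ERROR 0x1u

/* Hypothetical, simplified stand-in for a pNFS layout driver descriptor. */
struct layout_driver {
	const char *name;
	unsigned int flags;
};

/* What the generic write-completion path decides to do next. */
enum write_action {
	WRITE_OK,                  /* DS write succeeded, nothing to do */
	RESEND_VIA_MDS,            /* keep the layout, resend through the MDS */
	RETURN_LAYOUT_THEN_RESEND  /* send LAYOUTRETURN, then resend through the MDS */
};

/* Assumed policy: only drivers that set the flag return the layout on error. */
enum write_action write_done(const struct layout_driver *ld, int io_error)
{
	if (io_error == 0)
		return WRITE_OK;
	if (ld->flags & LD_RETURN_ON_ERROR)
		return RETURN_LAYOUT_THEN_RESEND;
	return RESEND_VIA_MDS;
}

/* Per this thread: file and object layout opt in, block layout does not. */
const struct layout_driver file_ld   = { "file",   LD_RETURN_ON_ERROR };
const struct layout_driver object_ld = { "object", LD_RETURN_ON_ERROR };
const struct layout_driver block_ld  = { "block",  0 };
```

With this model, a failed write on block_ld never triggers a
layoutreturn while other IO may still be in flight, which is the
property I care about.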

Best,
Tao

