Re: return layout on error, BUG/deadlock

On Fri, 2012-08-10 at 00:48 +0800, Peng Tao wrote:
> On Fri, Aug 10, 2012 at 12:37 AM, Myklebust, Trond
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> > On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote:
> >> On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond
> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote:
> >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond
> >> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-
> >> >> >> owner@xxxxxxxxxxxxxxx] On Behalf Of Idan Kedar
> >> >> >> Sent: Thursday, August 09, 2012 9:03 AM
> >> >> >> To: Boaz Harrosh; NFS list
> >> >> >> Cc: Benny Halevy
> >> >> >> Subject: return layout on error, BUG/deadlock
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> As a result of some experiments, I wanted to see what happens when I
> >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the
> >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my
> >> >> >> modified version of the object layout driver, I get the same BUG Tigran
> >> >> >> reported about yesterday:
> >> >> >> nfs4proc.c:6252 :   BUG_ON(!list_empty(&lo->plh_segs));
> >> >> >>
> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that
> >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there
> >> >> >> is in-flight I/O.
> >> >> >
> >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS.
> >> >>
> >> >> to what change are you referring?
> >> >
> >> > As I stated in the changelog of the patch that I sent to the list
> >> > yesterday, the behaviour is due to commit 0a57cdac3f.
> >> >
> >> >> >
> >> >> > See the changelog in the patch that I sent to the list yesterday.
> >> >> >
> >> >>
> >> >> I saw that, and if I'm not mistaken these races apply to object layout
> >> >> as well, and in any case they apply in my case. However, it is not
> >> >> easy to mess around with LAYOUTRETURN in object layout, and there have
> >> >> been several discussions on the issue. In one of these discussions
> >> >> Benny clarified that the object layout client must wait for all
> >> >> in-flight I/O to end.
> >> >
> >> > If the problem is that the DS is failing to respond, how does the client
> >> > know that the in-flight I/O has ended?
> >> >
> >> >> So for file layout it probably makes sense, but object layout (and if
> >> >> I understand correctly, block layout as well) something else needs to
> >> >> be done. I thought about sync wait when returning the layout on error,
> >> >> but according to Boaz it will cause deadlocks (Boaz - can you please
> >> >> elaborate?).
> >> >
> >> > The object layoutreturn has the ability to pass a timeout error value to
> >> > the MDS precisely in order to allow the latter to deal with this kind of
> >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664.
> >> >
> >> > The block layout is adding the same ability to layoutreturn in NFSv4.2
> >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct
> >> > layoutreturn_device_error4, so presumably they too have a plan for
> >> > dealing with this kind of issue.
> >> It is one thing to tell MDS that there is DS access error by sending
> >> layoutreturn, and it is another thing to return a layout even if there
> >> is overlapping in-flight DS IO...
> >>
> >> I certainly agree that client is entitled to return layout to inform
> >> MDS about DS errors and also avoid possible cb_layoutrecall. But it is
> >> just an optimization and should only be done when there is no
> >> in-flight IO (at least for block layout) IMHO.
> >
> > HOW DO YOU GUARANTEE NO IN-FLIGHT IO?
> >
> I don't. That's why I don't return layout in pnfs_ld_write_done(). And
> for layoutreturn upon cb_layoutreturn, block layout client needs to do
> timed-lease IO fencing per rfc5663, but it is not implemented in Linux
> client.

The timed-lease IO fencing described in rfc5663 is about telling the
server how long the client expects a command to take to succeed or fail.
It doesn't offer any advice on how the client is to deal with an
unresponsive DS.

What you need here is help from the underlying transport protocol. As I
said in the email to Idan, when researching iSCSI and iFCP, I found what
appear to be mechanisms for reliably timing out.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com