Re: [PATCH] pnfs: do not reset to mds if wb_offset != wb_pgbase

On Mon, 2013-03-18 at 18:45 +0200, Benny Halevy wrote:
> On 2013-03-18 18:39, Myklebust, Trond wrote:
> > On Mon, 2013-03-18 at 18:22 +0200, Benny Halevy wrote:
> >> On 2013-03-18 17:55, Myklebust, Trond wrote:
> >>> On Mon, 2013-03-18 at 16:38 +0200, Benny Halevy wrote:
> >>>> We're seeing roughly 20% of the I/Os going to the MDS
> >>>> when installing a VM over KVM in "none" caching mode (O_DIRECT).
> >>>> Instrumenting the client revealed that this is caused by buffer
> >>>> alignment vs. file offset alignment.
> >>>> Besides being a performance problem, when the MDS caches data
> >>>> this also shows up as data corruption: data is written first
> >>>> via the MDS, then via the DS, and eventually the stale data is
> >>>> read back from the MDS.
> >>>
> >>> That's why we should return the layout.
> >>
> >> We are not in this case.
> > 
> > Doh! I was thinking it was a case where we need to fence...
> > 
> > Actually, it shouldn't be needed: we will always do a _stable_ write of
> > the data before we try to read it back in from the server, so MDS
> > caching shouldn't be a problem.
> > 
> 
> Writing stable to the MDS does not solve all cases.
> The corruption we've seen happens like this:
> 
> write(A) to MDS
> write(B) to DS
> read(A) from MDS - since the MDS is caching the last data written to it.

That looks like a server bug to me. If I write the data to stable
storage in both the A and B case above, then I expect READs from the
MDS and from the DS to return the same data.
That's particularly true in the case of O_DIRECT reads and writes; the
server can't make assumptions as to whether or not the next client to
read the data will use the DS or the MDS.

Note that I'm happy to accept that our client may not be meeting the
requirements of "write to stable storage" here if, say, we're failing to
issue a LAYOUTCOMMIT after the WRITE(B). If that's the case, then we
need to fix that.
My beef is rather with the notion that _if_ the client meets the stable
storage criterion, then the MDS can somehow still lie to us.
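
To make that expectation concrete, here is a toy model of the
write(A)/write(B)/read sequence quoted above. It is only a sketch and
not NFS or kernel code: the class names, the "coherent" flag and the
replay() helper are all hypothetical, and it assumes that A and B
target the same byte range and that both writes reach stable storage.

```python
class Storage:
    """Stable storage shared by the MDS and the DS (toy model)."""

    def __init__(self):
        self.blocks = {}


class DataServer:
    """Hypothetical DS: reads and writes go straight to stable storage."""

    def __init__(self, storage):
        self.storage = storage

    def write(self, offset, data):
        self.storage.blocks[offset] = data

    def read(self, offset):
        return self.storage.blocks.get(offset)


class CachingMDS:
    """Hypothetical MDS that remembers the last data written through it.

    With coherent=False it answers reads from its private cache, which
    models the behaviour described in the report. With coherent=True it
    always consults stable storage.
    """

    def __init__(self, storage, coherent):
        self.storage = storage
        self.coherent = coherent
        self.cache = {}

    def write(self, offset, data):
        self.storage.blocks[offset] = data  # stable write
        self.cache[offset] = data           # MDS-side copy

    def read(self, offset):
        if not self.coherent and offset in self.cache:
            return self.cache[offset]       # may be stale
        return self.storage.blocks.get(offset)


def replay(coherent):
    """Replay write(A) to MDS, write(B) to DS, then read from both."""
    storage = Storage()
    mds = CachingMDS(storage, coherent)
    ds = DataServer(storage)
    mds.write(0, b"A")
    ds.write(0, b"B")
    return mds.read(0), ds.read(0)
```

In this model replay(False) returns different data from the MDS and
the DS even though both writes were stable, while replay(True) returns
the same data from both, which is the behaviour I'm arguing the server
has to provide.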


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com