Re: [PATCH] pnfs: do not reset to mds if wb_offset != wb_pgbase

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2013-03-18 18:39, Myklebust, Trond wrote:
> On Mon, 2013-03-18 at 18:22 +0200, Benny Halevy wrote:
>> On 2013-03-18 17:55, Myklebust, Trond wrote:
>>> On Mon, 2013-03-18 at 16:38 +0200, Benny Halevy wrote:
>>>> We're seeing roughly 20% of the I/Os going to the MDS
>>>> when installing a VM over KVM in "none" caching mode (O_DIRECT).
>>>> Instrumenting the client reveled that this is caused by buffer
>>>> alignment vs. file offset alignment.
>>>> Besides being a performance problem, when the MDS caches data
>>>> this is also manifested as data corruption when data is written
>>>> first via the MDS, then via the DS, eventually the stale data is
>>>> read back from the MDS.
>>>
>>> That's why we should return the layout.
>>
>> We are not in this case.
> 
> Doh! I was thinking it was a case where we need to fence...
> 
> Actually, it shouldn't be needed: we will always do a _stable_ write of
> the data before we try to read it back in from the server, so MDS
> caching shouldn't be a problem.
> 

Writing stable to the MDS does not solve all cases.
The corruption we've seen happens like this:

write(A) to MDS
write(B) to DS
read(A) from MDS - since the MDS is caching the last data written to it.

>>>> Note that this check exists also for the file layout specific
>>>> pg_init_* functions.  The objects (ORE) and block
>>>> (bl_{read,write}_pagelist) layouts seem to deal correctly with
>>>> splitting IOs in the case where req->wb_offset != req->wb_pgbase
>>>> though this hasn't been tested wen submitting this patch.
>>>>
>>> NACK. I see no evidence that we've addressed the issues that were raised
>>> by Fred in commit 1825a0d08f22463e5a8f4b1636473efd057a3479 (NFS: prepare
>>> coalesce testing for directio).
>>> If you think that his concerns about the coalescing assumptions are no
>>> longer true, then please point to why this is the case. AFAICR that
>>> patch was added to fix corruption issues.
>>>
>>
>> We see no problems with this patch with the workloads we're testing.
>> Do you have a test that reproduces the original problem that we can try running?
> 
> I suspect it was one of the nfstests. (see
> git://git.linux-nfs.org/projects/mora/nfstest.git ) since Fred was
> working with Jorge to do the O_DIRECT testing.
> 
> Fred, Jorge?
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux