On Fri, Aug 08, 2014 at 10:32:58PM -0400, Chris Mason wrote:
> 
> 
> On 08/08/2014 08:36 PM, Dave Chinner wrote:
> > On Fri, Aug 08, 2014 at 10:35:38AM -0400, Chris Mason wrote:
> >>
> >> xfs is using truncate_pagecache_range to invalidate the page cache
> >> during DIO reads.  This is different from the other filesystems who only
> >> invalidate pages during DIO writes.
> > 
> > Historical oddity thanks to wrapper functions that were kept way
> > longer than they should have been.
> > 
> >> truncate_pagecache_range is meant to be used when we are freeing the
> >> underlying data structs from disk, so it will zero any partial ranges
> >> in the page.  This means a DIO read can zero out part of the page cache
> >> page, and it is possible the page will stay in cache.
> > 
> > commit fb59581 ("xfs: remove xfs_flushinval_pages") also removed
> > the offset masks that seem to be the issue here. Classic case of a
> > regression caused by removing 10+ year old code that was not clearly
> > documented and didn't appear important.
> > 
> > The real question is why isn't fsx and other corner case data
> > integrity tools tripping over this?
> 
> My question too.  Maybe not mixing buffered/direct for partial pages?
> Does fsx only do 4K O_DIRECT?

No. xfstests::tests/generic/091 is supposed to cover this exact case.
It runs fsx with direct IO aligned to sector boundaries amongst other
things.

$ ./lsqa.pl tests/generic/091
FS QA Test No. 091

fsx exercising direct IO -- sub-block sizes and concurrent buffered IO
$

> >> buffered reads will find an up to date page with zeros instead of the
> >> data actually on disk.
> >>
> >> This patch fixes things by leaving the page cache alone during DIO
> >> reads.
> >>
> >> We discovered this when our buffered IO program for distributing
> >> database indexes was finding zero filled blocks.  I think writes
> >> are broken too, but I'll leave that for a separate patch because I don't
> >> fully understand what XFS needs to happen during a DIO write.
> >>
> >> Test program:
> > 
> > Encapsulate it in a generic xfstest, please, and send it to
> > fstests@xxxxxxxxxxxxxxx.
> 
> This test prog was looking for races, which we really don't have.  It
> can be much shorter to just look for the improper zeroing from a single
> thread.  I can send it either way.

Doesn't matter, as long as we have something that exercises this
case....
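A minimal single-threaded version of that reproducer might look
something like the sketch below. It is untested and only illustrative
(the file name, fill pattern, sizes and the assumption that 512 bytes
is a valid O_DIRECT alignment are all made up), but it follows the
sequence described above: fill the file through the page cache, issue
one small sector-aligned O_DIRECT read, then check whether a buffered
read now sees zeros that were never written.

/*
 * Untested sketch of a single-threaded reproducer.  All names and
 * sizes are illustrative; assumes 512-byte alignment is acceptable
 * for O_DIRECT on the underlying device.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE	(16 * 1024)	/* a few pages */
#define SECTOR		512		/* assumed DIO alignment */

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "testfile";
	static char buf[FILE_SIZE];
	void *dio_buf;
	int fd, dfd, i;

	/* Fill the file through the page cache with a non-zero pattern. */
	memset(buf, 0xaa, sizeof(buf));
	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
	if (fd < 0 || write(fd, buf, sizeof(buf)) != sizeof(buf)) {
		perror("buffered write");
		return 1;
	}
	fsync(fd);

	/* One small, sector-aligned O_DIRECT read inside the first page. */
	if (posix_memalign(&dio_buf, 4096, SECTOR)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	dfd = open(path, O_RDONLY | O_DIRECT);
	if (dfd < 0 || pread(dfd, dio_buf, SECTOR, SECTOR) != SECTOR) {
		perror("direct read");
		return 1;
	}

	/* Read everything back through the page cache, look for zeroing. */
	if (pread(fd, buf, sizeof(buf), 0) != sizeof(buf)) {
		perror("buffered read");
		return 1;
	}
	for (i = 0; i < FILE_SIZE; i++) {
		if (buf[i] != (char)0xaa) {
			printf("unexpected byte 0x%02x at offset %d\n",
			       (unsigned char)buf[i], i);
			return 1;
		}
	}
	printf("data intact\n");

	free(dio_buf);
	close(dfd);
	close(fd);
	return 0;
}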
> > Besides, XFS's direct IO semantics are far saner, more predictable
> > and hence are more widely useful than the generic code. As such,
> > we're not going to regress semantics that have been unchanged
> > over 20 years just to match whatever insanity the generic Linux code
> > does right now.
> > 
> > Go on, call me a deranged monkey on some serious mind-controlling
> > substances. I don't care. :)
> 
> The deranged part is invalidating pos -> -1 on a huge file because of a
> single 512b direct read.  But, if you mix O_DIRECT and buffered you get
> what the monkeys give you and like it.

That's a historical artifact - it predates the range interfaces that
Linux has grown over the years, and every time we've changed it to
match the I/O range subtle problems have arisen. Those are usually
due to other bugs we knew nothing about at the time, but that's the
way it goes...

> > I think the fix should probably just be:
> > 
> > -		truncate_pagecache_range(VFS_I(ip), pos, -1);
> > +		invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
> > +				pos >> PAGE_CACHE_SHIFT, -1);
> 
> I'll retest with this in the morning.  The invalidate is basically what
> we had before with the masking & PAGE_CACHE_SHIFT.

Yup.

Thanks for finding these issues, Chris!

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx