On Mon, Oct 26, 2020 at 03:57:10PM +0100, Jan Kara wrote:
> Hello!
>
> When reviewing Matthew's THP patches I've noticed one odd behavior which
> got copied from current iomap seek hole/data helpers. Currently we have:
>
> # fallocate -l 4096 testfile
> # xfs_io -x -c "seek -h 0" testfile
> Whence  Result
> HOLE    0
> # dd if=testfile bs=4096 count=1 of=/dev/null
> # xfs_io -x -c "seek -h 0" testfile
> Whence  Result
> HOLE    4096
>
> So once we read from an unwritten extent, the areas with cached pages
> suddenly become treated as data. Later when pages get evicted, they become
> treated as holes again. Strictly speaking I wouldn't say this is a bug
> since nobody promises we won't treat holes as data but it looks weird.
> Shouldn't we treat clean pages over unwritten extents still as holes and
> only once the page becomes dirty treat it as data? What do other people
> think?

I think we actually discussed this recently. Unless I misunderstood
one or both messages:

https://lore.kernel.org/linux-fsdevel/20201014223743.GD7391@xxxxxxxxxxxxxxxxxxx/

I agree it's not great, but I'm not sure it's worth getting it "right"
by tracking whether a page contains only zeroes.

I have been vaguely thinking about optimising for read-mostly workloads
on sparse files by storing a magic entry that means "use the zero page"
in the page cache instead of a page, like DAX does (only better). It
hasn't risen to the top of my list yet. Does anyone have a workload
that would benefit from it?

(I don't mean "can anybody construct one"; that's trivially possible.
I mean, do any customers care about the performance of that workload?)
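
For anyone who wants to reproduce the quoted sequence without xfs_io, here
is a minimal sketch in Python. It assumes Linux and a Python build that
exposes os.SEEK_HOLE and os.posix_fallocate; the helper names first_hole()
and probe() are made up for this illustration. What it reports depends on
the filesystem and kernel, so treat the output as an observation, not as a
statement of what any particular filesystem guarantees.

```python
import os
import tempfile


def first_hole(path, offset=0):
    """Return the offset of the first hole at or after `offset`.

    Roughly what `xfs_io -c "seek -h <offset>"` reports. A file with no
    holes still has an implicit hole at EOF, so the file size comes back.
    (Hypothetical helper, not part of any existing tool.)
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, offset, os.SEEK_HOLE)
    finally:
        os.close(fd)


def probe(path, size=4096):
    """Preallocate `size` bytes, then compare SEEK_HOLE before and after a read.

    Returns (hole_before_read, hole_after_read). Assumption: posix_fallocate
    leaves unwritten extents on filesystems that support fallocate(2); where
    it does not, glibc may fall back to writing zeroes, and both values will
    simply be `size`.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)

    before = first_hole(path)

    # Equivalent of `dd if=testfile bs=4096 count=1 of=/dev/null`:
    # pull the range into the page cache without dirtying it.
    with open(path, "rb") as f:
        f.read(size)

    after = first_hole(path)
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory(dir=".") as d:
        before, after = probe(os.path.join(d, "testfile"))
        print("HOLE before read:", before)
        print("HOLE after read: ", after)
```

Run it from a directory on the filesystem under test. If the two numbers
differ, clean cached pages over the unwritten extent are being reported as
data, which is the behaviour described above.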