On Thu, 2021-04-22 at 14:02 +0200, Jan Kara wrote:
> On Thu 22-04-21 07:43:16, Jeff Layton wrote:
> > On Thu, 2021-04-22 at 13:15 +0200, Jan Kara wrote:
> > > Hello,
> > > 
> > > I'm looking into how Ceph protects against races between page fault and
> > > hole punching (I'm unifying protection for this kind of races among
> > > filesystems) and AFAICT it does not. What I have in mind in particular is a
> > > race like:
> > > 
> > > CPU1                                  CPU2
> > > 
> > > ceph_fallocate()
> > >   ...
> > >   ceph_zero_pagecache_range()
> > >                                       ceph_filemap_fault()
> > >                                         faults in page in the range being
> > >                                         punched
> > >   ceph_zero_objects()
> > > 
> > > And now we have a page in punched range with invalid data. If
> > > ceph_page_mkwrite() manages to squeeze in at the right moment, we might
> > > even associate invalid metadata with the page I'd assume (but I'm not sure
> > > whether this would be harmful). Am I missing something?
> > > 
> > >                                                             Honza
> > 
> > No, I don't think you're missing anything. If ceph_page_mkwrite happens
> > to get called at an inopportune time then we'd probably end up writing
> > that page back into the punched range too. What would be the best way to
> > fix this, do you think?
> > 
> > One idea:
> > 
> > We could lock the pages we're planning to punch out first, then
> > zero/punch out the objects on the OSDs, and then do the hole punch in
> > the pagecache? Would that be sufficient to close the race?
> 
> Yes, that would be sufficient but very awkward e.g. if you want to punch
> out 4GB of data which even needn't be in the page cache. But all
> filesystems have this problem - e.g. ext4, xfs, etc. have already their
> private locks to avoid races like this, I'm now working on lifting the
> fs-private solutions into a generic one so I'll fix CEPH along the way as
> well. I was just making sure I'm not missing some other protection
> mechanism in CEPH.
> 

Even better! I'll keep an eye out for your patches.

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
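
For anyone following along, a minimal userspace sketch of the locking pattern
being discussed: a per-inode rwsem-style lock held exclusively across the whole
hole-punch (page-cache zeroing through object zeroing) and taken shared in the
fault path, so a fault can never instantiate a page between the two steps. This
is not kernel code and not Jan's generic VFS solution; all names here
(fake_inode, punch_lock, punch_hole, fault_in) are invented for illustration.

/*
 * Userspace analogue of serializing hole punch against page faults with a
 * per-inode rwlock. Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define FILE_SIZE 16

struct fake_inode {
	pthread_rwlock_t punch_lock;   /* plays the role of a per-inode rwsem */
	char backing[FILE_SIZE];       /* plays the role of the OSD objects   */
	char pagecache[FILE_SIZE];     /* plays the role of the page cache    */
};

/* "fallocate(PUNCH_HOLE)": zero page cache, then zero backing store. */
static void punch_hole(struct fake_inode *inode, int off, int len)
{
	pthread_rwlock_wrlock(&inode->punch_lock);   /* exclusive for the whole punch */
	memset(inode->pagecache + off, 0, len);      /* ~ceph_zero_pagecache_range()  */
	memset(inode->backing + off, 0, len);        /* ~ceph_zero_objects()          */
	pthread_rwlock_unlock(&inode->punch_lock);
}

/* "page fault": repopulate the cache from the backing store. */
static void fault_in(struct fake_inode *inode, int off)
{
	pthread_rwlock_rdlock(&inode->punch_lock);   /* shared; waits out any punch */
	inode->pagecache[off] = inode->backing[off]; /* ~ceph_filemap_fault()       */
	pthread_rwlock_unlock(&inode->punch_lock);
}

int main(void)
{
	struct fake_inode inode = { .backing = "AAAAAAAAAAAAAAA",
				    .pagecache = "AAAAAAAAAAAAAAA" };

	pthread_rwlock_init(&inode.punch_lock, NULL);
	punch_hole(&inode, 4, 8);
	fault_in(&inode, 6);
	/* Prints 0: the fault could not resurrect stale data in the hole. */
	printf("pagecache[6] = %d\n", inode.pagecache[6]);
	return 0;
}

In the real ceph_fallocate() path the exclusive acquisition would wrap
ceph_zero_pagecache_range() and ceph_zero_objects(), while ceph_filemap_fault()
and ceph_page_mkwrite() would take the lock shared, closing the window shown in
the race diagram above without having to lock every page in the punched range.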