On Thu, 2021-04-22 at 14:02 +0200, Jan Kara wrote:
> On Thu 22-04-21 07:43:16, Jeff Layton wrote:
> > On Thu, 2021-04-22 at 13:15 +0200, Jan Kara wrote:
> > > Hello,
> > > 
> > > I'm looking into how Ceph protects against races between page fault and
> > > hole punching (I'm unifying protection for this kind of races among
> > > filesystems) and AFAICT it does not. What I have in mind in particular is a
> > > race like:
> > > 
> > > CPU1                                  CPU2
> > > 
> > > ceph_fallocate()
> > >   ...
> > >   ceph_zero_pagecache_range()
> > >                                       ceph_filemap_fault()
> > >                                         faults in page in the range being
> > >                                         punched
> > >   ceph_zero_objects()
> > > 
> > > And now we have a page in punched range with invalid data. If
> > > ceph_page_mkwrite() manages to squeeze in at the right moment, we might
> > > even associate invalid metadata with the page I'd assume (but I'm not sure
> > > whether this would be harmful). Am I missing something?
> > > 
> > >                                                             Honza
> > 
> > No, I don't think you're missing anything. If ceph_page_mkwrite happens
> > to get called at an inopportune time then we'd probably end up writing
> > that page back into the punched range too. What would be the best way to
> > fix this, do you think?
> > 
> > One idea:
> > 
> > We could lock the pages we're planning to punch out first, then
> > zero/punch out the objects on the OSDs, and then do the hole punch in
> > the pagecache? Would that be sufficient to close the race?
> 
> Yes, that would be sufficient but very awkward e.g. if you want to punch
> out 4GB of data which even needn't be in the page cache. But all
> filesystems have this problem - e.g. ext4, xfs, etc. have already their
> private locks to avoid races like this, I'm now working on lifting the
> fs-private solutions into a generic one so I'll fix CEPH along the way as
> well. I was just making sure I'm not missing some other protection
> mechanism in CEPH.
> 

Even better! I'll keep an eye out for your patches.

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
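
For anyone following along, a minimal userspace sketch of the locking pattern
being discussed: a per-inode rwsem-style lock held exclusively across the whole
hole-punch (page-cache zeroing through object zeroing) and taken shared in the
fault path, so a fault can never instantiate a page between the two steps. This
is not kernel code and not Jan's generic VFS solution; all names here
(fake_inode, punch_lock, punch_hole, fault_in) are invented for illustration.

/*
 * Userspace analogue of serializing hole punch against page faults with a
 * per-inode rwlock. Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define FILE_SIZE 16

struct fake_inode {
	pthread_rwlock_t punch_lock;   /* plays the role of a per-inode rwsem */
	char backing[FILE_SIZE];       /* plays the role of the OSD objects   */
	char pagecache[FILE_SIZE];     /* plays the role of the page cache    */
};

/* "fallocate(PUNCH_HOLE)": zero page cache, then zero backing store. */
static void punch_hole(struct fake_inode *inode, int off, int len)
{
	pthread_rwlock_wrlock(&inode->punch_lock);   /* exclusive for the whole punch */
	memset(inode->pagecache + off, 0, len);      /* ~ceph_zero_pagecache_range()  */
	memset(inode->backing + off, 0, len);        /* ~ceph_zero_objects()          */
	pthread_rwlock_unlock(&inode->punch_lock);
}

/* "page fault": repopulate the cache from the backing store. */
static void fault_in(struct fake_inode *inode, int off)
{
	pthread_rwlock_rdlock(&inode->punch_lock);   /* shared; waits out any punch */
	inode->pagecache[off] = inode->backing[off]; /* ~ceph_filemap_fault()       */
	pthread_rwlock_unlock(&inode->punch_lock);
}

int main(void)
{
	struct fake_inode inode = { .backing = "AAAAAAAAAAAAAAA",
				    .pagecache = "AAAAAAAAAAAAAAA" };

	pthread_rwlock_init(&inode.punch_lock, NULL);
	punch_hole(&inode, 4, 8);
	fault_in(&inode, 6);
	/* Prints 0: the fault could not resurrect stale data in the hole. */
	printf("pagecache[6] = %d\n", inode.pagecache[6]);
	return 0;
}

In the real ceph_fallocate() path the exclusive acquisition would wrap
ceph_zero_pagecache_range() and ceph_zero_objects(), while ceph_filemap_fault()
and ceph_page_mkwrite() would take the lock shared, closing the window shown in
the race diagram above without having to lock every page in the punched range.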