On Fri, May 18, 2018 at 08:53:30AM -0700, Christoph Hellwig wrote: > On Fri, May 18, 2018 at 06:13:06AM -0700, Matthew Wilcox wrote: > > > Historically, the only problematic case has been direct IO, and people > > > have been willing to say "well, if you mix buffered and direct IO you > > > get what you deserve", and that's probably not unreasonable. But now we > > > have fallocate insert range and collapse range, and those are broken in > > > ways I frankly don't want to think about if they can't ensure consistency > > > with the page cache. > > > > ext4 manages collapse-vs-pagefault with the ext4-specific i_mmap_sem. > > You may get pushback on the grounds that this ought to be a > > filesystem-specific lock rather than one embedded in the generic inode. > > Honestly I think this probably should be in the core. But IFF we move > it to the core the existing users of per-fs locks need to be moved > over first. E.g. XFS as the very first one, and at least ext4 and f2fs > that copied the approach, and probably more if you audit deep enough. I'm not going to go and redo locking in XFS and ext4 as a prerequisite to merging bcachefs. Sorry, but that's a bit crazy. I am more than happy to work on the locking itself if we can agree on what semantics we want out of it. We have two possible approaches, and we're going to have to pick one first: the locking can be done at the top of the IO stack (like ext4 and I'm guessing xfs), but then we're adding locking overhead to buffered reads and writes that don't need it because they're only touching pages that are already in cache. Or we can go with my approach, pushing down the locking to only when we need to add pages to the page cache. I think if we started out by merging my approach, it would be pretty easy to have it make use of Mathew's fancy xarray based range locking when that goes in, the semantics should be similar enough. If people are ok with and willing to use my approach, I can polish it up - add lockdep support and whatever else I can think of, and attempt to get rid of the stupid recursive part. But that's got to be decided first, where in the call stack the locking should be done.