On Sun, Sep 20, 2020 at 04:31:57PM -0700, Linus Torvalds wrote: > On Sun, Sep 20, 2020 at 4:23 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > FWIW, if the fs layer is already providing this level of IO > > exclusion w.r.t. address space access, does it need to be replicated > > at the address space level? > > Honestly, I'd rather do it the other way, and go "if the vfs layer > were to provide the IO exclusion, maybe the filesystems can drop it? I'm not sure it can because of the diversity of filesystems and their locking requirements. XFS spent many, many years ignoring the VFS inode locking because it was a mutex and instead did all it's own locking with rwsems internally. > Because we end up having something like 60 different filesystems. It's > *really* hard to know that "Yeah, this filesystem does it right". Agreed, but I don't think moving the serialisation up to the VFS will fix that because many filesysetms will still have to do their own thing. e.g. cluster filesystems requiring cluster locks first, not local inode rwsems.... > And if we do end up doing it at both levels, and end up having some of > the locking duplicated, that's still better than "sometimes we don't > do it at all", and have odd problems on the less usual (and often less > well maintained) filesystems.. Sure. However, the problem I'm trying to avoid arises when the filesystem is able to do things concurrently that the new locking in the VFS/address space can't do concurrently. e.g. the range locking I'm working on for XFS allows truncate to run concurrently with reads, writes, fallocate(), etc and it all just works right now. Adding address space wide exclusive locking to the page cache will likely defeat this - we can no longer run fallocate() concurrently and so cannot solve the performance problems we have with qemu+qcow2 where it uses fallocate() to initialise sparse clusters and so fallocate() on a single uninitialised cluster serialises all IO to the image file. Hence I'd like to avoid introducing address-space wide serialisation for invalidation at the page cache level just as we are about to enable concurrency for operations that do page cache invalidation. Range locking in the page cache would be fine, but global locking will be .... problematic. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx