Pekka Enberg wrote:
> In addition, the vma walk will become an unmaintainable mess as soon
> as someone introduces another mmap() capable fs that needs similar
> locking.

Yup, I suspect that if the core kernel ends up caring about this problem then the VFS will be involved in helping file systems sort the locks they'll acquire around IO.

> I am not an expert so could someone please explain why this cannot be
> done with a_ops->prepare_write and friends?

I'll try, briefly.

Usually clustered file systems in Linux maintain data consistency for normal POSIX IO by holding DLM locks for the duration of their file->{read,write} methods. A task on a node won't be able to read until all tasks on other nodes have finished any conflicting writes they might have been performing, and so on. Nothing surprising here.

Now say we want to extend consistency guarantees to mmap(). This boils down to protecting mappings with DLM locks. Say a page is mapped for reading: the continued presence of that mapping is protected by holding a DLM lock. If another node goes to write to that page, the read lock is revoked and the mapping is torn down. These locks are acquired in vm_ops->nopage as the task faults and tries to bring up the mapping.

And that's the problem. Because they're acquired in ->nopage, they can be acquired during a fault that is servicing the 'buf' argument to an outer file->{read,write} operation which has already grabbed a lock for the target file. Acquiring multiple locks introduces the risk of ABBA deadlocks. It's trivial to construct examples of mmap(), read(), and write() on 2 nodes with 2 files that deadlock.

So clustered file systems in Linux (GFS, Lustre, OCFS2, (GPFS?)) all walk vmas in their file->{read,write} to discover mappings that belong to their files, so that they can preemptively sort and acquire the locks that will be needed to cover the mappings that might be established in ->nopage.
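To make the "sort and acquire" idea concrete, here is a rough user-space sketch in Python. It is not how any of these file systems implement it: the names (dlm_locks, locks_to_take, do_io) are hypothetical, threads stand in for nodes, integer ids stand in for files, and threading.Lock stands in for a DLM lock. The only point is that both sides take the same two locks in the same global order, so the ABBA case cannot arise.

```python
import threading

# Hypothetical stand-in for the cluster-wide lock space: one lock per file id.
# A real DLM is distributed; threading.Lock only models mutual exclusion.
dlm_locks = {1: threading.Lock(), 2: threading.Lock()}

def locks_to_take(target_file, mapped_files):
    """Return the file ids to lock for one read()/write(), in a global order.

    target_file  -- the file the read()/write() operates on
    mapped_files -- files backing the vmas that overlap the user buffer,
                    i.e. what the vma walk would discover (assumed input)
    """
    return sorted({target_file} | set(mapped_files))

def do_io(target_file, mapped_files, trace):
    """Take every lock up front in sorted order, then release in reverse."""
    held = []
    try:
        for file_id in locks_to_take(target_file, mapped_files):
            dlm_locks[file_id].acquire()
            held.append(file_id)
            trace.append(file_id)
        # The copy would happen here. Faults on the buffer would find the
        # lock for their mapping already held instead of acquiring it late.
    finally:
        for file_id in reversed(held):
            dlm_locks[file_id].release()

trace_a, trace_b = [], []
# "Node A": write() to file 1 from a buffer that is a mapping of file 2.
# "Node B": write() to file 2 from a buffer that is a mapping of file 1.
node_a = threading.Thread(target=do_io, args=(1, [2], trace_a))
node_b = threading.Thread(target=do_io, args=(2, [1], trace_b))
node_a.start()
node_b.start()
node_a.join()
node_b.join()
```

Without the sort, A would take 1 then 2 while B takes 2 then 1, which is exactly the ABBA pattern. With it, both take 1 then 2, and whichever gets lock 1 first simply runs to completion before the other.
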
As you point out, this both relies on the mappings not changing and gets very exciting when you mix files and mappings between file systems that are each sorting and acquiring their own DLM locks.

I brought this up with some people at the kernel summit but no one, including myself, considers it a high priority. It wouldn't be too hard to construct a patch if people want to take a look.

- z

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster