On Thu, Nov 07, 2024 at 07:01:37PM +0000, Lorenzo Stoakes wrote:
> +.. table:: Config-specific fields
> +
> +   ================================= ===================== ======================================== ===============
> +   Field                             Configuration option  Description                              Write lock
> +   ================================= ===================== ======================================== ===============
> +   :c:member:`!anon_name`            CONFIG_ANON_VMA_NAME  A field for storing a                    mmap write,
> +                                                           :c:struct:`!struct anon_vma_name`        VMA write.
> +                                                           object providing a name for anonymous
> +                                                           mappings, or :c:macro:`!NULL` if none
> +                                                           is set or the VMA is file-backed.
> +   :c:member:`!swap_readahead_info`  CONFIG_SWAP           Metadata used by the swap mechanism      mmap read.
> +                                                           to perform readahead.

It is not clear how writes to this field are serialized by a shared lock. It
is worth noting that the field is atomic.
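
To make that point concrete: because the hint is a single word accessed
through atomic helpers, two faulting threads that both hold the lock for
reading can at worst overwrite each other's readahead hint; the value is
never torn. Below is a sketch of that pattern with made-up names, a userspace
model rather than the actual mm/swap_state.c code:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Toy stand-in for a VMA: the readahead hint is one atomic word. */
struct toy_vma {
	pthread_rwlock_t mmap_lock;	/* models mm->mmap_lock */
	atomic_ulong readahead_info;	/* models vma->swap_readahead_info */
};

/* Fault path: takes the lock shared, yet updates the hint. Because the hint
 * is a single atomic word, a racing fault can only replace it with its own
 * (equally valid) value; nothing is torn. */
static void *fault_path(void *arg)
{
	struct toy_vma *vma = arg;
	unsigned long old;

	pthread_rwlock_rdlock(&vma->mmap_lock);
	old = atomic_load(&vma->readahead_info);	/* read previous hint */
	atomic_store(&vma->readahead_info, old + 1);	/* publish new hint */
	pthread_rwlock_unlock(&vma->mmap_lock);
	return NULL;
}

int main(void)
{
	struct toy_vma vma = { .mmap_lock = PTHREAD_RWLOCK_INITIALIZER };
	pthread_t t1, t2;

	pthread_create(&t1, NULL, fault_path, &vma);
	pthread_create(&t2, NULL, fault_path, &vma);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("readahead hint: %lu\n", atomic_load(&vma.readahead_info));
	return 0;
}

If I read mm/swap_state.c correctly, the kernel side does the equivalent with
atomic_long_read()/atomic_long_set() on vma->swap_readahead_info, so it would
help if the table said that atomics, not the shared lock, provide the
serialization.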

> +   :c:member:`!vm_policy`            CONFIG_NUMA           :c:type:`!mempolicy` object which        mmap write,
> +                                                           describes the NUMA behaviour of the      VMA write.
> +                                                           VMA.
> +   :c:member:`!numab_state`          CONFIG_NUMA_BALANCING :c:type:`!vma_numab_state` object which  mmap read.
> +                                                           describes the current state of
> +                                                           NUMA balancing in relation to this VMA.
> +                                                           Updated under mmap read lock by
> +                                                           :c:func:`!task_numa_work`.

Again, a shared lock serializing writes makes zero sense. There's some other
mechanism in play. I believe there's some kind of scheduler logic that
excludes parallel updates for the same process, but I cannot say I understand
this.

> +   :c:member:`!vm_userfaultfd_ctx`   CONFIG_USERFAULTFD    Userfaultfd context wrapper object of    mmap write,
> +                                                           type :c:type:`!vm_userfaultfd_ctx`,      VMA write.
> +                                                           either of zero size if userfaultfd is
> +                                                           disabled, or containing a pointer
> +                                                           to an underlying
> +                                                           :c:type:`!userfaultfd_ctx` object which
> +                                                           describes userfaultfd metadata.
> +   ================================= ===================== ======================================== ===============

...

> +Lock ordering
> +-------------
> +
> +As we have multiple locks across the kernel which may or may not be taken at the
> +same time as explicit mm or VMA locks, we have to be wary of lock inversion, and
> +the **order** in which locks are acquired and released becomes very important.
> +
> +.. note:: Lock inversion occurs when two threads need to acquire multiple locks,
> +   but in doing so inadvertently cause a mutual deadlock.
> +
> +   For example, consider thread 1 which holds lock A and tries to acquire lock B,
> +   while thread 2 holds lock B and tries to acquire lock A.
> +
> +   Both threads are now deadlocked on each other. However, had they attempted to
> +   acquire locks in the same order, one would have waited for the other to
> +   complete its work and no deadlock would have occurred.
> +
> +The opening comment in `mm/rmap.c` describes in detail the required ordering of
> +locks within memory management code:
> +
> +.. code-block::
> +
> +  inode->i_rwsem (while writing or truncating, not reading or faulting)
> +    mm->mmap_lock
> +      mapping->invalidate_lock (in filemap_fault)
> +        folio_lock
> +          hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below)
> +            vma_start_write
> +              mapping->i_mmap_rwsem
> +                anon_vma->rwsem
> +                  mm->page_table_lock or pte_lock
> +                    swap_lock (in swap_duplicate, swap_info_get)
> +                      mmlist_lock (in mmput, drain_mmlist and others)
> +                      mapping->private_lock (in block_dirty_folio)
> +                        i_pages lock (widely used)
> +                          lruvec->lru_lock (in folio_lruvec_lock_irq)
> +                      inode->i_lock (in set_page_dirty's __mark_inode_dirty)
> +                      bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
> +                        sb_lock (within inode_lock in fs/fs-writeback.c)
> +                        i_pages lock (widely used, in set_page_dirty,
> +                                  in arch-dependent flush_dcache_mmap_lock,
> +                                  within bdi.wb->list_lock in __sync_single_inode)
> +
> +Please check the current state of this comment which may have changed since the
> +time of writing of this document.

I think we need one canonical place for this information. Maybe it's worth
moving it here from rmap.c? There's more lock ordering info in filemap.c.

--
 Kiryl Shutsemau / Kirill A. Shutemov