Hi,

On 14/09/2017 02:31, Sergey Senozhatsky wrote:
> Hi,
>
> On (09/13/17 18:56), Laurent Dufour wrote:
>> Hi Sergey,
>>
>> On 13/09/2017 13:53, Sergey Senozhatsky wrote:
>>> Hi,
>>>
>>> On (09/08/17 20:06), Laurent Dufour wrote:
> [..]
>>> ok, so what I got on my box is:
>>>
>>> vm_munmap()  -> down_write_killable(&mm->mmap_sem)
>>>  do_munmap()
>>>   __split_vma()
>>>    __vma_adjust()  -> write_seqcount_begin(&vma->vm_sequence)
>>>                    -> write_seqcount_begin_nested(&next->vm_sequence, SINGLE_DEPTH_NESTING)
>>>
>>> so this gives 3 dependencies  ->mmap_sem   ->   ->vm_seq
>>>                               ->vm_seq     ->   ->vm_seq/1
>>>                               ->mmap_sem   ->   ->vm_seq/1
>>>
>>>
>>> SyS_mremap() -> down_write_killable(&current->mm->mmap_sem)
>>>  move_vma()   -> write_seqcount_begin(&vma->vm_sequence)
>>>               -> write_seqcount_begin_nested(&new_vma->vm_sequence, SINGLE_DEPTH_NESTING);
>>>   move_page_tables()
>>>    __pte_alloc()
>>>     pte_alloc_one()
>>>      __alloc_pages_nodemask()
>>>       fs_reclaim_acquire()
>>>
>>>
>>> I think here we have prepare_alloc_pages() call, that does
>>>
>>>         -> fs_reclaim_acquire(gfp_mask)
>>>         -> fs_reclaim_release(gfp_mask)
>>>
>>> so that adds one more dependency  ->mmap_sem   ->   ->vm_seq    ->   fs_reclaim
>>>                                   ->mmap_sem   ->   ->vm_seq/1  ->   fs_reclaim
>>>
>>>
>>> now, under memory pressure we hit the slow path and perform direct
>>> reclaim. direct reclaim is done under fs_reclaim lock, so we end up
>>> with the following call chain
>>>
>>> __alloc_pages_nodemask()
>>>  __alloc_pages_slowpath()
>>>   __perform_reclaim()   ->   fs_reclaim_acquire(gfp_mask);
>>>    try_to_free_pages()
>>>     shrink_node()
>>>      shrink_active_list()
>>>       rmap_walk_file()  ->   i_mmap_lock_read(mapping);
>>>
>>>
>>> and this break the existing dependency. since we now take the leaf lock
>>> (fs_reclaim) first and the the root lock (->mmap_sem).
>>
>> Thanks for looking at this.
>> I'm sorry, I should have miss something.
>
> no prob :)
>
>
>> My understanding is that there are 2 chains of locks:
>>  1. from __vma_adjust() mmap_sem -> i_mmap_rwsem -> vm_seq
>>  2. from move_vmap() mmap_sem -> vm_seq -> fs_reclaim
>>  2. from __alloc_pages_nodemask() fs_reclaim -> i_mmap_rwsem
>
> yes, as far as lockdep warning suggests.
>
>> So the solution would be to have in __vma_adjust()
>>  mmap_sem -> vm_seq -> i_mmap_rwsem
>>
>> But this will raised the following dependency from unmap_mapping_range()
>> unmap_mapping_range()          -> i_mmap_rwsem
>>  unmap_mapping_range_tree()
>>   unmap_mapping_range_vma()
>>    zap_page_range_single()
>>     unmap_single_vma()
>>      unmap_page_range()        -> vm_seq
>>
>> And there is no way to get rid of it easily as in unmap_mapping_range()
>> there is no VMA identified yet.
>>
>> That's being said I can't see any clear way to get lock dependency cleaned
>> here.
>> Furthermore, this is not clear to me how a deadlock could happen as vm_seq
>> is a sequence lock, and there is no way to get blocked here.
>
> as far as I understand,
>    seq locks can deadlock, technically. not on the write() side, but on
> the read() side:
>
> read_seqcount_begin()
>  raw_read_seqcount_begin()
>    __read_seqcount_begin()
>
> and __read_seqcount_begin() spins for ever
>
>    __read_seqcount_begin()
>    {
>    repeat:
>      ret = READ_ONCE(s->sequence);
>      if (unlikely(ret & 1)) {
>          cpu_relax();
>          goto repeat;
>      }
>      return ret;
>    }
>
>
> so if there are two CPUs, one doing write_seqcount() and the other one
> doing read_seqcount() then what can happen is something like this
>
>         CPU0                                    CPU1
>
>                                                 fs_reclaim_acquire()
>         write_seqcount_begin()
>         fs_reclaim_acquire()                    read_seqcount_begin()
>         write_seqcount_end()
>
> CPU0 can't write_seqcount_end() because of fs_reclaim_acquire() from
> CPU1, CPU1 can't read_seqcount_begin() because CPU0 did write_seqcount_begin()
> and now waits for fs_reclaim_acquire(). makes sense?

Yes, this makes sense.

But in the case of this series, there is no call to
__read_seqcount_begin(), and the reader (the speculative page fault
handler) just checks (vm_seq & 1) and, if this is true, simply
exits the speculative path without waiting.
So there is no deadlock possibility.

The bad case would be to have 2 concurrent threads calling
write_seqcount_begin() on the same VMA, leading to a wrongly freed
sequence lock, but this can't happen because mmap_sem is held for
write in such a case.

Cheers,
Laurent.
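
To illustrate the difference between a reader that spins on an odd sequence
and one that bails out, here is a minimal user-space sketch using C11
atomics. It is not code from the series: every sketch_* name is hypothetical,
and the memory ordering is only an approximation of what the kernel
primitives provide. It assumes, as described above, that writers are already
serialized by some outer lock (mmap_sem held for write in the series).

```c
/*
 * User-space model only. All sketch_* names are hypothetical and are
 * not kernel API; writers are assumed to be serialized externally.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_seqcount {
	atomic_uint sequence;	/* odd while a writer is in progress */
};

void sketch_seqcount_init(struct sketch_seqcount *s)
{
	atomic_init(&s->sequence, 0);
}

/* Enter the write section: the count becomes odd. */
void sketch_write_begin(struct sketch_seqcount *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_acq_rel);
}

/* Leave the write section: the count becomes even again. */
void sketch_write_end(struct sketch_seqcount *s)
{
	unsigned int prev;

	prev = atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
	/* Two unserialized writers would trip this check. */
	assert(prev & 1);
	(void)prev;
}

/*
 * Non-blocking read begin: if a writer is active, return false at once
 * instead of spinning, so the caller can fall back to a slow path.
 */
bool sketch_read_try_begin(struct sketch_seqcount *s, unsigned int *seq)
{
	unsigned int v;

	v = atomic_load_explicit(&s->sequence, memory_order_acquire);
	if (v & 1)
		return false;
	*seq = v;
	return true;
}

/* True if the count changed since sketch_read_try_begin(). */
bool sketch_read_retry(struct sketch_seqcount *s, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != seq;
}
```

Because sketch_read_try_begin() never loops, a reader in this model cannot
be left waiting on a writer that is itself blocked on another lock; it just
gives up and takes the fallback path.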