On Tue, Jan 17, 2023 at 01:38:24PM -0800, James Houghton wrote:
> > > +		if (curr < end) {
> > > +			/* Don't hold the VMA lock for too long. */
> > > +			hugetlb_vma_unlock_write(vma);
> > > +			cond_resched();
> > > +			hugetlb_vma_lock_write(vma);
> >
> > The intention is good here but IIUC this will cause the vma lock to be
> > taken after the i_mmap_rwsem, which can cause circular deadlocks.  To do
> > this properly we'll need to also release the i_mmap_rwsem.
>
> Sorry if you spent a long time debugging this! I sent a reply a week
> ago about this too.

Oops, yes, I somehow missed that one.  No worries - it's reported by
lockdep. :)

> >
> > However it may make the resched() logic overcomplicated, meanwhile for 2M
> > huge pages I think this will be called for each 2M range which can be too
> > fine grained, so it looks like the "curr < end" check is a bit too
> > aggressive.
> >
> > The other thing is I noticed that the long period of mmu notifier
> > invalidation between start -> end will (in a real-life VM context) cause
> > vcpu threads to spin.
> >
> > I _think_ it's because is_page_fault_stale() (during a vmexit following
> > a kvm page fault) always reports true during the long procedure of
> > MADV_COLLAPSE if it is called on a large range, so even if we release
> > both locks here it may not help tremendously for the VM migration use
> > case because of the long-standing mmu notifier invalidation procedure.
>
> Oh... indeed. Thanks for pointing that out.
>
> >
> > To summarize.. I think a simpler first version of hugetlb MADV_COLLAPSE
> > can drop this "if" block, and let the userapp decide the step size of
> > COLLAPSE?
>
> I'll drop this resched logic. Thanks Peter.

Sounds good, thanks.

--
Peter Xu
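
For illustration only, the "let the userapp decide the step size" idea from the last quoted paragraph could look roughly like the userspace sketch below. It is not code from this thread or the series: collapse_in_steps() and next_step() are hypothetical helper names, and it assumes the hugetlb MADV_COLLAPSE semantics proposed here (mainline MADV_COLLAPSE, Linux 6.1+, targets THP). The caller is assumed to pass a step that is a multiple of the huge page size.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for madvise() under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* asm-generic/mman-common.h value; older libc headers lack it */
#endif

/* Hypothetical helper: size of the next chunk, at most `step` bytes. */
static size_t next_step(size_t remaining, size_t step)
{
	return remaining < step ? remaining : step;
}

/*
 * Hypothetical helper: collapse [addr, addr + len) one chunk at a time.
 *
 * Each madvise() call is a separate syscall, so the kernel-side locks and
 * the mmu notifier invalidation window only cover one chunk, and the
 * caller picks how large that chunk is.
 *
 * Returns the number of chunks issued, or -1 with errno set on the first
 * failure (chunks before the failing one stay collapsed).
 */
static long collapse_in_steps(void *addr, size_t len, size_t step)
{
	char *cur = addr;
	size_t remaining = len;
	long chunks = 0;

	if (step == 0) {
		errno = EINVAL;
		return -1;
	}

	while (remaining > 0) {
		size_t n = next_step(remaining, step);

		if (madvise(cur, n, MADV_COLLAPSE) != 0)
			return -1;

		cur += n;
		remaining -= n;
		chunks++;
	}

	return chunks;
}
```

A caller that wants short invalidation windows would pass a small step (say a few huge pages), while a caller that cares more about syscall overhead would pass a larger one. That trade-off is the part that moves from the kernel's "curr < end" check to userspace.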