On Wed, Jun 14, 2017 at 05:31:31PM +0200, Andrea Arcangeli wrote:
> Hello,
>
> On Wed, Jun 14, 2017 at 04:18:57PM +0200, Martin Schwidefsky wrote:
> > Could we change pmdp_invalidate to make it return the old pmd entry?
> > That to me seems the simplest fix to avoid losing the dirty bit.
>
> I earlier suggested to replace pmdp_invalidate with something like
> old_pmd = pmdp_establish(pmd_mknotpresent(pmd)) (then tlb flush could
> then be conditional to the old pmd being present). Making
> pmdp_invalidate return the old pmd entry would be mostly equivalent to
> that.
>
> The advantage of not changing pmdp_invalidate is that we could skip a
> xchg which is more costly in __split_huge_pmd_locked and
> madvise_free_huge_pmd so perhaps there's a point to keep a variant of
> pmdp_invalidate that doesn't use xchg internally (and in turn can't
> return the old pmd value atomically).
>
> If we don't want new messy names like pmdp_establish we could have a
> __pmdp_invalidate that returns void, and pmdp_invalidate that returns
> the old pmd and uses xchg (and it'd also be backwards compatible as
> far as the callers are concerned). So those places that don't need the
> old value returned and can skip the xchg, could simply
> s/pmdp_invalidate/__pmdp_invalidate/ to optimize.

We have only a few pmdp_invalidate() callers:

 - clear_soft_dirty_pmd();
 - madvise_free_huge_pmd();
 - change_huge_pmd();
 - __split_huge_pmd_locked();

Only madvise_free_huge_pmd() doesn't care about the old pmd.
__split_huge_pmd_locked() actually needs to check the dirty bit after
pmdp_invalidate(), see patch 3/3 of the patchset.

I don't think it's worth introducing one more primitive only for
madvise_free_huge_pmd().

I'll stick with a single pmdp_invalidate() that returns the old value.

-- 
 Kirill A. Shutemov
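
To illustrate why returning the old value matters, here is a rough user-space model of the race discussed in this thread. It is only a sketch and not kernel code: the `model_*` names, the `MODEL_PMD_*` bit positions and the use of C11 `atomic_exchange` in place of the kernel's xchg are all assumptions made for illustration. It models a caller that reads the entry, "hardware" that sets the dirty bit afterwards, and an invalidate that swaps in a not-present value computed from the stale read.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout for this model only; not any real architecture's. */
#define MODEL_PMD_PRESENT ((uint64_t)1 << 0)
#define MODEL_PMD_DIRTY   ((uint64_t)1 << 1)

typedef _Atomic uint64_t model_pmd_t;

/*
 * Model of an invalidate that returns the old entry. The new value is
 * derived from a possibly stale snapshot, but the atomic exchange hands
 * back whatever was really in the entry at the moment of the swap.
 */
static uint64_t model_pmdp_invalidate(model_pmd_t *pmdp, uint64_t snapshot)
{
    return atomic_exchange(pmdp, snapshot & ~MODEL_PMD_PRESENT);
}

/*
 * Replay the race in a single thread:
 *   1. the caller reads the entry (present, clean),
 *   2. "hardware" sets the dirty bit,
 *   3. the caller invalidates using the stale snapshot.
 * *old_out receives the value returned by the invalidate,
 * *now_out receives what is left in the entry afterwards.
 */
static void model_race(uint64_t *old_out, uint64_t *now_out)
{
    model_pmd_t pmd = MODEL_PMD_PRESENT;

    uint64_t snapshot = atomic_load(&pmd);           /* step 1 */
    atomic_fetch_or(&pmd, MODEL_PMD_DIRTY);          /* step 2 */
    *old_out = model_pmdp_invalidate(&pmd, snapshot); /* step 3 */
    *now_out = atomic_load(&pmd);
}
```

In this model the entry left behind no longer carries the dirty bit, because it was built from the stale snapshot, while the returned old value still does. A caller that needs the dirty state can therefore take it from the return value, whereas a variant returning void would give it no way to notice the late update.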