On 7 Feb 2017, at 7:55, Aneesh Kumar K.V wrote:

> "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> writes:
>
>> On Sun, Feb 05, 2017 at 11:12:41AM -0500, Zi Yan wrote:
>>> From: Zi Yan <ziy@xxxxxxxxxx>
>>>
>>> Originally, zap_pmd_range() checks pmd value without taking pmd lock.
>>> This can cause pmd_protnone entry not being freed.
>>>
>>> Because there are two steps in changing a pmd entry to a pmd_protnone
>>> entry. First, the pmd entry is cleared to a pmd_none entry, then,
>>> the pmd_none entry is changed into a pmd_protnone entry.
>>> The racy check, even with barrier, might only see the pmd_none entry
>>> in zap_pmd_range(), thus, the mapping is neither split nor zapped.
>>
>> That's definitely a good catch.
>>
>> But I don't agree with the solution. Taking pmd lock on each
>> zap_pmd_range() is a significant hit by scalability of the code path.
>> Yes, split ptl lock helps, but it would be nice to avoid the lock in first
>> place.
>>
>> Can we fix change_huge_pmd() instead? Is there a reason why we cannot
>> setup the pmd_protnone() atomically?
>>
>> Mel? Rik?
>>
>
> I am also trying to fixup the usage of set_pte_at on ptes that are
> valid/present (that is, autonuma ptes). I guess what we are missing is a
> variant of pte update routines that can atomically update a pte without
> clearing it and that also doesn't do a tlb flush ?

I think so. The key point is to have an atomic PTE update function
instead of the current two-step pte/pmd_get_clear() followed by
set_pte/pmd_at(). We can always add a wrapper that includes the TLB
flush once we have this atomic update function.

I used xchg() to replace xxx_get_clear() & set_xxx_at() in pmd_protnone(),
set_pmd_migration_entry(), and remove_pmd_migration(), then ran my test
overnight. I saw neither kernel crashes nor data corruption. So I think
the atomic PTE/PMD update function works without taking locks in
zap_pmd_range().

Aneesh, in your patch fixing PowerPC's autonuma pte problem, why didn't
you use atomic operations? Is there any limitation on PowerPC?

My question is: why does the current kernel use xxx_get_clear() and
set_xxx_at() in the first place? Is there any limitation I do not know of?

Thanks.

--
Best Regards
Yan Zi
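
To make the race discussed above concrete, here is a minimal userspace sketch in C11. It is not kernel code: fake_pmd_t, the FAKE_* values, and both function names are hypothetical stand-ins, and atomic_exchange() from <stdatomic.h> only plays the role that xchg() plays in the kernel. The concurrent lockless reader is simulated by calling it inline at the points where it could run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for pmd_t and its states; not kernel definitions. */
typedef _Atomic uint64_t fake_pmd_t;
enum { FAKE_NONE = 0, FAKE_PRESENT = 1, FAKE_PROTNONE = 2 };

/* What a lockless check in the spirit of zap_pmd_range() would conclude. */
static bool reader_sees_none(fake_pmd_t *pmd)
{
    return atomic_load(pmd) == FAKE_NONE;
}

/*
 * Two-step update: clear the entry, then install the new value.
 * The reader is invoked inside the window between the two steps,
 * which is where a concurrent CPU could observe the entry.
 */
bool none_visible_with_two_step(void)
{
    fake_pmd_t pmd = FAKE_PRESENT;

    atomic_exchange(&pmd, FAKE_NONE);        /* step 1: like xxx_get_clear() */
    bool seen = reader_sees_none(&pmd);      /* reader runs in the window */
    atomic_store(&pmd, FAKE_PROTNONE);       /* step 2: like set_xxx_at() */
    return seen;
}

/*
 * Single-step update: one atomic exchange replaces the old value.
 * The reader can only run before or after the exchange, so it sees
 * either the old entry or the new one, never an empty entry.
 */
bool none_visible_with_xchg(void)
{
    fake_pmd_t pmd = FAKE_PRESENT;

    bool before = reader_sees_none(&pmd);
    uint64_t old = atomic_exchange(&pmd, FAKE_PROTNONE);
    bool after = reader_sees_none(&pmd);

    assert(old == FAKE_PRESENT);             /* old value is still returned */
    return before || after;
}
```

The sketch ignores everything that makes the real problem hard (TLB flushing, hardware-updated accessed/dirty bits, architecture-specific entry formats); it only shows why a single exchange leaves no intermediate empty state for a lockless reader to observe.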