On Thu, Dec 05, 2013 at 03:05:19PM -0500, Rik van Riel wrote:
> On 12/05/2013 02:54 PM, Mel Gorman wrote:
>
> >I think that's a better fit and a neater fix. Thanks! I think it barriers
> >more than it needs to (definite cost vs maybe cost), the flush can be
> >deferred until we are definitely trying to migrate and the pte case is
> >not guaranteed to be flushed before migration due to pte_mknonnuma causing
> >a flush in ptep_clear_flush to be avoided later. Mashing the two patches
> >together yields this.
>
> I think this would fix the numa migrate case.
>

Good. So far I have not been seeing any problems with it at least.

> However, I believe the same issue is also present in
> mprotect(..., PROT_NONE) vs. compaction, for programs
> that trap SIGSEGV for garbage collection purposes.
>

I'm not 100% convinced we need to be concerned with races with
mprotect(PROT_NONE) and a parallel reference to that area from userspace.
I would consider it to be a buggy application if two threads were not
co-ordinating the protection of a region and referencing it. I would also
expect garbage collectors to be managing smart pointers and using reference
counting to copy between heap generations (or similar mechanisms) instead
of trapping sigsegv.

Intel's architectural manual 3A covers what happens for delayed TLB
invalidations in section 4.10.4.4 (in the version I'm looking at at
least). The following two snippets are the most important:

    Software developers should understand that, between the
    modification of a paging-structure entry and execution of the
    invalidation instruction recommended in Section 4.10.4.2, the
    processor may use translations based on either the old value or
    the new value of the paging-structure entry.

    The following items describe some of the potential consequences
    of delayed invalidation:

    o If a paging-structure entry is modified to change the P flag
      from 1 to 0, an access to a linear address whose translation is
      controlled by this entry may or may not cause a page-fault
      exception.

    o If a paging-structure entry is modified to change the R/W flag
      from 0 to 1, write accesses to linear addresses whose translation
      is controlled by this entry may or may not cause a page-fault
      exception.

So accesses to the PROT_NONE region may still complete until the deferred
TLB flush has happened. In a race with mprotect(PROT_NONE), a thread will
either complete the access or receive a SIGSEGV signal due to the failed
protection check, but this is pretty much expected and which of the two
happens is unpredictable. I do not think the present bit gets cleared on
mprotect(PROT_NONE) due to the relevant bits being:

#define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |         \
                         _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
#define PAGE_NONE       __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)

If the present bit remains then compaction should flush the TLB on the
call to ptep_clear_flush, as the pte_accessible check is based on the
present bit. So even though it is possible for a write to complete during
a call to mprotect(PROT_NONE), the same is not true for compaction.

> They could lose modifications done in-between when
> the pte was set to PROT_NONE, and the actual TLB
> flush, if compaction moves the page around in-between
> those two events.
>
> I don't know if this is a case we need to worry about
> at all, but I think the same fix would apply to that
> code path, so I guess we might as well make it...

I might be going "la la la la we're fine" and deluding myself but we
appear to be covered here and it would be a shame to add expense to a
path unnecessarily.

--
Mel Gorman
SUSE Labs
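
For illustration only: the "may or may not cause a page-fault" behaviour
in the Intel text quoted in this mail can be modelled with a toy userspace
program. This is a sketch, not kernel code. The names toy_cpu, toy_write,
toy_protect and toy_flush and the TOY_* bit values are all made up for the
example, and the model caches only a single translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for the toy model; not the kernel's values. */
#define TOY_PRESENT 0x1u
#define TOY_RW      0x2u

struct toy_cpu {
    uint32_t pte;       /* the in-memory paging-structure entry */
    uint32_t tlb_pte;   /* copy of the entry cached by the "TLB" */
    bool     tlb_valid; /* true if tlb_pte holds a cached translation */
};

/* Returns true if a write completes, false if it would page-fault. */
static bool toy_write(struct toy_cpu *cpu)
{
    const uint32_t need = TOY_PRESENT | TOY_RW;

    if (cpu->tlb_valid)                 /* TLB hit: entry may be stale */
        return (cpu->tlb_pte & need) == need;

    if ((cpu->pte & need) != need)      /* TLB miss: walk finds no access */
        return false;

    cpu->tlb_pte = cpu->pte;            /* cache the usable translation */
    cpu->tlb_valid = true;
    return true;
}

/* Drop write permission in the entry WITHOUT invalidating the TLB. */
static void toy_protect(struct toy_cpu *cpu)
{
    cpu->pte &= ~TOY_RW;
}

/* The deferred invalidation. */
static void toy_flush(struct toy_cpu *cpu)
{
    cpu->tlb_valid = false;
}

/* Walks through the window between the entry update and the flush. */
static void toy_selfcheck(void)
{
    struct toy_cpu cpu = { .pte = TOY_PRESENT | TOY_RW };

    assert(toy_write(&cpu));    /* primes the TLB */
    toy_protect(&cpu);
    assert(toy_write(&cpu));    /* stale translation: write completes */
    toy_flush(&cpu);
    assert(!toy_write(&cpu));   /* after the flush: write faults */
}
```

A CPU that never cached the translation faults as soon as the entry is
changed, which is why the outcome of the race is unpredictable from
userspace: it depends on what the TLB happened to hold at the time.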