Hello, Vladimir On Fri, May 08, 2015 at 12:56:04PM +0300, Vladimir Davydov wrote: > On Mon, May 04, 2015 at 07:54:59PM +0900, Minchan Kim wrote: > > So, I guess once below compiler optimization happens in __page_set_anon_rmap, > > it could be corrupt in page_refernced. > > > > __page_set_anon_rmap: > > page->mapping = (struct address_space *) anon_vma; > > page->mapping = (struct address_space *)((void *)page_mapping + PAGE_MAPPING_ANON); > > > > Because page_referenced checks it with PageAnon which has no memory barrier. > > So if above compiler optimization happens, page_referenced can pass the anon > > page in rmap_walk_file, not ramp_walk_anon. It's my theory. :) > > FWIW > > If such splits were possible, we would have bugs all over the kernel > IMO. An example is do_wp_page() vs shrink_active_list(). In do_wp_page() > we can call page_move_anon_rmap(), which sets page->mapping in exactly > the same fashion as above-mentioned __page_set_anon_rmap(): > > anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON; > page->mapping = (struct address_space *) anon_vma; > > The page in question may be on an LRU list, because nowhere in > do_wp_page() we remove it from the list, neither do we take any LRU > related locks. The page is locked, that's true, but shrink_active_list() > calls page_referenced() on an unlocked page, so according to your logic > they can race with the latter receiving a page with page->mapping equal > to anon_vma w/o PAGE_MAPPING_ANON bit set: > > CPU0 CPU1 > ---- ---- > do_wp_page shrink_active_list > lock_page page_referenced > PageAnon->yes, so skip trylock_page > page_move_anon_rmap > page->mapping = anon_vma > rmap_walk > PageAnon->no > rmap_walk_file > BUG > page->mapping = page->mapping+PAGE_MAPPING_ANON > > However, this does not happen. Good spot. However, it doesn't mean it's right so you are okay to rely on it. Normally, store tearing is not common and such race would be hard to hit but I want to call it as BUG. Rik wrote the code and commented out. "Protected against the rmap code by the page lock" But unfortunately, page_referenced in shrink_active_list doesn't hold a page lock so isn't it a bug? Rik? Please, read store tearing section in Documentation/memory-barrier.txt. If you get confused due to aligned memory, please read this link. https://lkml.org/lkml/2014/7/16/262 Other quote from Paul in https://lkml.org/lkml/2015/5/1/229 " .. If the thing read/written does fit into a machine word and if the location read/written is properly aligned, I would be quite surprised if either READ_ONCE() or WRITE_ONCE() resulted in any sort of tearing. " I parsed it as that "even store tearing can happen machine word at alinged address and that's why WRITE_ONCE is there to prevent it" If you want to claim GCC doesn't do it, please read below links https://lkml.org/lkml/2015/4/16/527 http://yarchive.net/comp/linux/ACCESS_ONCE.html Quote from Linus " The thing is, you can't _prove_ that the compiler won't do it, especially if you end up changing the code later (without thinking about the fact that you're loading things without locking). So the rule is: if you access unlocked values, you use ACCESS_ONCE(). You don't say "but it can't matter". Because you simply don't know. " Yeb, I might be paranoid but my point is it might work now on most of arch but it seem to be buggy/fragile/subtle because we couldn't prove all arch/compiler don't make any trouble. So, intead of adding more logics based on fragile, please use right lock model. If lock becomes big trouble by overhead, let's fix it(for instance, use WRITE_ONCE for update-side and READ_ONCE for read-side) if I don't miss something. > > Thanks, > Vladimir -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html