On Wed, Nov 19, 2014 at 8:58 AM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote:
> On Wed, Nov 19, 2014 at 7:09 PM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>> Also from reading http://lwn.net/Articles/383162/ I understand that correctness
>> also depends on the hierarchy and I wonder if there's a danger of reintroducing
>> a bug like the one described there.
>
> If I remember right that was fixed by linking non-exclusively mapped pages to
> root anon_vma instead of anon_vma from vma where fault has happened.
> After my patch this still works. Topology hierarchy actually isn't used.
> Here just one selected "root" anon_vma which dies last. That's all.

That's not how I remember it.

An anon_vma corresponds to a given vma V, and is used to track all
vmas (V and descendant vmas) that may include a page that was
originally mapped in V. Each anon page has a link to the anon_vma
corresponding to the vma it was originally faulted in, and an offset
indicating where the page was located relative to that original vma.

The anon_vma has an interval tree of struct anon_vma_chain, and each
struct anon_vma_chain includes a link to a descendant-of-V vma. This
allows rmap to quickly find all the vmas that may map a given page
(based on the page's anon_vma and offset).

When forking or splitting vmas, the new vma is a descendant of the
same vmas as the old one, so it must be added to all the anon_vma
interval trees that were referencing the old one (that is, those of
the ancestors of the new vma). To that end, all the struct
anon_vma_chain pointing to a given vma are kept on a linked list, and
struct anon_vma_chain includes a link to the anon_vma holding the
interval tree.

Locking the entire structure is done with a single lock hosted in the
root anon_vma (that is, the anon_vma of a vma that was created by
mmap() and not by cloning or forking existing vmas).

Limiting the length of the ancestors linked list is correct, though it
has performance implications.
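
To make the layout described above concrete, here is a minimal user-space
sketch in C. It is not the kernel code: the field and function names
(link_vma, vma_create, vma_fork, rmap_candidates, chain_length) are
hypothetical, a plain linked list stands in for the interval tree, page
offsets and locking are left out, and nothing is ever freed. It only models
which vmas end up reachable from which anon_vma after forking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct vma;
struct anon_vma;

/* One vma <-> anon_vma link; models struct anon_vma_chain. */
struct avc {
    struct vma *vma;            /* a vma that may map pages of anon_vma */
    struct anon_vma *anon_vma;  /* the anon_vma holding this link */
    struct avc *next_same_vma;  /* next link on the vma's linked list */
    struct avc *next_same_av;   /* next link on the anon_vma's list
                                   (stand-in for the interval tree) */
};

struct anon_vma {
    struct anon_vma *root;      /* would host the single lock */
    struct avc *links;          /* all vmas that may map our pages */
};

struct vma {
    struct anon_vma *anon_vma;  /* where newly faulted pages would point */
    struct avc *chain;          /* our own anon_vma plus all ancestors' */
};

/* Add vma v to anon_vma av, threading the link onto both lists. */
void link_vma(struct vma *v, struct anon_vma *av)
{
    struct avc *c = malloc(sizeof(*c));
    assert(c != NULL);
    c->vma = v;
    c->anon_vma = av;
    c->next_same_vma = v->chain;
    v->chain = c;
    c->next_same_av = av->links;
    av->links = c;
}

/* Allocate an anon_vma; with root == NULL it becomes its own root. */
struct anon_vma *new_anon_vma(struct anon_vma *root)
{
    struct anon_vma *av = calloc(1, sizeof(*av));
    assert(av != NULL);
    av->root = root ? root : av;
    return av;
}

/* Models mmap(): a fresh vma whose anon_vma is a root. */
struct vma *vma_create(void)
{
    struct vma *v = calloc(1, sizeof(*v));
    assert(v != NULL);
    v->anon_vma = new_anon_vma(NULL);
    link_vma(v, v->anon_vma);
    return v;
}

/* Models fork: the child joins every anon_vma the parent is on,
 * then gets its own anon_vma sharing the parent's root. */
struct vma *vma_fork(const struct vma *parent)
{
    struct vma *child = calloc(1, sizeof(*child));
    assert(child != NULL);
    for (const struct avc *c = parent->chain; c; c = c->next_same_vma)
        link_vma(child, c->anon_vma);
    child->anon_vma = new_anon_vma(parent->anon_vma->root);
    link_vma(child, child->anon_vma);
    return child;
}

/* Number of vmas rmap would have to consider for a page of av. */
size_t rmap_candidates(const struct anon_vma *av)
{
    size_t n = 0;
    for (const struct avc *c = av->links; c; c = c->next_same_av)
        n++;
    return n;
}

/* Length of the vma's linked list of anon_vma_chain entries. */
size_t chain_length(const struct vma *v)
{
    size_t n = 0;
    for (const struct avc *c = v->chain; c; c = c->next_same_vma)
        n++;
    return n;
}
```

In this model, after root = vma_create(), child = vma_fork(root) and
grandchild = vma_fork(child), rmap_candidates(root->anon_vma) is 3 while
rmap_candidates(grandchild->anon_vma) is 1, and chain_length(grandchild)
is 3.
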
In the extreme case, forcing all vmas to be added on the root
anon_vma's interval tree would be correct, though it may re-introduce
the performance problems that led to the introduction of anon_vma.

The good thing about Konstantin's proposal is that it does not have
any magic constant like mine did. However, I think he is mistaken in
saying that hierarchy isn't used - an ancestor vma will always have
more descendants than its children, and the reason for the hierarchy
is to limit the number of vmas that rmap must explore.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.