On Sun, 2008-01-13 at 08:50 -0800, Linus Torvalds wrote:
> > Well the immediate improvement from this actual patch is just that
> > it gives better and smaller code for vm_normal_page (even if you
> > discount the debug checks in the existing code).
>
> It does no such thing, except for the slow-path that doesn't matter.
>
> So it may optimize the slow-path (PFNMAP/MIXMAP), but the real path stays
> exactly the same from a code generation standpoint (just checking a
> different bit, afaik).
>
> If those pte_special bits are required for unexplained lockless
> get_user_pages, is this going to get EVEN WORSE when s390 (again) cannot
> do it?

The pte_special bit is not strictly required for s390, and it is not
the case that we cannot implement pfn_valid in a way that would work
with the new VM_MIXMAP vmas and copy on write. It would be slow, though,
because DCSS segments on s390 can have different types. For one type
the pages are reference counted (hotplug memory via DCSS), for the
other the pages are not reference counted (read-only xip DCSS). I doubt
that we will stay alone with this problem: with KVM you can easily
imagine hot memory add being introduced by mapping an anonymous piece
of memory.

For s390 the straightforward solution for pages with a pfn > max_pfn is
to walk the list of all DCSS segments. For a system where /usr lives on
a xip DCSS this happens frequently. It seems reasonable to me to
introduce a pte bit to decide between the two cases, in particular
since Nick has some other use for the bit as well (I don't know too
much about that feature either). When a non-reference-counting pte is
established we know it is special; we have just forgotten about it by
the time we get to vm_normal_page.

What makes this ugly is the fact that some architectures, like arm,
currently do not have room for the pte_special bit in the pte. Seems
like we need a clean abstraction to allow each architecture to choose
the best way to decide whether a page is reference counted or not.
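
As a rough illustration only, such an abstraction could be modelled in
plain userspace C as below. This is a sketch under stated assumptions,
not kernel code and not taken from any posted patch: every toy_* name,
the pte layout, the flag value and the pfn limit are hypothetical, and
the real kernel would more likely select the variant at build time
than through function pointers. It shows two arch hooks, one used when
a pte for a non-refcounted page is created and one queried by the
vm_normal_page check, with a pfn_valid-style default and a pte-bit
variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy pte: a pfn plus a flags word. Not a real pte layout. */
struct toy_pte {
	unsigned long pfn;
	unsigned long flags;
};

#define TOY_PTE_SPECIAL	0x1UL	/* hypothetical spare software bit */
#define TOY_MAX_PFN	1024UL	/* hypothetical: pfns below this have a struct page */

/* Stand-in for pfn_valid() in this model. */
static bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_MAX_PFN;
}

/* The two per-architecture calls. */
struct toy_arch_ops {
	/* Called when a pte for a non-refcounted page is created. */
	struct toy_pte (*mk_special)(struct toy_pte pte);
	/* Called from the vm_normal_page check to get the answer back. */
	bool (*pte_refcounted)(struct toy_pte pte);
};

/* Default: the first call is a nop, the second is a pfn_valid check. */
static struct toy_pte default_mk_special(struct toy_pte pte)
{
	return pte;
}

static bool default_pte_refcounted(struct toy_pte pte)
{
	return toy_pfn_valid(pte.pfn);
}

/* Architecture with room for a pte bit: record the decision in the pte. */
static struct toy_pte ptebit_mk_special(struct toy_pte pte)
{
	pte.flags |= TOY_PTE_SPECIAL;
	return pte;
}

static bool ptebit_pte_refcounted(struct toy_pte pte)
{
	return !(pte.flags & TOY_PTE_SPECIAL);
}

static const struct toy_arch_ops toy_default_ops = {
	.mk_special = default_mk_special,
	.pte_refcounted = default_pte_refcounted,
};

static const struct toy_arch_ops toy_ptebit_ops = {
	.mk_special = ptebit_mk_special,
	.pte_refcounted = ptebit_pte_refcounted,
};

/* Model of the check: true if the pte maps a normal, refcounted page. */
static bool toy_vm_normal_page(const struct toy_arch_ops *ops,
			       struct toy_pte pte)
{
	assert(ops != NULL);
	return ops->pte_refcounted(pte);
}
```

In this model, toy_ptebit_ops can tell a marked pte from an unmarked one
at the same pfn above the limit (the refcounted versus non-refcounted
DCSS distinction), while toy_default_ops can only go by the pfn.
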
It is only two arch calls: one when a pte is created for a
non-refcounting page, and another for the check in vm_normal_page to
get the information back. The default implementation would be a nop for
the first call and a pfn_valid check for the second call. For s390 I
would prefer a pte bit if I can get it. If not, then we have to play
games with pfn_valid.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.