Re: [rfc][patch 2/2] mm: introduce optional pte_special pte bit

On Tue, Jan 15, 2008 at 08:48:42PM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 16 Jan 2008, Nick Piggin wrote:
> > 
> > Right, that's what I had hoped as well. But when I say pte_special
> > *usable* by all architectures, I mean it is usable by all that can
> > spare a bit in the pte. Apparently ARM can't because of some bug
> > in an Xscale CPU or something (the thread is on linux-arch).
> 
> Hmm. Can you give a pointer to some browsable archive? I guess I should 
> subscribe, but there's too much email, too little time. linux-arch is one 
> of the lists that I probably should look at.

http://marc.info/?t=119968107900003&r=2&w=2

I've also cc'ed Russell and Catalin, who were involved in that one.


> That said: especially for PFNMAP and friends, it may be possible to simply 
> re-use an existing hardware bit, like (for example) a "cacheable" bit.
> 
> That doesn't mean that such an architecture would have a free bit for any 
> *arbitrary* software use, but the "no <struct page> backing" is really a 
> pretty special feature, and may well map fairly well 1:1 with something 
> like a "cache disable" bit (which I do think ARM has).
> 
> It's not like we necessarily would want /dev/mem to be mapped cacheable 
> *anyway*, much less on some architecture with stupid virtual caches. 
> 
> > I remember that too. I guess some wires got crossed somewhere. s390
> > evidently does have free bits in their pte_present-type ptes.
> 
> I think they had two types of PTE's - 32-bit and 64-bit. Maybe it's just 
> the 32-bit one that was all used up (but see above - maybe cacheable bits 
> are doable?)

Not sure, I'll let the s390 people chime in here. We'd still need a
pte_special bit for 32 and 64 bit ptes (if they are different)...

 
> I do have to say, one of the reasons I enjoyed PFNMAP was that so far 
> we've basically been able to live without any SW-specified bits at all. 
> Yeah, we use "software bits" on architectures to emulate dirty/accessed, 
> but we have never really needed any "kernel internal bits". And I do think 
> that's generally a good idea. 

I agree in a way (on one hand it would be nice to simplify the whole
PFNMAP logic, on the other hand it isn't actually a problem to keep it,
and it is a nice way of avoiding the use of a pte bit, which might be
important in future even if not today...) 


> So in that sense, I'd actually prefer the current setup if it's not a huge 
> pain. 

Well that is why I wanted to go the ifdef route (or rather, tidy it up to
be an if (CONSTANT) { } else { } or something).

We could go and hide the pte-check in s390 specific code just in the
VM_MIXEDMAP case, but again I think that is actually just making things
less clear (than having the 2 distinct cases in core code).
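
To make the "if (CONSTANT) { } else { }" idea concrete, here is a rough
userspace sketch, not kernel code. Every name in it (HAVE_PTE_SPECIAL,
PTE_SPECIAL_BIT, normal_page, the toy struct page and struct vma) is a
hypothetical stand-in, and the vma-based branch is heavily simplified (the
real PFNMAP logic also has to handle COWed pages). The point is only that
both cases stay visible in one function and the compiler discards the
branch that the constant rules out.

```c
#include <stddef.h>

/* Hypothetical: an arch would define this to 1 if it can spare a pte bit. */
#ifndef HAVE_PTE_SPECIAL
#define HAVE_PTE_SPECIAL 0
#endif

/* Toy values for illustration only, not real kernel definitions. */
#define PTE_SPECIAL_BIT 0x1UL
#define VM_PFNMAP       0x1UL
#define PFN_SHIFT       12
#define NR_PAGES        16

struct page { unsigned long flags; };
struct vma  { unsigned long vm_flags; };

static struct page mem_map[NR_PAGES];

/* Return the struct page behind a pte, or NULL if there is none. */
static struct page *normal_page(const struct vma *vma, unsigned long pte)
{
	unsigned long pfn = pte >> PFN_SHIFT;

	if (HAVE_PTE_SPECIAL) {
		/* Case 1: the pte itself says "no struct page backing". */
		if (pte & PTE_SPECIAL_BIT)
			return NULL;
	} else {
		/* Case 2: no spare bit, so fall back to the vma flags. */
		if (vma->vm_flags & VM_PFNMAP)
			return NULL;
	}

	/* Bounds check for the toy mem_map only. */
	if (pfn >= NR_PAGES)
		return NULL;

	return &mem_map[pfn];
}
```

Built with the default HAVE_PTE_SPECIAL of 0, only the vma test remains in
the generated code. Building with -DHAVE_PTE_SPECIAL=1 keeps only the pte
bit test. Unlike an #ifdef, both branches are still parsed and type-checked
in either configuration.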


> I saw your 5% number, but I really wonder about that one. Was that 
> perhaps with the much more expensive non-linear NUMA "pfn_to_page()"? THAT 
> expense would drown out any vma->vm_flags costs.

Yes, the 5% case was with SPARSEMEM. And yes we should get rid of
that pfn_valid test IMO (or put it under CONFIG_DEBUG_VM). I'm not aware
of it ever triggering, so I think it should go away (I think Hugh nacked
my attempt at this last time -- Hugh, what do you think?).

However, minor performance issues aside, I'd still hope to find a good
way to get the pte_special path in.

---

vm_normal_page has been seen to take nearly 5% kernel time in profiles, and
regularly appears in the top 10 or so functions in a profile. It is called
at every page fault, and for every pte in bulk copy or unmaps.

pfn_valid in particular can be quite expensive in some memory models.

Place that code under CONFIG_DEBUG_VM. I'm not aware of it catching any
problems.

Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
---
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -392,16 +392,13 @@ struct page *vm_normal_page(struct vm_ar
 			return NULL;
 	}
 
-	/*
-	 * Add some anal sanity checks for now. Eventually,
-	 * we should just do "return pfn_to_page(pfn)", but
-	 * in the meantime we check that we get a valid pfn,
-	 * and that the resulting page looks ok.
-	 */
+#ifdef CONFIG_DEBUG_VM
+	/* Check that we get a valid pfn. */
 	if (unlikely(!pfn_valid(pfn))) {
 		print_bad_pte(vma, pte, addr);
 		return NULL;
 	}
+#endif
 
 	/*
 	 * NOTE! We still have PageReserved() pages in the page 