On Mon, Apr 28, 2008 at 12:44 AM, Robert P. J. Day <rpjday@xxxxxxxxxxxxxx> wrote:
>
>   inspired by rene's latest postings, i went back to review my memory
> management and i'm confused by a comment i found in one of the MM
> header files.
>
>   it's well-known that *all* of the physical memory on a system is
> represented by an array of "struct page" structures, which i can
> condense to:
>
>   struct page {
>         ...
>         void *virtual;
>         ...
>   }
>
> so, as i read it, if a physical page is currently mapped into the
> kernel address space, that member will contain its virtual address.
> OTOH, if it *isn't* mapped, that pointer will contain NULL.  so far,
> so good?
>
>   also, memory allocation in the "normal" zone of the 32-bit x86
> address space should always return contiguous physical memory, but
> allocation from the "highmem" zone promises to return only *virtually*
> contiguous memory.  again, all of that's fairly straightforward.  but
> here's where the glitch shows up, in this comment from
> include/linux/gfp.h:
>
> /*
>  * There is only one page-allocator function, and two main namespaces to
>  * it. The alloc_page*() variants return 'struct page *' and as such
>  * can allocate highmem pages, the *get*page*() variants return
>  * virtual kernel addresses to the allocated page(s).
>  */
>

The comment above is reflected in the function below (two namespaces,
and how the *get*page*() variants boil down to alloc_pages()):

    unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
    {
            struct page *page;

            page = alloc_pages(gfp_mask, order);
            if (!page)
                    return 0;
            return (unsigned long) page_address(page);
    }

As for highmem versus no highmem, that is controlled by the gfp_mask
passed in to alloc_pages().  An interface that takes no gfp_mask
naturally gives you no way to control it.
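To make the two namespaces concrete, here is a small userspace toy model. It is not kernel code: toy_page, toy_alloc_page(), toy_page_address(), toy_get_free_page() and the TOY_HIGHMEM flag are all hypothetical stand-ins, and the virtual_addr member only mirrors the "virtual" member quoted above. The one thing borrowed from the kernel source is the shape of the wrapper. Having the wrapper mask off the highmem flag is this sketch's own choice, made so that it never hands back an unmapped page as address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_PAGES  8      /* total "physical" pages in the toy      */
#define TOY_NR_LOWMEM 4      /* pages 0..3 are lowmem, 4..7 are highmem */
#define TOY_PAGE_SIZE 4096
#define TOY_HIGHMEM   0x1u   /* hypothetical flag: caller accepts highmem */

struct toy_page {
    int   in_use;
    void *virtual_addr;      /* NULL when the page has no kernel mapping */
};

static struct toy_page toy_mem_map[TOY_NR_PAGES];
static char toy_lowmem[TOY_NR_LOWMEM][TOY_PAGE_SIZE];

/* Lowmem pages are permanently mapped; highmem pages start unmapped. */
static void toy_init(void)
{
    for (int i = 0; i < TOY_NR_PAGES; i++) {
        toy_mem_map[i].in_use = 0;
        toy_mem_map[i].virtual_addr =
            (i < TOY_NR_LOWMEM) ? (void *)toy_lowmem[i] : NULL;
    }
}

/* "struct page" namespace: may return a highmem page if the mask allows. */
static struct toy_page *toy_alloc_page(unsigned int mask)
{
    /* Try highmem first when allowed, to conserve lowmem. */
    if (mask & TOY_HIGHMEM) {
        for (int i = TOY_NR_LOWMEM; i < TOY_NR_PAGES; i++) {
            if (!toy_mem_map[i].in_use) {
                toy_mem_map[i].in_use = 1;
                return &toy_mem_map[i];
            }
        }
    }
    for (int i = 0; i < TOY_NR_LOWMEM; i++) {
        if (!toy_mem_map[i].in_use) {
            toy_mem_map[i].in_use = 1;
            return &toy_mem_map[i];
        }
    }
    return NULL;
}

/* Returns the mapping, or NULL for a page that is not mapped. */
static void *toy_page_address(const struct toy_page *page)
{
    assert(page != NULL);
    return page->virtual_addr;
}

/* Address namespace: same shape as the __get_free_pages() shown above. */
static uintptr_t toy_get_free_page(unsigned int mask)
{
    /* Sketch's choice: never ask for highmem, the caller wants an address. */
    struct toy_page *page = toy_alloc_page(mask & ~TOY_HIGHMEM);

    if (!page)
        return 0;
    return (uintptr_t)toy_page_address(page);
}
```

After toy_init(), toy_alloc_page(TOY_HIGHMEM) hands back a descriptor whose address is NULL, which is fine because the caller holds the descriptor and can map the page later. toy_get_free_page() can only ever return a lowmem page, because an address is all it has to give.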
But looking at vmalloc(), it seems to always allow highmem anyway:

    /**
     * vmalloc - allocate virtually contiguous memory
     * @size: allocation size
     * Allocate enough pages to cover @size from the page level
     * allocator and map them into contiguous kernel virtual space.
     *
     * For tight control over page level allocator and protection flags
     * use __vmalloc() instead.
     */
    void *vmalloc(unsigned long size)
    {
            return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
    }
    EXPORT_SYMBOL(vmalloc);

So there goes the big difference, I think: alloc_pages() allows more
flexible control, whereas vmalloc() always passes __GFP_HIGHMEM.  That
is on top of the 'struct page *' versus virtual address namespace
difference.

>   now, if i allocate a chunk of space from the normal zone and i get
> back a valid address after the allocation succeeds, it seems to me
> that i can represent that space with either the returned virtual
> address *or* the address of the "page" structure, right?  and that's
> because, as long as the space allocated is physically contiguous, then
> a single "struct page *" value will represent the first struct and the
> rest of the consecutive ones after that that define that space.
>
>   however, if i allocate space from high memory (896M and up using,
> say, vmalloc()), what i'll get back if that call succeeds is the
> resulting virtual address ***but*** , since there's no guarantee that
> those pages are contiguous, each page could be represented by some
> arbitrary "page" structure, no?

I am not sure about the contiguity, but what I can see is that
vmalloc() ends up calling kmalloc(), which ends up calling
alloc_pages().

>
>   so how to explain this part of the comment from above?
>
> "The alloc_page*() variants return 'struct page *' and as such can
> allocate highmem pages, ..."
>
>   huh?  if you're allocating "highmem" pages, i don't see how you can
> represent the allocated space with a single struct page pointer
> (unless, of course, it's a single page, but that's not what's being
> addressed here).
>
>   so what does that comment mean?  it seems to be exactly backwards
> from what i'm used to believing.
>

Essentially, vmalloc() tries to conserve precious lowmem, so if lowmem
is needed for some reason, alloc_pages() should be used.  That is what
I think :-).  Correct me if I am wrong.  Or did I completely
misunderstand you?

-- 
Regards,
Peter Teoh
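On the contiguity question, the following userspace toy sketches the distinction as I understand it. It is not kernel code and every demo_* name is hypothetical. An order-N request to the page allocator yields one run of adjacent frames, so a single descriptor pointer plus the order is enough to name it. A vmalloc-style allocation instead takes pages one at a time and has to remember each descriptor separately. The real allocator and vmalloc() are far more involved; this only models the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NR_FRAMES 16
#define DEMO_MAX_ORDER 4     /* 2^4 == DEMO_NR_FRAMES */

struct demo_page { int in_use; };
static struct demo_page demo_mem_map[DEMO_NR_FRAMES];

/*
 * Buddy-style toy: find 2^order free, naturally aligned, adjacent frames
 * and return the descriptor of the first one, or NULL if none is free.
 */
static struct demo_page *demo_alloc_pages(unsigned int order)
{
    if (order > DEMO_MAX_ORDER)
        return NULL;

    size_t n = (size_t)1 << order;

    for (size_t start = 0; start + n <= DEMO_NR_FRAMES; start += n) {
        size_t i = 0;

        while (i < n && !demo_mem_map[start + i].in_use)
            i++;
        if (i == n) {
            for (i = 0; i < n; i++)
                demo_mem_map[start + i].in_use = 1;
            return &demo_mem_map[start];
        }
    }
    return NULL;
}

/*
 * vmalloc-style toy: take nr single frames from wherever they happen to
 * be free and record each descriptor in an array owned by the caller.
 */
static struct demo_page **demo_vmalloc_pages(size_t nr)
{
    if (nr == 0)
        return NULL;

    struct demo_page **pages = calloc(nr, sizeof(*pages));

    if (!pages)
        return NULL;
    for (size_t i = 0; i < nr; i++) {
        pages[i] = demo_alloc_pages(0);
        if (!pages[i]) {
            while (i--)                 /* roll back what we took */
                pages[i]->in_use = 0;
            free(pages);
            return NULL;
        }
    }
    return pages;
}

/* Give the frames back and drop the bookkeeping array. */
static void demo_vfree_pages(struct demo_page **pages, size_t nr)
{
    assert(pages != NULL);
    for (size_t i = 0; i < nr; i++)
        pages[i]->in_use = 0;
    free(pages);
}
```

With a fragmented demo_mem_map, demo_alloc_pages(1) still returns two adjacent frames or fails, while demo_vmalloc_pages(3) happily returns three scattered ones. In this model the single 'struct page *' from alloc_pages() always names a contiguous block, whichever zone it came from.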