Paul Mackerras <paulus@xxxxxxxxx> writes: > On Thu, Feb 21, 2013 at 10:17:12PM +0530, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> >> >> We now have PTE page consuming only 2K of the 64K page.This is in order to > > In fact the PTE page together with the hash table indexes occupies 4k, > doesn't it? The comments in the code are similarly confusing since > they talk about 2k but actually allocate 4k. > >> facilitate transparent huge page support, which works much better if our PMDs >> cover 16MB instead of 256MB. >> >> Inorder to reduce the wastage, we now have multiple PTE page fragment > ^ In order (two words) > >> from the same PTE page. > > A patch like this needs a more complete description and explanation > than you have given. For instance, you could mention that the code > that you're adding for the 32-bit and non-64k cases are just copies of > the previously generic code from pgalloc.h (actually, this movement > might be something that could be split out as a separate patch). > Also, you should describe in outline how you keep a list of pages that > aren't fully allocated and have a bitmap of which 4k sections are in > use, and also how your scheme interacts with RCU. will do > > [snip] > >> +#ifdef CONFIG_PPC_64K_PAGES >> +/* >> + * we support 15 fragments per PTE page. This is limited by how many > > Why only 15? Don't we get 16 fragments per page? > That was one of the details I wanted to come back and closely look at before posting. But missed that in the excitement of getting this all working :). ._mapcount is a signed value and hence I was not sure whether setting the top bit have any impact on how we deal with the page in other part of VM. >> + * bits we can pack in page->_mapcount. We use the first half for >> + * tracking the usage for rcu page table free. > > What does "first" mean? The high half or the low half? > high half. >> +unsigned long *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr) >> +{ >> + struct page *page; >> + unsigned int mask, bit; >> + unsigned long *table; >> + >> + /* Allocate fragments of a 4K page as 1K/2K page table */ > > A 4k page? Do you mean a 64k page? And what is 1K to do with > anything? > That should be completely dropped, Cut-paste from s390 code :) >> +#ifdef CONFIG_SMP >> +static void __page_table_free_rcu(void *table) >> +{ >> + unsigned int bit; >> + struct page *page; >> + /* >> + * this is a PTE page free 2K page table >> + * fragment of a 64K page. >> + */ >> + page = virt_to_page(table); >> + bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE); >> + bit <<= FRAG_MASK_BITS; >> + /* >> + * clear the higher half and if nobody used the page in >> + * between, even lower half would be zero. >> + */ >> + if (atomic_xor_bits(&page->_mapcount, bit) == 0) { >> + pgtable_page_dtor(page); >> + atomic_set(&page->_mapcount, -1); >> + __free_page(page); >> + } >> +} >> + >> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table) >> +{ >> + struct page *page; >> + struct mm_struct *mm; >> + unsigned int bit, mask; >> + >> + mm = tlb->mm; >> + /* Free 2K page table fragment of a 64K page */ >> + page = virt_to_page(table); >> + bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE); >> + spin_lock(&mm->page_table_lock); >> + /* >> + * stash the actual mask in higher half, and clear the lower half >> + * and selectively, add remove from pgtable list >> + */ >> + mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS)); >> + if (!(mask & FRAG_MASK)) >> + list_del(&page->lru); >> + else { >> + /* >> + * Add the page table page to pgtable_list so that >> + * the free fragment can be used by the next alloc >> + */ >> + list_del_init(&page->lru); >> + list_add_tail(&page->lru, &mm->context.pgtable_list); >> + } >> + spin_unlock(&mm->page_table_lock); >> + tlb_remove_table(tlb, table); >> +} -aneesh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>