On Fri, Jan 10, 2014 at 07:42:04PM +0200, Kirill A. Shutemov wrote:
> On Wed, Jan 08, 2014 at 03:13:21PM +0000, Mel Gorman wrote:
> > I think transparent huge pagecache is likely to crop up for more than one
> > reason. There is the TLB issue and the motivation that i-TLB pressure is
> > a problem in some specialised cases. Whatever the merits of that case,
> > transparent hugepage cache has been raised as a potential solution for
> > some VM scalability problems. I recognise that dealing with large numbers
> > of struct pages is now a problem on larger machines (although I have not
> > seen quantified data on the problem nor do I have access to a machine large
> > enough to measure it myself) but I'm wary of transparent hugepage cache
> > being treated as a primary solution for VM scalability problems. Lacking
> > performance data I have no suggestions on what these alternative solutions
> > might look like.

Something I'd like to see discussed (but don't have the MM chops to lead
a discussion on myself) is the PAGE_CACHE_SIZE vs PAGE_SIZE split.  This
needs to be either fixed or removed, IMO.  It's been in the tree since
before git history began (i.e. before 2005), it imposes a reasonably
large cognitive burden on programmers ("what kind of page size do I want
here?"), it's not intuitively obvious (to a non-mm person) which page
size is which, and it's never actually bought us anything because the
two have always been the same!

Also, it bitrots.  Look at this:

	pgoff_t pgoff = (((address & PAGE_MASK) - vma->vm_start)
				>> PAGE_SHIFT) + vma->vm_pgoff;

	vmf.pgoff = pgoff;

	pgoff_t offset = vmf->pgoff;

	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
	if (offset >= size)
		return VM_FAULT_SIGBUS;

That's spread over three functions, but that only illustrates my point:
getting this stuff right is hard.  Core mm developers get it wrong, we
don't have the right types to document whether a variable is in
PAGE_SIZE or PAGE_CACHE_SIZE units, and we're not getting any benefit
from the split today.  (A rough sketch of what unit-carrying types could
look like is at the end of this mail.)

> Sibling topic is THP for XIP (see Matthew's patchset). Guys want to manage
> persistent memory in 2M chunks where it's possible. And THP (but without
> struct page in this case) is the obvious solution.

Not just 2MB, we also want 1GB pages for some special cases.  It looks
doable (XFS can allocate aligned 1GB blocks).  I've written some
supporting code that will at least get us to the point where we can
insert a 1GB page.  I haven't been able to test anything yet.
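
To make the "right types" point above concrete, here's the kind of thing
I have in mind.  This is purely a sketch -- page_off_t, cache_off_t and
the helpers below are names I've just invented, nothing like them exists
in the tree -- but wrapping the two units in distinct structs would at
least let the compiler catch the sort of mixing shown above:

	#include <linux/pagemap.h>	/* PAGE_SHIFT, PAGE_CACHE_SHIFT */
	#include <linux/types.h>

	/* Offsets counted in PAGE_SIZE units. */
	typedef struct { unsigned long val; } page_off_t;

	/* Offsets counted in PAGE_CACHE_SIZE units. */
	typedef struct { unsigned long val; } cache_off_t;

	/*
	 * Assumes PAGE_CACHE_SHIFT >= PAGE_SHIFT, i.e. a cache page is
	 * one or more base pages, which is the only configuration the
	 * split was ever supposed to allow.
	 */
	static inline cache_off_t page_off_to_cache_off(page_off_t p)
	{
		return (cache_off_t){ p.val >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) };
	}

	/*
	 * A filemap_fault-style EOF check then can't silently compare an
	 * offset in one unit against a size in the other; the conversion
	 * has to be written out.
	 */
	static inline bool fault_beyond_eof(page_off_t pgoff, cache_off_t size)
	{
		return page_off_to_cache_off(pgoff).val >= size.val;
	}

With something like that, the snippet above wouldn't compile until
somebody wrote down which unit each variable is actually in.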