Re: [Lsf-pc] [LSF/MM ATTEND] Memory management -- THP, hugetlb, scalability

On Fri, Jan 10, 2014 at 07:42:04PM +0200, Kirill A. Shutemov wrote:
> On Wed, Jan 08, 2014 at 03:13:21PM +0000, Mel Gorman wrote:
> > I think transparent huge pagecache is likely to crop up for more than one
> > reason. There is the TLB issue and the motivation that i-TLB pressure is
> > a problem in some specialised cases. Whatever the merits of that case,
> > transparent hugepage cache has been raised as a potential solution for
> > some VM scalability problems. I recognise that dealing with large numbers
> > of struct pages is now a problem on larger machines (although I have not
> > seen quantified data on the problem nor do I have access to a machine large
> > enough to measure it myself) but I'm wary of transparent hugepage cache
> > being treated as a primary solution for VM scalability problems. Lacking
> > performance data I have no suggestions on what these alternative solutions
> > might look like.

Something I'd like to see discussed (but don't have the MM chops to
lead a discussion on myself) is the PAGE_CACHE_SIZE vs PAGE_SIZE split.
This needs to be either fixed or removed, IMO.  It's been in the tree
since before git history began (ie before 2005), it imposes a reasonably
large cognitive burden on programmers ("what kind of page size do I want
here?"), it's not intuitively obvious (to a non-mm person) which page
size is which, and it has never actually bought us anything because the
two have always been the same!

Also, it bitrots.  Look at this:

        pgoff_t pgoff = (((address & PAGE_MASK)
                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
        vmf.pgoff = pgoff;
        pgoff_t offset = vmf->pgoff;
        size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
        if (offset >= size)
                return VM_FAULT_SIGBUS;

That's spread over three functions, but it illustrates my point: pgoff is
computed with PAGE_SHIFT and then compared against a size computed in
PAGE_CACHE_SIZE units.  Getting this stuff right is Hard; core mm
developers get it wrong, we don't have the right types to document
whether a variable is in PAGE_SIZE or PAGE_CACHE_SIZE units, and we're
not getting any benefit from it today.
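
To make the unit mismatch concrete, here is a minimal user-space sketch,
not kernel code.  Every name in it (the SKETCH_* constants, mmu_pgoff_t,
cache_pgoff_t and the helper functions) is hypothetical, and the two
shifts are deliberately set to different values, which the kernel has
never done.  It only shows how the comparison from the snippet above
would misfire if the sizes ever diverged, and how distinct wrapper types
could document the units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values: pretend the page cache uses 16 KiB pages while the
 * MMU uses 4 KiB pages.  In the real tree the two shifts were equal. */
#define SKETCH_PAGE_SHIFT        12u
#define SKETCH_PAGE_CACHE_SHIFT  14u
#define SKETCH_PAGE_CACHE_SIZE   (UINT64_C(1) << SKETCH_PAGE_CACHE_SHIFT)

/* Distinct wrapper types, so the unit is part of the variable's type. */
typedef struct { uint64_t v; } mmu_pgoff_t;    /* PAGE_SIZE units */
typedef struct { uint64_t v; } cache_pgoff_t;  /* PAGE_CACHE_SIZE units */

/* Mirrors the first two lines of the snippet: offset in PAGE_SIZE units. */
static mmu_pgoff_t fault_pgoff(uint64_t address, uint64_t vm_start,
                               uint64_t vm_pgoff)
{
        uint64_t page_mask = ~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1);
        mmu_pgoff_t p;

        assert(address >= vm_start);
        p.v = (((address & page_mask) - vm_start) >> SKETCH_PAGE_SHIFT)
                + vm_pgoff;
        return p;
}

/* Mirrors the size computation: file size in PAGE_CACHE_SIZE units. */
static cache_pgoff_t file_size_in_cache_pages(uint64_t i_size)
{
        cache_pgoff_t s;

        s.v = (i_size + SKETCH_PAGE_CACHE_SIZE - 1) >> SKETCH_PAGE_CACHE_SHIFT;
        return s;
}

/* Explicit unit conversion; assumes the cache shift >= the MMU shift. */
static cache_pgoff_t mmu_to_cache(mmu_pgoff_t p)
{
        cache_pgoff_t c;

        c.v = p.v >> (SKETCH_PAGE_CACHE_SHIFT - SKETCH_PAGE_SHIFT);
        return c;
}

/* What the snippet effectively does: compare raw numbers, ignore units. */
static int beyond_eof_untyped(mmu_pgoff_t off, cache_pgoff_t size)
{
        return off.v >= size.v;
}

/* Convert first, then compare like with like. */
static int beyond_eof_typed(mmu_pgoff_t off, cache_pgoff_t size)
{
        return mmu_to_cache(off).v >= size.v;
}
```

With these made-up sizes, a fault 20 KiB into a 64 KiB file gives a
PAGE_SIZE offset of 5 and a file size of 4 cache pages, so the raw
comparison reports "beyond EOF" for an offset that is well inside the
file, while the converted comparison does not.  With plain pgoff_t
everywhere, nothing in the type system flags the difference.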

> Sibling topic is THP for XIP (see Matthew's patchset). Guys want to manage
> persistent memory in 2M chunks where it's possible. And THP (but without
> struct page in this case) is the obvious solution.

Not just 2MB, we also want 1GB pages for some special cases.  It looks
doable (XFS can allocate aligned 1GB blocks).  I've written some
supporting code that will at least get us to the point where we can
insert a 1GB page.  I haven't been able to test anything yet.