Re: [Lsf-pc] [LSF/MM TOPIC] Support for 1GB THP

On Tue, 2016-03-01 at 16:44 -0500, Matthew Wilcox wrote:
> On Tue, Mar 01, 2016 at 11:25:41AM +0100, Jan Kara wrote:
> > On Tue 01-03-16 02:09:11, Matthew Wilcox wrote:
> > > There are a few issues around 1GB THP support that I've come up against
> > > while working on DAX support that I think may be interesting to discuss
> > > in person.
> > > 
> > >  - Do we want to add support for 1GB THP for anonymous pages?  DAX support
> > >    is driving the initial 1GB THP support, but would anonymous VMAs also
> > >    benefit from 1GB support?  I'm not volunteering to do this work, but
> > >    it might make an interesting conversation if we can identify some users
> > >    who think performance would be better if they had 1GB THP support.
> > 
> > Some time ago I was thinking about 1GB THP and I was wondering: 
> > What is the motivation for 1GB pages for persistent memory? Is it 
> > the savings in memory used for page tables? Or is it about the cost
> > of fault?
> 
> I think it's both.  I heard from one customer who calculated that 
> with a 6TB server, mapping every page into a process would take ~24MB 
> of page tables.  Multiplied by the 50,000 processes they expect to
> run on a server of that size, that consumes 1.2TB of DRAM.  Using 1GB pages
> reduces that by a factor of 512, down to 2GB.

This sounds a bit implausible: for the machine not to be thrashing to
death, all of the 6TB would have to be shared memory used by all of the
50k processes.  The much more likely scenario is that it's mostly
private memory mixed with a bit of shared, in which case sum(private
working set) + shared must be under 6TB for the machine not to thrash,
and you likely only need mappings for the working set.  Realistically
that means you only need about 50MB or so of page tables, even with our
current page size, assuming it's mostly file-backed.  There might be
some optimisation to do for the anonymous memory swap case, which is the
PTE-profligate one, but we probably shouldn't do anything until we
understand the workload profile.
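
As a rough sanity check on the figures quoted above, here is a small
Python sketch that recomputes the page-table sizes.  It assumes x86-64
style 8-byte entries and binary units, and counts only the lowest-level
entries; none of that is stated in the thread, and the function name is
purely illustrative.  Under those assumptions the ~24MB per-process
figure corresponds to 2MB mappings rather than 4KB ones (which would
need 12GB per process).

```python
# Back-of-the-envelope check of the quoted page-table figures.
# Assumptions (not from the thread): 8-byte page-table entries,
# binary units, and only the lowest-level entries are counted.

KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40
ENTRY_BYTES = 8  # assumed size of one page-table entry


def leaf_table_bytes(mapped_bytes, page_bytes):
    """Bytes of lowest-level entries needed to map mapped_bytes."""
    entries = -(-mapped_bytes // page_bytes)  # ceiling division
    return entries * ENTRY_BYTES


mapped = 6 * TIB      # every page of the 6TB server mapped
processes = 50_000    # process count quoted in the thread

per_process_4k = leaf_table_bytes(mapped, 4 * KIB)
per_process_2m = leaf_table_bytes(mapped, 2 * MIB)
per_process_1g = leaf_table_bytes(mapped, 1 * GIB)

total_2m = per_process_2m * processes
total_1g = per_process_1g * processes

print(f"4KB pages: {per_process_4k / GIB:.0f} GiB per process")
print(f"2MB pages: {per_process_2m / MIB:.0f} MiB per process, "
      f"{total_2m / TIB:.2f} TiB for all processes")
print(f"1GB pages: {per_process_1g / KIB:.0f} KiB per process, "
      f"{total_1g / GIB:.2f} GiB for all processes")
print(f"2MB -> 1GB reduction factor: {total_2m // total_1g}")
```

These numbers only describe the case where each process maps all 6TB;
if only the working set is mapped, the totals shrink accordingly.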

James
