Re: [QUESTION] Resizing shared mapping without clashing with others

On Sun, Dec 01, 2024 at 07:44:10PM +0100, Dmitry Dolgov wrote:
> > On Sun, Dec 01, 2024 at 11:55:37AM +0000, Matthew Wilcox wrote:
> > On Sat, Nov 30, 2024 at 05:24:13PM +0100, Dmitry Dolgov wrote:
> > > Hi,
> > >
> > > While working on PostgreSQL [1] we've stumbled upon a question regarding
> > > resizing of shared mappings without conflicting with any other possible
> > > mappings. Before making any wrong conclusions, I would love to get some
> > > consultation from kernel folks on that topic.
> > >
> > > To put it into a context, PostgreSQL uses anonymous shared memory
> > > mapping as a buffer cache for data. The mapping size is configured at
> > > the start, and cannot be changed without a restart. Now, we would
> > > like to make it more flexible and allow changing it at runtime, ideally
> > > without changing already used addresses and copying stuff back and
> > > forth.
> > >
> > > The idea is to place the shared mapping at a specified address (with
> > > MAP_FIXED if needed) with a gap, then use mremap to resize it into the
> > > gap. This approach has an open question -- how to make sure there will
> > > be no other mapping created within the same address space, where we
> > > want to expand the shared mapping? E.g. the shared mapping was created,
> > > then large memory allocation caused another mapping to be created close
> > > to it, so that expanding is not possible.
> >
> > I think there's a very straightforward answer, which is to mmap() it to
> > the larger size to begin with.  If, say, you create a file of 1GB, you
> > can mmap() the first 100GB of that file.  If you access the last 99GB of
> > the mapping, you'll get SIGBUS, but you can truncate() the file larger
> > and gain access to the new memory that way.  Does that work for you?
> >
> > Or if you're doing MAP_ANON | MAP_SHARED, just don't access the last
> > 99GB until your configuration changes.  Memory is allocated on demand,
> > so you won't be charged for it until you use it.
> 
> Right, mapping with a larger size than needed is one option we're
> considering. But there are a few arguments against that:
> 
> * Folks are wary of unnecessarily large shared mappings, since in the
>   past there were issues with the OOM killer making decisions
>   unfavorable to postgres because of that. It might have changed over
>   time, but confirming that will require some investigation.
> 
> * It can cause memory accounting problems. E.g. if we use hugetlb inside
>   a cgroup with reservation limits set (something like
>   hugetlb.2MB.rsvd.limit_in_bytes), then such mmap() will be counted
>   against the limit, even though the memory wasn't allocated -- meaning
>   that we claim some resource without using it.

If it does turn out to be a problem, you can use a similar trick to how
ld.so maps binaries:

mmap(NULL, 2055640, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f221a758000
mmap(0x7f221a780000, 1462272, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f221a780000
mmap(0x7f221a8e5000, 352256, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18d000) = 0x7f221a8e5000
mmap(0x7f221a93b000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e2000) = 0x7f221a93b000
mmap(0x7f221a941000, 52696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f221a941000

Although you wouldn't want to do consecutive mmaps, you'd want to use
mremap() with MREMAP_FIXED -- not to change new_address, but to expand
length over the initial reserving-space mapping.