On Sun, Dec 01, 2024 at 11:55:37AM +0000, Matthew Wilcox wrote:
> On Sat, Nov 30, 2024 at 05:24:13PM +0100, Dmitry Dolgov wrote:
> > Hi,
> >
> > While working on PostgreSQL [1] we've stumbled upon a question regarding
> > resizing of shared mappings without conflicting with any other possible
> > mappings. Before making any wrong conclusions, I would love to get some
> > consultation from kernel folks on that topic.
> >
> > To put it into a context, PostgreSQL uses anonymous shared memory
> > mapping as a buffer cache for data. The mapping size is configured at
> > the start, and could not be changed without a restart. Now, we would
> > like to make it more flexible and allow to change it at runtime, ideally
> > without changing already used addresses and copying stuff back and
> > forth.
> >
> > The idea is to place the shared mapping at a specified address (with
> > MAP_FIXED if needed) with a gap, then use mremap to resize it into the
> > gap. This approach has an open question -- how to make sure there will
> > be no other mapping created within the same address space, where we
> > want to expand the shared mapping? E.g. the shared mapping was created,
> > then large memory allocation caused another mapping to be created close
> > to it, so that expanding is not possible.
>
> I think there's a very straightforward answer, which is to mmap() it to
> the larger size to begin with. If, say, you create a file of 1GB, you
> can mmap() the first 100GB of that file. If you access the last 99GB of
> the mapping, you'll get SIGBUS, but you can truncate() the file larger
> and gain access to the new memory that way. Does that work for you?
>
> Or if you're doing MAP_ANON | MAP_SHARED, just don't access the last
> 99GB until your configuration changes. Memory is allocated on demand,
> so you won't be charged for it until you use it.

Right, mapping with a larger size than needed is one option we're
considering.
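For reference, here is a minimal C sketch of the file-backed variant suggested above: map the full size once, then grow the backing object with ftruncate() so the base address never changes. It assumes Linux with glibc >= 2.27 (for memfd_create()); the struct and function names (growable_region, region_create, region_grow, region_destroy) are hypothetical, for illustration only, and not PostgreSQL code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type: address space is mapped once for `reserved`
 * bytes, while only `usable` bytes are backed by the memfd's file size. */
struct growable_region {
    int fd;
    char *base;
    size_t reserved; /* bytes of address space mapped up front */
    size_t usable;   /* bytes currently within the file size */
};

int region_create(struct growable_region *r, size_t usable, size_t reserved)
{
    if (usable > reserved) {
        errno = EINVAL;
        return -1;
    }
    r->fd = memfd_create("growable-region-sketch", MFD_CLOEXEC);
    if (r->fd < 0)
        return -1;
    if (ftruncate(r->fd, (off_t)usable) < 0)
        goto fail;
    /* Map the whole reservation; pages entirely past the current file
     * size raise SIGBUS if touched, until the file is extended. */
    r->base = mmap(NULL, reserved, PROT_READ | PROT_WRITE, MAP_SHARED,
                   r->fd, 0);
    if (r->base == MAP_FAILED)
        goto fail;
    r->reserved = reserved;
    r->usable = usable;
    return 0;
fail: {
        int saved = errno;
        close(r->fd);
        errno = saved;
    }
    r->fd = -1;
    r->base = NULL;
    return -1;
}

/* "Resize" at runtime: only the file grows; the mapping stays in place. */
int region_grow(struct growable_region *r, size_t new_usable)
{
    assert(r->base != NULL && "region_create() must succeed first");
    if (new_usable < r->usable || new_usable > r->reserved) {
        errno = EINVAL;
        return -1;
    }
    if (ftruncate(r->fd, (off_t)new_usable) < 0)
        return -1;
    r->usable = new_usable;
    return 0;
}

void region_destroy(struct growable_region *r)
{
    munmap(r->base, r->reserved);
    close(r->fd);
    r->base = NULL;
    r->fd = -1;
}
```

A caller would, for example, create the region with 4 pages usable and 64 reserved, use the first 4 pages, and later call region_grow() to make more pages accessible; r->base stays the same throughout, so no pointers into the buffer cache need fixing up.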
But there are a few arguments against that:

* Folks are wary of unnecessarily large shared mappings, since in the
  past there were issues with the OOM killer making decisions unfavorable
  to postgres because of them. This might have changed over time, but
  confirming that would require some investigation.

* It can cause memory accounting problems. E.g. if we use hugetlb inside
  a cgroup with reservation limits set (something like
  hugetlb.2MB.rsvd.limit_in_bytes), then such an mmap() will be counted
  against the limit even though the memory wasn't allocated -- meaning
  that we claim some resource without using it.