On Sun, Dec 01, 2024 at 07:44:10PM +0100, Dmitry Dolgov wrote:
> On Sun, Dec 01, 2024 at 11:55:37AM +0000, Matthew Wilcox wrote:
> > On Sat, Nov 30, 2024 at 05:24:13PM +0100, Dmitry Dolgov wrote:
> > > Hi,
> > >
> > > While working on PostgreSQL [1] we've stumbled upon a question regarding
> > > resizing of shared mappings without conflicting with any other possible
> > > mappings. Before making any wrong conclusions, I would love to get some
> > > consultation from kernel folks on that topic.
> > >
> > > To put it into context, PostgreSQL uses an anonymous shared memory
> > > mapping as a buffer cache for data. The mapping size is configured at
> > > startup and cannot be changed without a restart. Now we would
> > > like to make it more flexible and allow changing it at runtime, ideally
> > > without changing already used addresses and copying stuff back and
> > > forth.
> > >
> > > The idea is to place the shared mapping at a specified address (with
> > > MAP_FIXED if needed) with a gap, then use mremap to resize it into the
> > > gap. This approach has an open question: how do we make sure no other
> > > mapping gets created within the address range into which we want to
> > > expand the shared mapping? E.g. the shared mapping was created, then a
> > > large memory allocation caused another mapping to be created close to
> > > it, so that expanding is no longer possible.
> >
> > I think there's a very straightforward answer, which is to mmap() it to
> > the larger size to begin with. If, say, you create a file of 1GB, you
> > can mmap() the first 100GB of that file. If you access the last 99GB of
> > the mapping, you'll get SIGBUS, but you can truncate() the file larger
> > and gain access to the new memory that way. Does that work for you?
> >
> > Or if you're doing MAP_ANON | MAP_SHARED, just don't access the last
> > 99GB until your configuration changes. Memory is allocated on demand,
> > so you won't be charged for it until you use it.
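A scaled-down sketch of the "map it larger than the file, then grow the file" idea quoted above. It assumes Linux with glibc 2.27 or later, uses memfd_create() as a stand-in for the file, and the helpers map_oversized() and grow_object() are hypothetical names for illustration, not anything PostgreSQL provides. All sizes are assumed to be multiples of the page size.

```c
#define _GNU_SOURCE          /* for memfd_create() */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd of `init_len` bytes and map
 * `max_len` bytes of it (max_len > init_len). Pages past the current
 * object size raise SIGBUS if touched; pages are only allocated once
 * they are inside the object and actually written.
 */
static void *map_oversized(int *fd_out, size_t init_len, size_t max_len)
{
    int fd = memfd_create("buffer-cache", 0);   /* name is arbitrary */
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)init_len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, max_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/*
 * Hypothetical helper: growing the object makes more of the existing
 * mapping usable. The mapping itself never moves, so every address
 * handed out earlier stays valid.
 */
static int grow_object(int fd, size_t new_len)
{
    return ftruncate(fd, (off_t)new_len);
}
```

For example, a caller could map 64 pages over a 4-page object and use pages 0 to 3; after grow_object(fd, 16 * page_size) pages 4 to 15 become usable too, while touching page 16 or beyond would still raise SIGBUS.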
>
> Right, mapping with a larger size than needed is one option we're
> considering. But there are a few arguments against it:
>
> * Folks are wary of unnecessarily large shared mappings, since in the
>   past there were issues with the OOM killer making decisions unfavorable
>   to postgres because of them. That might have changed over time, but
>   confirming it will require some investigation.
>
> * It can cause memory accounting problems. E.g. if we use hugetlb inside
>   a cgroup with reservation limits set (something like
>   hugetlb.2MB.rsvd.limit_in_bytes), then such an mmap() will be counted
>   against the limit even though the memory wasn't allocated, meaning
>   that we claim some resource without using it.

If it does turn out to be a problem, you can use a trick similar to the
one ld.so uses to map binaries. The first mmap() reserves the whole
address range, and each later MAP_FIXED call replaces a piece of that
reservation:

mmap(NULL, 2055640, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f221a758000
mmap(0x7f221a780000, 1462272, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f221a780000
mmap(0x7f221a8e5000, 352256, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18d000) = 0x7f221a8e5000
mmap(0x7f221a93b000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e2000) = 0x7f221a93b000
mmap(0x7f221a941000, 52696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f221a941000

You wouldn't want to do consecutive mmaps, though; you'd want to use
mremap() with MREMAP_FIXED, not to change new_address, but to expand the
length over the initial space-reserving mapping.
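A minimal sketch of the reserve-then-replace pattern from the ld.so trace, assuming Linux with glibc 2.27 or later. Note that it swaps in a different growth step than the one suggested above: instead of mremap(), it maps the new tail of a memfd over the reservation with MAP_FIXED, which replaces the PROT_NONE pages in a single call so no unrelated mapping can land in between. The struct shared_region type and the region_*() helpers are hypothetical; sizes are assumed to be page-size multiples, and a hugetlb variant (MFD_HUGETLB) would additionally need huge-page-aligned sizes and is not shown.

```c
#define _GNU_SOURCE          /* for memfd_create() */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle for a resizable shared region. */
struct shared_region {
    char  *base;      /* start of the reserved address range */
    size_t reserved;  /* total reserved bytes (upper growth limit) */
    size_t used;      /* bytes currently backed by the memfd */
    int    fd;        /* memfd holding the shared contents */
};

/* Reserve `reserved` bytes of address space, then map the first
 * `initial` bytes of a memfd over the start of it with MAP_FIXED. */
static int region_create(struct shared_region *r, size_t initial, size_t reserved)
{
    /* PROT_NONE + MAP_NORESERVE: claims address space only. */
    void *res = mmap(NULL, reserved, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (res == MAP_FAILED)
        return -1;

    int fd = memfd_create("shared-region", 0);   /* name is arbitrary */
    if (fd < 0 || ftruncate(fd, (off_t)initial) < 0)
        goto fail;
    if (mmap(res, initial, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED)
        goto fail;

    r->base = res;
    r->reserved = reserved;
    r->used = initial;
    r->fd = fd;
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    munmap(res, reserved);
    return -1;
}

/* Grow in place: extend the memfd, then map the new tail at the
 * matching file offset over the still-reserved PROT_NONE pages. */
static int region_grow(struct shared_region *r, size_t new_used)
{
    if (new_used <= r->used || new_used > r->reserved)
        return -1;
    if (ftruncate(r->fd, (off_t)new_used) < 0)
        return -1;
    if (mmap(r->base + r->used, new_used - r->used, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, r->fd, (off_t)r->used) == MAP_FAILED)
        return -1;
    r->used = new_used;
    return 0;
}

/* Unmap the shared part and the rest of the reservation in one go. */
static void region_destroy(struct shared_region *r)
{
    munmap(r->base, r->reserved);
    close(r->fd);
}
```

The base address never changes across region_grow(), which is the property the original question asks for; only the calling process's mappings are affected, so every process attached to the region would have to perform the same growth step.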