On Sun, 12 Nov 2023 19:51:14 +0000 SeongJae Park <sj@xxxxxxxxxx> wrote:

> Hello,
>
>
> I'd like to share an idea for making systems automatically scale up/down memory
> in an access/contiguity-awared way. It is designed for memory efficiency of
> free pages reporting-like collaboration based memory oversubscribed virtual
> machine systems, but it might also be potentially useful for memory/power
> efficiency and memory contiguity of general systems. There is no
> implementation at the moment, but I'd like to hear any comments or concerns
> about the idea first if anyone has. I will also share this in the upcoming
> kernel summit DAMON talk[1]'s future plans part.
> [...]
>
> ACMA: Access/Contiguity-aware Memory Auto-scaling
> =================================================
>
> We therefore propose a new kernel feature for the requirements, namely
> Access/Contiguity-aware Memory Auto-scaling (ACMA).
>
> Definitions
> -----------
>
> ACMA defines a metric called DAMON-detected working set. This is a set of
> memory regions that DAMON has detected access to those regions within a
> user-specifiable time interval, say, one minute.
>
> ACMA also defines a new operation called stealing. It receives a contiguous
> memory region as its input, and allocates the pages of the region. If some
> pages in the region are not free, migrate those out. Hence it could be thought
> of a variant, or a wrapper of memory offlining or alloc_contig_range(). If the
> allocation is successful, it further reports the region as safe to use to the
> host. ACMA manages the stealing status of each memory block. If the entire
> page of a memory block is stolen, it further hot-unplug the block.
>
> It further defines a new operation called stolen pages returning. The action
> receives an amount of memory size as input. If there are not-yet-hot-unplugged
> stolen pages of the size, it frees the page.
> If there are no such stolen pages
> but a hot-unplugged stolen memory block, it hot-plugs the block again, closer
> to the not-hot-unplugged blocks first. Then the guest users can allocate pages
> of returned ones and access it. When they access it, the host will notify that
> via page fault and assign/map a host-physical page for that.
>
> Workflow
> --------
>
> With these definitions, ACMA behaves based on system status as follows.
>
> Phase 0. It periodically monitors the DAMON-based working set size and free
> memory size of the system.
>
> Phase 1. If the free memory to the working set size ratio is more than a
> threshold (high), say, 2:1 (200%), ACMA steals report-granularity contiguous
> non-working set pages in the last not-yet-hot-unplugged memory block, colder
> pages first. The ratio will decrease.
>
> Phase 2. If the free memory to the working set size ratio becomes less than a
> threshold (normal), say, 1:1 (100%), ACMA stops stealing and start reclaiming
> non-workingset pages, colder pages first. The ratio will increase. The
> reclamation is continued until the ratio becomes higher than the normal
> threshold.
>
> Phase 3. If the non-workingset reclamation is not increasing the ratio and it
> becomes less than yet another threshold (low), say, 1:2 (50%), ACMA starts
> returning stolen pages until the free memory to the working set ratio becomes
> higher than the low threshold.

So, the idea is to keep only a specific portion of the working set size as
free memory. However, the free memory to working set size ratio is not easy to
understand, since it changes very dynamically with the access pattern. Hence,
it is not easy to imagine how ACMA will work and what results the system will
get without a visualization or a detailed example scenario. This would be even
more challenging for users. The three thresholds may also be hard to tune
optimally, especially when the characteristics of the workload are dynamic.
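As a rough illustration of the quoted workflow, the threshold logic could be
modeled in user space as below. This is only a toy sketch, not an
implementation: the names (AcmaSketch, Action, step) are made up for
illustration, the default thresholds are the example values from the quoted
text, and it assumes that stealing, once started, continues until the ratio
drops below the normal threshold. It also simplifies Phase 3 by returning
stolen pages whenever the ratio is below the low threshold, without checking
whether reclamation has stopped helping.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IDLE = "idle"             # Phase 0 only: keep monitoring
    STEAL = "steal"           # Phase 1: steal cold contiguous pages
    RECLAIM = "reclaim"       # Phase 2: reclaim non-working-set pages
    RETURN = "return_stolen"  # Phase 3: give stolen pages back


@dataclass
class AcmaSketch:
    """Toy model of the three-threshold workflow (hypothetical names)."""
    high: float = 2.0     # 2:1 (200%)
    normal: float = 1.0   # 1:1 (100%)
    low: float = 0.5      # 1:2 (50%)
    stealing: bool = False

    def step(self, free: float, working_set: float) -> Action:
        """Pick an action from one sample of free memory and working set size.

        Both arguments must use the same unit (bytes, pages, GiB, ...).
        """
        # An empty working set is treated as an infinitely large ratio.
        ratio = free / working_set if working_set > 0 else float("inf")
        if ratio < self.low:
            self.stealing = False
            return Action.RETURN
        if ratio < self.normal:
            self.stealing = False
            return Action.RECLAIM
        if ratio > self.high:
            self.stealing = True
        # Between normal and high, keep doing whatever was decided before.
        return Action.STEAL if self.stealing else Action.IDLE


if __name__ == "__main__":
    acma = AcmaSketch()
    # (free memory, working set size) samples, e.g. in GiB
    for free, ws in [(8, 2), (3, 2), (1.5, 2), (3, 2), (0.8, 2)]:
        print(f"{free / ws:.2f} -> {acma.step(free, ws).value}")
```

In this sketch the same ratio of 1.5 maps to stealing on the second sample but
to idling on the fourth, because the outcome depends on which threshold was
crossed last. That kind of history dependence is part of what makes the
behavior hard to predict from the thresholds alone.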

Since we have user/self-feedback-driven auto-tuning, I believe we could make
this simpler. Specifically, ACMA could ask users to set the minimum and
maximum memory of the system to guarantee, and an acceptable level of memory
pressure. Then, it could do its best to make the system memory efficient while
keeping the three conditions. The detailed mechanism will of course be more
complicated than this simple statement, but I believe the statement lets users
understand what the result of using ACMA would be.

I will share a more detailed specification of the updated idea as another
"RFC IDEA" mail soon.


Thanks,
SJ

[...]