----- Original Message -----
> From: "Johannes Weiner" <hannes@xxxxxxxxxxx>
> To: "Chunyu Hu" <chuhu@xxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx, guro@xxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx
> Sent: Tuesday, December 15, 2020 9:06:38 PM
> Subject: Re: Question about admin_reserve_kbytes in GUESS mode
>
> On Tue, Dec 15, 2020 at 06:42:39AM -0500, Chunyu Hu wrote:
> > Hello experts,
> >
> > I find admin_reserve_kbytes is not working as documented since
> > commit 8c7829b04c523cdc (mm: fix false-positive OVERCOMMIT_GUESS failures).
> >
> > Before that commit, the available free pages were used for the
> > allocation check, and admin_reserve_kbytes was honored in the default
> > GUESS overcommit mode. After that commit, admin_reserve_kbytes is no
> > longer honored in the default mode. This looks like it breaks the
> > sysctl usage?
> >
> > Documentation/admin-guide/sysctl/vm.rst:
> >
> > admin_reserve_kbytes
> > ====================
> >
> > The amount of free memory in the system that should be reserved for users
> > with the capability cap_sys_admin.
> >
> > admin_reserve_kbytes defaults to min(3% of free pages, 8MB)
> >
> > That should provide enough for the admin to log in and kill a process,
> > if necessary, under the default overcommit 'guess' mode.
>
> Thanks for the report.
>
> Can you elaborate on your usecase a bit? How you rely on the knob, and
> how it's not working properly now?

Thanks for the reply! We drafted a test according to the documentation.
The steps are roughly as follows, in the default GUESS overcommit mode:

Test#1:

    total_memory=$(grep MemTotal /proc/meminfo | awk '{print $2}')
    swapoff -a
    echo ${total_memory} > /proc/sys/vm/admin_reserve_kbytes
    useradd ark_test
    su ark_test -c 'ls -al ~'   # we expect this to fail; now it succeeds

Test#2 (continuing from Test#1):

    sysctl -w vm.drop_caches=3
    free_memory=$(grep MemFree /proc/meminfo | awk '{print $2}')
    sysctl -w vm.admin_reserve_kbytes=$((free_memory-512*1024))   # leave 512M of memory
    su ark_test -c 'memhog -r10 1g'   # consumes 1G of memory; we expect this to fail, now it succeeds

>
> It would be easy enough to this check back to the simplified formula,
> I'm just having some trouble picturing how it would be useful.

The behaviour described in the documentation is useful: it lets us
configure the system to leave some memory for cap_sys_admin users.

>
> The knob never (not since git, anyway) behaved the way it's documented
> here. We don't track total virtual memory; instead the checks apply to
> individual allocations. So it was never true that we reserve 3% of RAM
> for admins. Rather, the biggest single mmap() request an admin could
> do was 100% of RAM, whereas for unprivileged users it was 97% of RAM.
> But you could always do two or more requests of that size in a row
> anyway. It's not clear to me that this is a meaningful distinction.

I need some time to understand what 'in a row' means here. Do you mean
parallel allocation, where more than one thread calls mmap() at the
same time?

--
Regards,
Chunyu Hu
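
For reference, the per-allocation behaviour described in the quoted reply can be modelled with a small shell sketch. This is only a toy model, not kernel code: the function name check_request, its arguments, and the 8 GiB RAM figure are all hypothetical, and the limits (100% of RAM for an admin, RAM minus a 3% reserve for everyone else) are taken directly from the quoted description. It shows one reading of 'in a row': each request is checked on its own and nothing accumulates between requests.

```shell
#!/bin/sh
# Toy model of a per-request overcommit check. NOT kernel code.
# All names are hypothetical; all sizes are in kB.

# check_request REQUEST_KB RAM_KB RESERVE_KB IS_ADMIN
# Prints "allow" if this single request fits under the limit, else "deny".
# The limit is all of RAM for an admin, and RAM minus the reserve otherwise.
check_request() {
    request_kb=$1
    ram_kb=$2
    reserve_kb=$3
    is_admin=$4

    limit_kb=$ram_kb
    if [ "$is_admin" -eq 0 ]; then
        limit_kb=$((ram_kb - reserve_kb))
    fi

    if [ "$request_kb" -le "$limit_kb" ]; then
        echo allow
    else
        echo deny
    fi
}

ram_kb=$((8 * 1024 * 1024))        # assumption: 8 GiB of RAM
reserve_kb=$((ram_kb * 3 / 100))   # 3% reserve, as in the quoted text

# A single unprivileged request for 100% of RAM exceeds the limit.
check_request "$ram_kb" "$ram_kb" "$reserve_kb" 0

# Two requests at the unprivileged limit each pass on their own,
# although together they ask for roughly 194% of RAM.
big_kb=$((ram_kb - reserve_kb))
check_request "$big_kb" "$ram_kb" "$reserve_kb" 0
check_request "$big_kb" "$ram_kb" "$reserve_kb" 0
```

Under this model the second and third requests succeed one after the other, with no threads involved.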