On 2/1/21 12:26 PM, Steven Noonan wrote: > (Please CC me on replies, I'm not subscribed to the list.) > > I started seeing this problem several months ago, but after some Google searches I concluded that someone had already root caused and fixed it, and I just needed to wait for a newer kernel to come along: > > https://www.spinics.net/lists/linux-mm/msg226311.html Yeah, that thread mentions Josef's series [1], but I don't see that it made it into git log. What happened to it? Meanwhile Matthew refactored it with "4bd6a7353ee1 ("sysctl: Convert to iter interfaces")" which left the kzalloc() but changed count to count+1, which would explain the change from order-5 to order-6. In my quick test, strace cat tells me it uses 128k sized read, which is order-5. So miminally there should be kvzalloc(), but it's still somewhat unfortunate to do that for reading a few bytes. Also the check for KMALLOC_MAX_SIZE happens before the +1. Meh. Matthew? [1] https://lore.kernel.org/linux-fsdevel/20200813210411.905010-2-josef@xxxxxxxxxxxxxx/ > But it turns out this has not yet been fixed, even in 5.10.10. Also, while it used to be logged as an order:5 allocation it's now an order:6 allocation for some reason. It only happens rarely (anywhere from once per day to once per week), but it's kind of scary to check the kernel log and see this: > > [569537.153097] cat: page allocation failure: order:6, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 > [569537.153104] CPU: 1 PID: 3153417 Comm: cat Tainted: P S IO 5.10.10-1-hsw #1 > [569537.153105] Hardware name: ASUSTeK COMPUTER INC. P10S-M WS Series/P10S-M WS Series, BIOS 4503 07/23/2019 > [569537.153106] Call Trace: > [569537.153111] dump_stack+0x58/0x6b > [569537.153113] warn_alloc.cold+0x78/0xdc > [569537.153116] ? __alloc_pages_direct_compact+0x15f/0x170 > [569537.153117] __alloc_pages_slowpath.constprop.0+0xc4e/0xc70 > [569537.153120] __alloc_pages_nodemask+0x2de/0x310 > [569537.153121] kmalloc_order+0x25/0x80 > [569537.153122] kmalloc_order_trace+0x17/0x80 > [569537.153125] proc_sys_call_handler+0xb8/0x250 > [569537.153127] new_sync_read+0x10f/0x190 > [569537.153128] vfs_read+0x147/0x1a0 > [569537.153130] ksys_read+0x5f/0xe0 > [569537.153131] do_syscall_64+0x2d/0x70 > [569537.153134] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [569537.153135] RIP: 0033:0x7fcc21ef92b2 > [569537.153137] Code: c0 e9 b2 fe ff ff 50 48 8d 3d d2 2a 0a 00 e8 e5 ec 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 > [569537.153138] RSP: 002b:00007ffcfbcc5c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > [569537.153140] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fcc21ef92b2 > [569537.153140] RDX: 0000000000020000 RSI: 00007fcc21ff8000 RDI: 0000000000000003 > [569537.153141] RBP: 00007fcc21ff8000 R08: 00007fcc21ff7010 R09: 0000000000000000 > [569537.153142] R10: 0000000000000022 R11: 0000000000000246 R12: 0000556e99b70c00 > [569537.153142] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 > [569537.153144] Mem-Info: > [569537.153146] active_anon:50924 inactive_anon:1715988 isolated_anon:0 > active_file:163427 inactive_file:104041 isolated_file:0 > unevictable:3247 dirty:5825 writeback:0 > slab_reclaimable:39566 slab_unreclaimable:1760175 > mapped:49406 shmem:17850 pagetables:10611 bounce:0 > free:1060578 free_pcp:0 free_cma:0 > [569537.153149] Node 0 active_anon:203696kB inactive_anon:6863952kB active_file:653708kB inactive_file:416164kB unevictable:12988kB isolated(anon):0kB isolated(file):0kB mapped:197624kB dirty:23300kB writeback:0kB shmem:71400kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:18112kB all_unreclaimable? no > [569537.153151] Node 0 DMA free:11800kB min:24kB low:36kB high:48kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15896kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [569537.153153] lowmem_reserve[]: 0 1810 64139 64139 64139 > [569537.153156] Node 0 DMA32 free:252476kB min:3208kB low:5076kB high:6944kB reserved_highatomic:0KB active_anon:756kB inactive_anon:124844kB active_file:2120kB inactive_file:180kB unevictable:0kB writepending:0kB present:1937436kB managed:1871900kB mlocked:0kB pagetables:8kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [569537.153158] lowmem_reserve[]: 0 0 62329 62329 62329 > [569537.153160] Node 0 Normal free:3978036kB min:109404kB low:173232kB high:237060kB reserved_highatomic:2048KB active_anon:202940kB inactive_anon:6739108kB active_file:651588kB inactive_file:415984kB unevictable:12988kB writepending:23076kB present:65011712kB managed:63831896kB mlocked:11176kB pagetables:42436kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [569537.153162] lowmem_reserve[]: 0 0 0 0 0 > [569537.153164] Node 0 DMA: 2*4kB (U) 2*8kB (U) 2*16kB (U) 3*32kB (U) 2*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11800kB > [569537.153171] Node 0 DMA32: 3738*4kB (UME) 2341*8kB (UME) 1719*16kB (UME) 943*32kB (UME) 443*64kB (UME) 19*128kB (UME) 108*256kB (UME) 169*512kB (UM) 8*1024kB (UM) 4*2048kB (UM) 0*4096kB = 252704kB > [569537.153179] Node 0 Normal: 81071*4kB (UMEH) 112101*8kB (UMEH) 101108*16kB (UMEH) 34297*32kB (UMEH) 660*64kB (UME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3978564kB > [569537.153185] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB > [569537.153186] Node 0 hugepages_total=64 hugepages_free=51 hugepages_surp=64 hugepages_size=2048kB > [569537.153186] 298118 total pagecache pages > [569537.153197] 14242 pages in swap cache > [569537.153198] Swap cache stats: add 165298, delete 151079, find 10932118/10944726 > [569537.153199] Free swap = 100129528kB > [569537.153199] Total swap = 100663288kB > [569537.153200] 16741283 pages RAM > [569537.153201] 0 pages HighMem/MovableOnly > [569537.153201] 311360 pages reserved > [569537.153202] 0 pages hwpoisoned > > I used sysdig to figure out what 'cat' was trying to read when it failed, via: > > # sysdig -p "%12user.name %6proc.pid %12proc.name %fd.name %evt.args" 'evt.type=read and evt.failed=true and proc.name=cat' > > The relevant one was this: > > nobody 315341 cat /proc/sys/kernel/random/entropy_avail res=-12(ENOMEM) data= > > Which I believe is a munin script gathering a bunch of data for graphs (munin-node runs as "nobody", uses shell scripts for its data gathering, and one of the plugins tracks entropy_avail, so that would seem to be the guilty party). > > Any idea why it's trying to do an order:6 allocation for just a simple read of entropy_avail? Based on the stack trace I'm guessing this isn't owned by the "random" driver maintainers, it seems to be a more general issue with memory allocations while reading from procfs. >