On Tue, Feb 20, 2024 at 10:27 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 2/19/24 18:17, Suren Baghdasaryan wrote:
> > On Thu, Feb 15, 2024 at 3:56 PM Kent Overstreet
> > <kent.overstreet@xxxxxxxxx> wrote:
> >>
> >> On Thu, Feb 15, 2024 at 06:27:29PM -0500, Steven Rostedt wrote:
> >> > All this, and we are still worried about 4k for useful debugging :-/
> >
> > I was planning to refactor this function to print one record at a time
> > with a smaller buffer but after discussing with Kent, he has plans to
> > reuse this function and having the report in one buffer is needed for
> > that.
>
> We are printing to console, AFAICS all the code involved uses plain printk().
> I think it would be way easier to have a function using printk() for this
> use case than the seq_buf which is more suitable for /proc and friends. Then
> all concerns about buffers would be gone. It wouldn't be that much of a code
> duplication?

Ok, after discussing this with Kent, I'll change this patch to provide
a function returning N top consumers (the array and N will be provided
by the caller) and then we can print one record at a time with much
less memory needed. That should address reusability concerns, will use
memory more efficiently and will allow for more flexibility (more/less
than 10 records if needed).
Thanks for the feedback, everyone!

>
> >> Every additional 4k still needs justification. And whether we burn a
> >> reserve on this will have no observable effect on user output in
> >> remotely normal situations; if this allocation ever fails, we've already
> >> been in an OOM situation for a while and we've already printed out this
> >> report many times, with less memory pressure where the allocation would
> >> have succeeded.
> >
> > I'm not sure this claim will always be true, specifically in the case
> > of low-end devices with relatively low amounts of reserves and in the
>
> That's right, GFP_ATOMIC failures can easily happen without prior OOMs.
> Consider a system where userspace allocations fill the memory as they
> usually do, up to high watermark. Then a burst of packets is received and
> handled by GFP_ATOMIC allocations that deplete the reserves and can't cause
> OOMs (OOM is when we fail to reclaim anything, but we are allocating from a
> context that can't reclaim), so the very first report would be a GFP_ATOMIC
> failure and now it can't allocate that buffer for printing.
>
> I'm sure more such scenarios exist, Cc: Tetsuo who I recall was an expert on
> this topic.
>
> > presence of a possible quick memory usage spike. We should also
> > consider a case when panic_on_oom is set. All we get is one OOM
> > report, so we get only one chance to capture this report. In any case,
> > I don't yet have data to prove or disprove this claim but it will be
> > interesting to test it with data from the field once the feature is
> > deployed.
> >
> > For now I think with Vlastimil's __GFP_NOWARN suggestion the code
> > becomes safe and the only risk is to lose this report. If we get cases
> > with reports missing this data, we can easily change to reserved
> > memory.
>
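
To make the "N top consumers" idea above concrete, here is a rough userspace
sketch, not the actual patch. It assumes a flat array of records; struct
consumer, top_consumers() and format_consumer() are hypothetical names made
up for illustration, and the real code would walk allocation tags instead.
The point it shows is that the caller supplies both the array and N, nothing
is allocated while selecting, and each record can then be formatted into a
small line-sized buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record; the real patch tracks allocation tags, not this struct. */
struct consumer {
	const char *name;     /* illustrative label for an allocation site */
	unsigned long bytes;  /* bytes currently attributed to it */
};

/*
 * Fill the caller-provided array top[0..n-1] with the largest consumers
 * from all[0..count-1], sorted by descending size. Nothing is allocated
 * here. Returns the number of entries written, i.e. min(n, count).
 */
size_t top_consumers(const struct consumer *all, size_t count,
		     struct consumer *top, size_t n)
{
	size_t filled = 0;

	assert(n == 0 || top != NULL);

	for (size_t i = 0; i < count; i++) {
		size_t pos;

		/* Skip records that cannot displace anything in a full array. */
		if (filled == n && (n == 0 || all[i].bytes <= top[n - 1].bytes))
			continue;

		/* Insertion step: start at the tail, shift smaller entries down. */
		pos = (filled < n) ? filled++ : n - 1;
		while (pos > 0 && top[pos - 1].bytes < all[i].bytes) {
			top[pos] = top[pos - 1];
			pos--;
		}
		top[pos] = all[i];
	}
	return filled;
}

/* Format a single record, so only a small per-line buffer is ever needed. */
int format_consumer(char *buf, size_t len, const struct consumer *c)
{
	return snprintf(buf, len, "%12lu %s", c->bytes, c->name);
}
```

A caller would declare something like "struct consumer top[10];" on its own
stack, call top_consumers(), then loop over the returned count and emit one
formatted line per record. Picking more or fewer than 10 records is then just
a matter of the array size the caller passes in.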