> On Dec 23, 2019, at 6:33 AM, Miles Chen <miles.chen@xxxxxxxxxxxx> wrote:
>
> Motivation:
> -----------
>
> When debugging an OOM kernel panic, it is difficult to know the
> memory allocated by kernel drivers of vmalloc() by checking the
> Mem-Info or Node/Zone info. For example:
>
> Mem-Info:
> active_anon:5144 inactive_anon:16120 isolated_anon:0
> active_file:0 inactive_file:0 isolated_file:0
> unevictable:0 dirty:0 writeback:0 unstable:0
> slab_reclaimable:739 slab_unreclaimable:442469
> mapped:534 shmem:21050 pagetables:21 bounce:0
> free:14808 free_pcp:3389 free_cma:8128
>
> Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
> shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> all_unreclaimable? yes
>
> Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file: 0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
> pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB
>
> The information above tells us the memory usage of the known memory
> categories and we can check for abnormally large numbers. However, if a
> memory leakage cannot be observed in the categories above, we need to
> reproduce the issue with CONFIG_PAGE_OWNER.
>
> It is possible to read the page owner information from coredump files.
> However, coredump files may not always be available, so my approach is
> to print out the largest page consumer when OOM kernel panic occurs.

Many patches like this one, which help debug special cases, have been shot
down in the past. I don’t see much difference this time. If you are worried
about a memory leak, enable kmemleak and then reproduce the issue. Otherwise,
we will end up with too many heuristics just for debugging.
>
> The heuristic approach assumes that the OOM kernel panic is caused by
> a single backtrace. The assumption is not always true but it works in
> many cases during our test.
>
> We have tested this heuristic approach since 2019/5 on Android devices.
> In 38 internal OOM kernel panic reports:
>
> 31/38: can be analyzed by using existing information
> 7/38: need page owner information and the heuristic approach in this patch
> prints the correct backtraces of abnormal memory allocations. No need to
> reproduce the issues.