On 02/09/2010 06:32 PM, KOSAKI Motohiro wrote:
> can you please post your /proc/meminfo?

On 02/09/2010 09:50 PM, Balbir Singh wrote:
> Do you have swap enabled? Can you help with the OOM killed dmesg log?
> Does the situation get better after OOM killing.

On 02/09/2010 10:09 PM, KOSAKI Motohiro wrote:
> Chris, 2.6.27 is a bit old. please test it on latest kernel. and please don't use
> any proprietary drivers.

Thanks for the replies.

Swap is enabled in the kernel, but there is no swap configured. ipcs shows little consumption there.

The test load relies on a number of kernel modifications, making it difficult to use newer kernels. (This is an embedded system.) There are no closed-source drivers loaded, though there are some that are not in vanilla kernels. I haven't yet tried to reproduce the problem with a minimal load--I've been more focused on trying to understand what's going on in the code first. It's on my list to try, though.

Here are some /proc/meminfo outputs from a test run where we artificially chewed most of the free memory to try to force the oom killer to fire sooner (otherwise it takes days for the problem to trigger). It's spaced with tabs, so I'm not sure if it'll stay aligned.

The first row is the sample number. All the HugePages entries were 0. The DirectMap entries were constant. SwapTotal/SwapFree/SwapCached were 0, as were Writeback/NFS_Unstable/Bounce/WritebackTmp.

Samples were taken 10 minutes apart. Between samples 49 and 50 the oom-killer fired.

Sample		13		49		50
MemTotal	4042848		4042848		4042848
MemFree		113512		52668		69536
Buffers		20		24		76
Cached		1285588		1287456		1295128
Active		2883224		3369440		2850172
Inactive	913756		487944		990152
Dirty		36		216		252
AnonPages	2274756		2305448		2279216
Mapped		10804		12772		15760
Slab		62324		62568		63608
SReclaimable	24092		23912		24848
SUnreclaim	38232		38656		38760
PageTables	11960		12144		11848
CommitLimit	2021424		2021424		2021424
Committed_AS	12666508	12745200	7700484
VmallocUsed	23256		23256		23256

It's hard to get a good picture from just a few samples, so I've attached an ooffice spreadsheet showing three separate runs. The samples above are from sheet 3 in the document.

In those spreadsheets I notice that memfree+active+inactive+slab+pagetables is basically a constant. However, if I don't use active+inactive then I can't make the numbers add up. And the difference between active+inactive and buffers+cached+anonpages+dirty+mapped+pagetables+vmallocused grows almost monotonically.

Thanks,
Chris
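
As a quick check of the two sums described above, here is a minimal Python sketch that uses only the three samples quoted in this message. The function names near_constant_sum and lru_gap are made up for illustration, and the values are assumed to be in kB, as /proc/meminfo normally reports them. It is not the calculation used in the attached spreadsheet, just the same arithmetic restated.

```python
# /proc/meminfo fields (assumed kB) copied from samples 13, 49 and 50 above.
SAMPLES = {
    13: dict(MemFree=113512, Buffers=20, Cached=1285588, Active=2883224,
             Inactive=913756, Dirty=36, AnonPages=2274756, Mapped=10804,
             Slab=62324, PageTables=11960, VmallocUsed=23256),
    49: dict(MemFree=52668, Buffers=24, Cached=1287456, Active=3369440,
             Inactive=487944, Dirty=216, AnonPages=2305448, Mapped=12772,
             Slab=62568, PageTables=12144, VmallocUsed=23256),
    50: dict(MemFree=69536, Buffers=76, Cached=1295128, Active=2850172,
             Inactive=990152, Dirty=252, AnonPages=2279216, Mapped=15760,
             Slab=63608, PageTables=11848, VmallocUsed=23256),
}

# Fields subtracted from Active+Inactive in the second observation.
ACCOUNTED_FIELDS = ("Buffers", "Cached", "AnonPages", "Dirty",
                    "Mapped", "PageTables", "VmallocUsed")


def near_constant_sum(m):
    """memfree+active+inactive+slab+pagetables (hypothetical helper name)."""
    return (m["MemFree"] + m["Active"] + m["Inactive"]
            + m["Slab"] + m["PageTables"])


def lru_gap(m):
    """active+inactive minus the fields in ACCOUNTED_FIELDS (hypothetical name)."""
    accounted = sum(m[field] for field in ACCOUNTED_FIELDS)
    return m["Active"] + m["Inactive"] - accounted


if __name__ == "__main__":
    for sample, fields in sorted(SAMPLES.items()):
        print(sample, near_constant_sum(fields), lru_gap(fields))
```

For these three samples the first sum comes out as 3984776, 3984764 and 3985316, and the gap as 190560, 216068 and 214788, so the gap dips slightly only in the sample taken after the oom-killer fired.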
Attachment:
meminfo.ods
Description: application/vnd.oasis.opendocument.spreadsheet