On Wed 04-07-18 18:45:48, Petros Angelatos wrote:
> > I assume dd just tried to fault a code page in and that failed due to
> > the hard limit and unreclaimable memory. The reason why the memcg v1
> > oom throttling heuristic hasn't kicked in is that there are no pages
> > under writeback. This would match symptoms of the bug fixed by
> > 1c610d5f93c7 ("mm/vmscan: wake up flushers for legacy cgroups too") in
> > 4.16 but there might be more. You should have that fix already so there
> > must be something more in the game. You've said that you are using blkio
> > cgroup, right? What is the configuration? I strongly suspect that none
> > of the writeback has started because of the throttling.
>
> I'm only using a memory cgroup with no blkio restrictions, so I'm not
> sure why writeback hasn't started. Another thing I noticed is that
> it's a lot harder to reproduce when the same amount of data is written
> to a single file rather than to many smaller files. That's why my
> original example code writes 500 files of 1MB each.
>
> Your mention of writeback gave me the idea to try doing a
> sync_file_range() with SYNC_FILE_RANGE_WRITE after writing each file
> to manually schedule writeback, and surprisingly it fixed the problem.
> Is that an indication of a kernel bug that doesn't trigger writeback
> in time?

Yeah, it smells so. If you look at 1c610d5f93c7, we've had a bug where
we didn't even kick the flushers. So it seems they do not start doing
useful work in time. I would start digging in that direction.

> Also, you mentioned that the page fault is probably due to a code page.
> Would another remedy be to lock the whole executable and dynamic
> libraries in memory with mlock() before starting the IO operations?

That looks like a big hammer to me.

-- 
Michal Hocko
SUSE Labs
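
For reference, a minimal sketch of the sync_file_range() workaround
described in the quoted message above. The file count, size, and name
pattern are illustrative, following the "500 files of 1MB" description
rather than the original reproducer:

/*
 * After writing each file, explicitly start writeback for it with
 * sync_file_range(SYNC_FILE_RANGE_WRITE). This only *initiates*
 * writeback; it does not wait for it to complete, so the writer is
 * not blocked while dirty pages are queued to the device.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NFILES 500
#define FILE_SIZE (1 << 20)	/* 1MB per file, as in the report */

int main(void)
{
	static char buf[FILE_SIZE];
	char name[64];
	int i, fd;

	memset(buf, 'x', sizeof(buf));

	for (i = 0; i < NFILES; i++) {
		snprintf(name, sizeof(name), "data-%03d", i);
		fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			perror("write");
			return 1;
		}
		/*
		 * Kick off writeback for the whole file: with nbytes == 0,
		 * all bytes from offset to the end of file are covered.
		 */
		if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) < 0)
			perror("sync_file_range");
		close(fd);
	}
	return 0;
}

The effect is that each file's dirty pages are handed to the flushers
immediately instead of waiting for the kernel to kick them, which keeps
the amount of dirty, not-yet-queued memory charged to the cgroup low.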