Michal Hocko wrote:
> The OOM report is really interesting:
>
> > [   69.039152] Node 0 DMA32 free:74224kB min:44652kB low:55812kB high:66976kB active_anon:1334792kB inactive_anon:8240kB active_file:48364kB inactive_file:230752kB unevictable:0kB isolated(anon):92kB isolated(file):0kB present:2080640kB managed:1774264kB mlocked:0kB dirty:9328kB writeback:199060kB mapped:38140kB shmem:8472kB slab_reclaimable:17840kB slab_unreclaimable:16292kB kernel_stack:3840kB pagetables:7864kB unstable:0kB bounce:0kB free_pcp:784kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>
> so your whole file LRUs are either dirty or under writeback and
> reclaimable pages are below min wmark. This alone is quite suspicious.

I did

  $ cat < /dev/zero > /tmp/log

for 10 seconds before starting

  $ ./a.out

so a large amount of memory was waiting for writeback on the XFS
filesystem. (A back-of-the-envelope reading of the numbers is appended
at the end of this mail.)

> Why hasn't balance_dirty_pages throttled writers and allowed them to
> make the whole LRU dirty? What is your dirty{_background}_{ratio,bytes}
> configuration on that system.

All values are the defaults of a plain CentOS 7 installation.

  # sysctl -a | grep ^vm.
  vm.admin_reserve_kbytes = 8192
  vm.block_dump = 0
  vm.compact_unevictable_allowed = 1
  vm.dirty_background_bytes = 0
  vm.dirty_background_ratio = 10
  vm.dirty_bytes = 0
  vm.dirty_expire_centisecs = 3000
  vm.dirty_ratio = 30
  vm.dirty_writeback_centisecs = 500
  vm.dirtytime_expire_seconds = 43200
  vm.drop_caches = 0
  vm.extfrag_threshold = 500
  vm.hugepages_treat_as_movable = 0
  vm.hugetlb_shm_group = 0
  vm.laptop_mode = 0
  vm.legacy_va_layout = 0
  vm.lowmem_reserve_ratio = 256 256 32
  vm.max_map_count = 65530
  vm.memory_failure_early_kill = 0
  vm.memory_failure_recovery = 1
  vm.min_free_kbytes = 45056
  vm.min_slab_ratio = 5
  vm.min_unmapped_ratio = 1
  vm.mmap_min_addr = 4096
  vm.nr_hugepages = 0
  vm.nr_hugepages_mempolicy = 0
  vm.nr_overcommit_hugepages = 0
  vm.nr_pdflush_threads = 0
  vm.numa_zonelist_order = default
  vm.oom_dump_tasks = 1
  vm.oom_kill_allocating_task = 0
  vm.overcommit_kbytes = 0
  vm.overcommit_memory = 0
  vm.overcommit_ratio = 50
  vm.page-cluster = 3
  vm.panic_on_oom = 0
  vm.percpu_pagelist_fraction = 0
  vm.stat_interval = 1
  vm.swappiness = 30
  vm.user_reserve_kbytes = 54808
  vm.vfs_cache_pressure = 100
  vm.zone_reclaim_mode = 0

>
> Also, why hasn't throttle_vm_writeout slowed the reclaim down?

That question is too difficult for me.

>
> Anyway this is exactly the case where zone_reclaimable helps us to
> prevent OOM because we are looping over the remaining LRU pages without
> making progress... This just shows how subtle all this is :/
>
> I have to think about this much more..

I'm wary of tweaking the current reclaim logic. Could you please
respond to Linus's comments? There are more moles than kernel
developers can find. What we can do in the short term is prepare for
the moles that kernel developers could not find; in the long term we
should reform the page allocator so that the moles have nowhere to
live.
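
For reference, a back-of-the-envelope reading of the OOM report quoted
at the top (DMA32 zone only, ignoring the small DMA zone and the
kernel's exact dirtyable-memory calculation):

  file LRU          = active_file + inactive_file
                    = 48364 + 230752 = 279116 kB
  dirty + writeback = 9328 + 199060  = 208388 kB  (~75% of the file LRU)
  global throttle   ~ dirty_ratio * managed
                    = 30% of 1774264 ~ 532279 kB

So this zone's file LRU is almost entirely dirty or under writeback
even though the system as a whole is still well below the dirty_ratio
throttle point, which is consistent with Michal's observation above.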
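
The a.out above is the testcase posted earlier in the thread and is not
reproduced here. As a purely hypothetical stand-in, assuming only that
it has to apply anonymous-memory pressure on top of the dirty page
cache, something of this shape would do:

  /* hog.c - hypothetical stand-in for the a.out testcase (the real
   * one was posted earlier in the thread).  Faults in anonymous
   * memory until allocation fails or the OOM killer intervenes. */
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
          for (;;) {
                  char *p = malloc(1048576);      /* 1 MB at a time */
                  if (!p)
                          return 1;
                  memset(p, 1, 1048576);          /* fault the pages in */
          }
  }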
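
For readers following the zone_reclaimable() discussion: the heuristic
in question looks roughly like this in mm/vmscan.c of this era (a
simplified sketch from memory, not an authoritative quote):

  /* A zone still counts as reclaimable as long as we have scanned
   * fewer than six times the pages that currently look reclaimable
   * on its LRUs; "pages_scanned:0 ... all_unreclaimable? no" in the
   * report above reflects this check. */
  static bool zone_reclaimable(struct zone *zone)
  {
          return zone_page_state(zone, NR_PAGES_SCANNED) <
                  zone_reclaimable_pages(zone) * 6;
  }

As long as this returns true, direct reclaim keeps retrying rather than
declaring the zone unreclaimable, which is the looping behaviour Michal
describes above.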