Dave Chinner wrote:
> This exact discussion is already underway.
>
> My initial proposal:
>
> http://oss.sgi.com/archives/xfs/2015-02/msg00314.html
>
> Why mempools don't work but transaction based reservations will:
>
> http://oss.sgi.com/archives/xfs/2015-02/msg00339.html
>
> Reservation needs to be an accounting mechanisms, not preallocation:
>
> http://oss.sgi.com/archives/xfs/2015-02/msg00456.html
> http://oss.sgi.com/archives/xfs/2015-02/msg00457.html
> http://oss.sgi.com/archives/xfs/2015-02/msg00458.html
>
> And that's where the discussion currently sits.

I hit two problems on Linux 3.19: one is a stall at io_schedule(), the other is a kernel panic caused by an XFS assertion failure. I guess both are caused by !GFP_FS allocations not being retried under OOM.

Will those problems go away by using transaction based reservations? And if so, are they simple enough to backport to vendor kernels?

(From http://I-love.SAKURA.ne.jp/tmp/serial-20150225-1.txt.xz )
----------
[ 1225.773411] kworker/3:0H D ffff88007cadb4f8 11632 27 2 0x00000000
[ 1225.776911] ffff88007cadb4f8 ffff88007cadb508 ffff88007cac6740 0000000000014080
[ 1225.780670] ffffffff8101cd19 ffff88007cadbfd8 0000000000014080 ffff88007c28b740
[ 1225.784431] ffff88007cac6740 ffff88007cadb540 ffff88007f8d4998 ffff88007cadb540
[ 1225.788766] Call Trace:
[ 1225.789988] [<ffffffff8101cd19>] ? read_tsc+0x9/0x10
[ 1225.792444] [<ffffffff812acbd9>] ? xfs_iunpin_wait+0x19/0x20
[ 1225.795228] [<ffffffff816b2590>] io_schedule+0xa0/0x130
[ 1225.797802] [<ffffffff812a9569>] __xfs_iunpin_wait+0xe9/0x140
[ 1225.800621] [<ffffffff810af3b0>] ? autoremove_wake_function+0x40/0x40
[ 1225.803770] [<ffffffff812acbd9>] xfs_iunpin_wait+0x19/0x20
[ 1225.806471] [<ffffffff812a209c>] xfs_reclaim_inode+0x7c/0x360
[ 1225.809283] [<ffffffff812a25d7>] xfs_reclaim_inodes_ag+0x257/0x370
[ 1225.812308] [<ffffffff81340839>] ? radix_tree_gang_lookup_tag+0x89/0xd0
[ 1225.815532] [<ffffffff8116fe58>] ? list_lru_walk_node+0x148/0x190
[ 1225.817951] [<ffffffff812a2783>] xfs_reclaim_inodes_nr+0x33/0x40
[ 1225.819373] [<ffffffff812b3545>] xfs_fs_free_cached_objects+0x15/0x20
[ 1225.820898] [<ffffffff811c29e9>] super_cache_scan+0x169/0x170
[ 1225.822245] [<ffffffff8115aed6>] shrink_node_slabs+0x1d6/0x370
[ 1225.823588] [<ffffffff8115dd2a>] shrink_zone+0x20a/0x240
[ 1225.824830] [<ffffffff8115e0dc>] do_try_to_free_pages+0x16c/0x460
[ 1225.826230] [<ffffffff8115e48a>] try_to_free_pages+0xba/0x150
[ 1225.827570] [<ffffffff81151542>] __alloc_pages_nodemask+0x5b2/0x9d0
[ 1225.829030] [<ffffffff8119ecbc>] kmem_getpages+0x8c/0x200
[ 1225.830277] [<ffffffff811a122b>] fallback_alloc+0x17b/0x230
[ 1225.831561] [<ffffffff811a107b>] ____cache_alloc_node+0x18b/0x1c0
[ 1225.833061] [<ffffffff811a3b00>] kmem_cache_alloc+0x330/0x5c0
[ 1225.834435] [<ffffffff8133c9d9>] ? ida_pre_get+0x69/0x100
[ 1225.835719] [<ffffffff8133c9d9>] ida_pre_get+0x69/0x100
[ 1225.836963] [<ffffffff8133d312>] ida_simple_get+0x42/0xf0
[ 1225.838248] [<ffffffff81086211>] create_worker+0x31/0x1c0
[ 1225.839519] [<ffffffff81087831>] worker_thread+0x3d1/0x4d0
[ 1225.840800] [<ffffffff81087460>] ? rescuer_thread+0x3a0/0x3a0
[ 1225.842123] [<ffffffff8108c5e2>] kthread+0xd2/0xf0
[ 1225.843234] [<ffffffff81010000>] ? perf_trace_xen_mmu_ptep_modify_prot+0x90/0xf0
[ 1225.844978] [<ffffffff8108c510>] ? kthread_create_on_node+0x180/0x180
[ 1225.846481] [<ffffffff816b63fc>] ret_from_fork+0x7c/0xb0
[ 1225.847718] [<ffffffff8108c510>] ? kthread_create_on_node+0x180/0x180
[ 1225.849279] kswapd0 D ffff88007708f998 11552 45 2 0x00000000
[ 1225.850977] ffff88007708f998 0000000000000000 ffff88007c28b740 0000000000014080
[ 1225.852798] 0000000000000003 ffff88007708ffd8 0000000000014080 ffff880077ff2740
[ 1225.854575] ffff88007c28b740 0000000000000000 ffff88007948e3a8 ffff88007948e3ac
[ 1225.856358] Call Trace:
[ 1225.856928] [<ffffffff816b2799>] schedule_preempt_disabled+0x29/0x70
[ 1225.858384] [<ffffffff816b43d5>] __mutex_lock_slowpath+0x95/0x100
[ 1225.859799] [<ffffffff816b4463>] mutex_lock+0x23/0x37
[ 1225.860983] [<ffffffff812a264c>] xfs_reclaim_inodes_ag+0x2cc/0x370
[ 1225.862403] [<ffffffff8109eb48>] ? __enqueue_entity+0x78/0x80
[ 1225.863742] [<ffffffff810a5f37>] ? enqueue_entity+0x237/0x8f0
[ 1225.865100] [<ffffffff81340839>] ? radix_tree_gang_lookup_tag+0x89/0xd0
[ 1225.866659] [<ffffffff8116fe58>] ? list_lru_walk_node+0x148/0x190
[ 1225.868106] [<ffffffff812a2783>] xfs_reclaim_inodes_nr+0x33/0x40
[ 1225.869522] [<ffffffff812b3545>] xfs_fs_free_cached_objects+0x15/0x20
[ 1225.871015] [<ffffffff811c29e9>] super_cache_scan+0x169/0x170
[ 1225.872338] [<ffffffff8115aed6>] shrink_node_slabs+0x1d6/0x370
[ 1225.873679] [<ffffffff8115dd2a>] shrink_zone+0x20a/0x240
[ 1225.874920] [<ffffffff8115ed2d>] kswapd+0x4fd/0x9c0
[ 1225.876049] [<ffffffff8115e830>] ? mem_cgroup_shrink_node_zone+0x140/0x140
[ 1225.877654] [<ffffffff8108c5e2>] kthread+0xd2/0xf0
[ 1225.878762] [<ffffffff81010000>] ? perf_trace_xen_mmu_ptep_modify_prot+0x90/0xf0
[ 1225.880495] [<ffffffff8108c510>] ? kthread_create_on_node+0x180/0x180
[ 1225.881996] [<ffffffff816b63fc>] ret_from_fork+0x7c/0xb0
[ 1225.883336] [<ffffffff8108c510>] ? kthread_create_on_node+0x180/0x180
----------

(From http://I-love.SAKURA.ne.jp/tmp/serial-20150225-2.txt.xz + http://I-love.SAKURA.ne.jp/tmp/crash-20150225-2.log.xz )
----------
[ 189.586204] Out of memory: Kill process 3701 (a.out) score 834 or sacrifice child
[ 189.586205] Killed process 3701 (a.out) total-vm:2167392kB, anon-rss:1465820kB, file-rss:4kB
[ 189.586210] Kill process 3702 (a.out) sharing same memory
[ 189.586211] Kill process 3714 (a.out) sharing same memory
[ 189.586212] Kill process 3748 (a.out) sharing same memory
[ 189.586213] Kill process 3755 (a.out) sharing same memory
[ 189.593470] XFS: Assertion failed: XFS_FORCED_SHUTDOWN(mp), file: fs/xfs/xfs_inode.c, line: 1701
[ 189.593491] ------------[ cut here ]------------
[ 189.593492] kernel BUG at fs/xfs/xfs_message.c:106!
[ 189.593493] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 189.593511] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd dm_mirror dm_region_hash dm_log microcode dm_mod ppdev parport_pc pcspkr vmw_balloon serio_raw vmw_vmci parport shpchp i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput ata_generic pata_acpi sd_mod ata_piix mptspi libata scsi_transport_spi e1000 mptscsih mptbase floppy
[ 189.593512] CPU: 1 PID: 3755 Comm: a.out Not tainted 3.19.0 #42
[ 189.593512] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 189.593513] task: ffff88007a848740 ti: ffff88005c064000 task.ti: ffff88005c064000
[ 189.593517] RIP: 0010:[<ffffffff812af992>] [<ffffffff812af992>] assfail+0x22/0x30
[ 189.593517] RSP: 0000:ffff88005c067af8 EFLAGS: 00010292
[ 189.593518] RAX: 0000000000000054 RBX: ffff880079349c00 RCX: 0000000000000050
[ 189.593518] RDX: 0000000000005050 RSI: 0000000000000282 RDI: 0000000000000282
[ 189.593519] RBP: ffff88005c067af8 R08: 0000000000000282 R09: 0000000000000000
[ 189.593519] R10: ffffffff81ec95c8 R11: 656c696166206e6f R12: ffff88005ee92800
[ 189.593519] R13: 00000000fffffff4 R14: ffffffff81838140 R15: ffff880064505390
[ 189.593520] FS: 00007f62d93e0740(0000) GS:ffff88007f840000(0000) knlGS:0000000000000000
[ 189.593521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 189.593521] CR2: 00007fb901282763 CR3: 0000000077b00000 CR4: 00000000000407e0
[ 189.593562] Stack:
[ 189.593564] ffff88005c067b38 ffffffff812ab2d7 ffff880079349e48 ffff88007a6feef0
[ 189.593564] ffff88005c067b38 ffff880079349c00 0000000000000001 ffff880079349db8
[ 189.593565] ffff88005c067b58 ffffffff812acb98 ffff880079349db8 ffff880079349c00
[ 189.593565] Call Trace:
[ 189.593568] [<ffffffff812ab2d7>] xfs_inactive_truncate+0x67/0x150
[ 189.593569] [<ffffffff812acb98>] xfs_inactive+0x1c8/0x1f0
[ 189.593570] [<ffffffff812b3216>] xfs_fs_evict_inode+0x86/0xd0
[ 189.593572] [<ffffffff811da0f8>] evict+0xb8/0x190
[ 189.593574] [<ffffffff811daa15>] iput+0xf5/0x180
[ 189.593575] [<ffffffff811d5b58>] __dentry_kill+0x188/0x1f0
[ 189.593576] [<ffffffff811d5c65>] dput+0xa5/0x170
[ 189.593577] [<ffffffff811c0dbd>] __fput+0x16d/0x1e0
[ 189.593578] [<ffffffff811c0e7e>] ____fput+0xe/0x10
[ 189.593580] [<ffffffff8108ac9f>] task_work_run+0xaf/0xf0
[ 189.593582] [<ffffffff81071638>] do_exit+0x2d8/0xbe0
[ 189.593583] [<ffffffff8107a5df>] ? recalc_sigpending+0x1f/0x60
[ 189.593584] [<ffffffff81071fcf>] do_group_exit+0x3f/0xa0
[ 189.593585] [<ffffffff8107d322>] get_signal+0x1d2/0x6f0
[ 189.593588] [<ffffffff810134e8>] do_signal+0x28/0x720
[ 189.593589] [<ffffffff811c1825>] ? __sb_end_write+0x35/0x70
[ 189.593591] [<ffffffff811bf362>] ? vfs_write+0x172/0x1f0
[ 189.593592] [<ffffffff81013c2c>] do_notify_resume+0x4c/0x90
[ 189.593594] [<ffffffff816b6747>] int_signal+0x12/0x17
[ 189.593602] Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 f1 41 89 d0 48 c7 c6 48 8b 97 81 48 89 fa 31 c0 48 89 e5 31 ff e8 de fb ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
[ 189.593603] RIP [<ffffffff812af992>] assfail+0x22/0x30
[ 189.593604] RSP <ffff88005c067af8>
----------