Hello, On Mon 05-05-14 11:35:41, Johannes Weiner wrote: > On Tue, Apr 29, 2014 at 03:24:37PM -0700, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > > > On Tue, 29 Apr 2014 20:13:44 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote: > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=75101 > > > > > > Bug ID: 75101 > > > Summary: [bisected] s2disk / hibernate blocks on "Saving 506031 > > > image data pages () ..." > > > Product: Memory Management > > > Version: 2.5 > > > Kernel Version: v3.14 > > > Hardware: All > > > OS: Linux > > > Tree: Mainline > > > Status: NEW > > > Severity: normal > > > Priority: P1 > > > Component: Other > > > Assignee: akpm@xxxxxxxxxxxxxxxxxxxx > > > Reporter: oliverml1@xxxxxxxxxxx > > > Regression: No > > > > > > Created attachment 134271 > > > --> https://bugzilla.kernel.org/attachment.cgi?id=134271&action=edit > > > Full console trace with various SysRq outputs > > > > > > Since v3.14 under normal desktop usage my s2disk/hibernate often blocks on the > > > saving of the image data ("Saving 506031 image data pages () ..."). > > > > A means to reproduce as well as a bisection result. Nice! Thanks. > > > > Johannes, could you please take a look? > > > > > With following test I can reproduce the problem reliably: > > > --- > > > 0) Boot > > > > > > 1) Fill ram with 2GiB (+50% in my case) > > > > > > mount -t tmpfs tmpfs /media/test/ > > > dd if=/dev/zero of=/media/test/test0.bin bs=1k count=$[1024*1024] > > > dd if=/dev/zero of=/media/test/test1.bin bs=1k count=$[1024*1024] > > > > > > 2) Do s2disk > > > > > > s2disk > > > > > > --- > > > s2disk: Unable to switch virtual terminals, using the current console. > > > s2disk: Snapshotting system > > > s2disk: System snapshot ready. Preparing to write > > > s2disk: Image size: 2024124 kilobytes > > > s2disk: Free swap: 3791208 kilobytes > > > s2disk: Saving 506031 image data pages (press backspace to abort) ... 0% > > > > > > #Problem>: ... there is stays and blocks. SysRq still responds, so that I could > > > trigger various debug outputs. > > According to your dmesg s2disk is stuck in balance_dirty_pages(): > > [ 215.645240] s2disk D ffff88011fd93100 0 3323 3261 0x00000000 > [ 215.645240] ffff8801196d4110 0000000000000082 0000000000013100 ffff8801196d4110 > [ 215.645240] ffff8800365cdfd8 ffff880119ed9190 00000000ffffc16c ffff8800365cdbe8 > [ 215.645240] 0000000000000032 0000000000000032 ffff8801196d4110 0000000000000000 > [ 215.645240] Call Trace: > [ 215.645240] [<ffffffff8162fdce>] ? schedule_timeout+0xde/0xff > [ 215.645240] [<ffffffff81041be1>] ? ftrace_raw_output_tick_stop+0x55/0x55 > [ 215.645240] [<ffffffff81630987>] ? io_schedule_timeout+0x5d/0x7e > [ 215.645240] [<ffffffff810cb035>] ? balance_dirty_pages_ratelimited+0x588/0x747 > [ 215.645240] [<ffffffff812d0795>] ? radix_tree_tag_set+0x69/0xc4 > [ 215.645240] [<ffffffff810c244e>] ? generic_file_buffered_write+0x1a8/0x21c > [ 215.645240] [<ffffffff810c351e>] ? __generic_file_aio_write+0x1c7/0x1fe > [ 215.645240] [<ffffffff81134ab5>] ? blkdev_aio_write+0x44/0x79 > [ 215.645240] [<ffffffff8110c02a>] ? do_sync_write+0x56/0x76 > [ 215.645240] [<ffffffff8110c33c>] ? vfs_write+0xa1/0xfb > [ 215.645240] [<ffffffff8110ca08>] ? SyS_write+0x41/0x74 > [ 215.645240] [<ffffffff81637622>] ? system_call_fastpath+0x16/0x1b > > but I don't see a flusher thread anywhere. > > What the bisected change does is allow the effective dirty threshold > to drop fairly low, because anonymous pages are no longer considered > dirtyable, and your usecase has particularly low free + cache pages: > > [ 196.375988] active_anon:328150 inactive_anon:118571 isolated_anon:0 > [ 196.375988] active_file:1658 inactive_file:1823 isolated_file:0 > [ 196.375988] unevictable:867 dirty:616 writeback:0 unstable:0 > [ 196.375988] free:32320 slab_reclaimable:5129 slab_unreclaimable:5080 > [ 196.375988] mapped:2684 shmem:424844 pagetables:1528 bounce:0 > [ 196.375988] free_cma:0 > > Ignoring free pages due to dirty_balance_reserve, inactive+active file > yields 3481 dirtyable pages, which sets the global limits to 174 pages > background and 348 pages foreground with the default configuration. > It's low, but not 0. OK, so we are over the dirty_limit. > So why is the dirtier throttled to starvation when the background flusher > is not even running? Shouldn't they be looking at the same numbers and > behave inversely? These days there isn't a background flusher thread but a workqueue which handles the flushing work. But still you should see that in a process list like "flush-$dev". Can you check whether balance_dirty_pages() properly calls bdi_start_background_writeback() and whether wb_do_writeback() gets to run (there are tracepoints in there)? Also can you have a look in /sys/kernel/debug/bdi/<dev>/stats? What is the estimated bandwith? Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>