On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>Hi Oliver,
> >>>
> >>>On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>Hello,
> >>>>
> >>>>1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>attached.
> >>>>
> >>>>I saw bdi_...() calls in the s2disk paths, but didn't check in detail.
> >>>>It's probably more efficient if one of you looks at it directly.
> >>>
> >>>Thanks, this looks interesting.  balance_dirty_pages() wakes up the
> >>>bdi_wq workqueue as it should:
> >>>
> >>>[ 249.148009] s2disk-3327   2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327   2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
> >>>[ 249.148009] s2disk-3327   2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327   2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327   2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
> >>>
> >>>but the worker wakeup doesn't actually do anything:
> >>>
> >>>[ 249.148009] kworker/-3466  2d... 48550431us : finish_task_switch <-__schedule
> >>>[ 249.148009] kworker/-3466  2.... 48550431us : _raw_spin_lock_irq <-worker_thread
> >>>[ 249.148009] kworker/-3466  2d... 48550431us : need_to_create_worker <-worker_thread
> >>>[ 249.148009] kworker/-3466  2d... 48550432us : worker_enter_idle <-worker_thread
> >>>[ 249.148009] kworker/-3466  2d... 48550432us : too_many_workers <-worker_enter_idle
> >>>[ 249.148009] kworker/-3466  2.... 48550432us : schedule <-worker_thread
> >>>[ 249.148009] kworker/-3466  2.... 48550432us : __schedule <-worker_thread
> >>>
> >>>My suspicion is that this fails because the bdi_wq is frozen at this
> >>>point, so the flush work never runs until resume.  Before my patch,
> >>>the effective dirty limit was high enough that the image could be
> >>>written in one go without being throttled, followed by an fsync()
> >>>that then wrote the pages in the context of the unfrozen s2disk.
> >>>
> >>>Does this make sense?  Rafael?  Tejun?
> >>
> >>Well, it does seem to make sense to me.
> >
> >From what I see, this is a deadlock in the userspace suspend model and
> >just happened to work by chance in the past.

> Well, it had been working for quite a while, so it was a rather large
> opportunity window, it seems. :-)

No doubt about that, and I feel bad that it broke.  But it's still a
deadlock that dirty throttling can't reasonably accommodate: userspace
can't put the flushers to sleep and then issue a large amount of
buffered IO, hoping it doesn't hit the dirty limits.  Don't shoot the
messenger; this bug needs to be addressed, not papered over.

> >Can we patch suspend-utils as follows?
>
> Perhaps we can.  Let's ask the new maintainer.
>
> Rodolfo, do you think you can apply the patch below to suspend-utils?
>
> >Alternatively, suspend-utils could clear the dirty limits before it
> >starts writing and restore them post-resume.
>
> That (and the patch too) doesn't seem to address the problem with
> existing suspend-utils binaries, however.

It's userspace that freezes the system before issuing buffered IO, so
my conclusion was that the bug is in there.  This is arguable.  I also
wouldn't be opposed to a patch that sets the dirty limits to infinity
from the ioctl that freezes the system or creates the image.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@xxxxxxxxx.  For more info on Linux MM, see:
http://www.linux-mm.org/ .