Re: [Bug 75101] New: [bisected] s2disk / hibernate blocks on "Saving 506031 image data pages () ..."

Matheus Fillipe <matheusfillipeag@xxxxxxxxx> · Wed, 3 Apr 2019 13:59:45 -0300

Yes, I can more or less confirm that the bug is in uswsusp. I removed the
package and pm-utils and used both "systemctl hibernate" and "echo disk >>
/sys/power/state" to hibernate. It seems to succeed and shuts down; I
am just not able to resume from it, which seems to be the classic
problem usually solved by setting the resume swap file/partition in the
GRUB configuration (which I tried, and it didn't work even with nvidia
disabled).

Anyway, uswsusp is still necessary because the default kernel
hibernation doesn't work with the proprietary nvidia drivers, as far
as I know and have tested.

Is there any way I could get a workaround for this bug on my current
OS, by the way?

On Wed, Apr 3, 2019 at 7:04 AM Rainer Fiebig <jrf@xxxxxxxxxxx> wrote:
>
> Am 03.04.19 um 11:34 schrieb Jan Kara:
> > On Tue 02-04-19 16:25:00, Andrew Morton wrote:
> >>
> >> I cc'ed a bunch of people from bugzilla.
> >>
> >> Folks, please please please remember to reply via emailed
> >> reply-to-all.  Don't use the bugzilla interface!
> >>
> >> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx> wrote:
> >>
> >>> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
> >>>> On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> >>>>> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >>>>>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>>>>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>>>>>> Hi Oliver,
> >>>>>>>>
> >>>>>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>>>>>> attached.
> >>>>>>>>>
> >>>>>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail
> >>>>>>>>> Probably more efficient when one of you guys looks directly.
> >>>>>>>> Thanks, this looks interesting.  balance_dirty_pages() wakes up the
> >>>>>>>> bdi_wq workqueue as it should:
> >>>>>>>>
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
> >>>>>>>>
> >>>>>>>> but the worker wakeup doesn't actually do anything:
> >>>>>>>>
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550431us : finish_task_switch <-__schedule
> >>>>>>>> [  249.148009] kworker/-3466    2.... 48550431us : _raw_spin_lock_irq <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550431us : need_to_create_worker <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550432us : worker_enter_idle <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2d... 48550432us : too_many_workers <-worker_enter_idle
> >>>>>>>> [  249.148009] kworker/-3466    2.... 48550432us : schedule <-worker_thread
> >>>>>>>> [  249.148009] kworker/-3466    2.... 48550432us : __schedule <-worker_thread
> >>>>>>>>
> >>>>>>>> My suspicion is that this fails because the bdi_wq is frozen at this
> >>>>>>>> point and so the flush work never runs until resume, whereas before my
> >>>>>>>> patch the effective dirty limit was high enough so that image could be
> >>>>>>>> written in one go without being throttled; followed by an fsync() that
> >>>>>>>> then writes the pages in the context of the unfrozen s2disk.
> >>>>>>>>
> >>>>>>>> Does this make sense?  Rafael?  Tejun?
> >>>>>>> Well, it does seem to make sense to me.
> >>>>>>  From what I see, this is a deadlock in the userspace suspend model and
> >>>>>> just happened to work by chance in the past.
> >>>>> Well, it had been working for quite a while, so it was a rather large
> >>>>> opportunity window it seems. :-)
> >>>> No doubt about that, and I feel bad that it broke.  But it's still a
> >>>> deadlock that can't reasonably be accommodated from dirty throttling.
> >>>>
> >>>> It can't just put the flushers to sleep and then issue a large amount
> >>>> of buffered IO, hoping it doesn't hit the dirty limits.  Don't shoot
> >>>> the messenger, this bug needs to be addressed, not get papered over.
> >>>>
> >>>>>> Can we patch suspend-utils as follows?
> >>>>> Perhaps we can.  Let's ask the new maintainer.
> >>>>>
> >>>>> Rodolfo, do you think you can apply the patch below to suspend-utils?
> >>>>>
> >>>>>> Alternatively, suspend-utils
> >>>>>> could clear the dirty limits before it starts writing and restore them
> >>>>>> post-resume.
> >>>>> That (and the patch too) doesn't seem to address the problem with existing
> >>>>> suspend-utils binaries, however.
> >>>> It's userspace that freezes the system before issuing buffered IO, so
> >>>> my conclusion was that the bug is in there.  This is arguable.  I also
> >>>> wouldn't be opposed to a patch that sets the dirty limits to infinity
> >>>> from the ioctl that freezes the system or creates the image.
> >>>
> >>> OK, that sounds like a workable plan.
> >>>
> >>> How do I set those limits to infinity?
> >>
> >> Five years have passed and people are still hitting this.
> >>
> >> Killian described the workaround in comment 14 at
> >> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
> >>
> >> People can use this workaround by hand or in scripts.  But we
> >> really should find a proper solution.  Maybe special-case the freezing
> >> of the flusher threads until all the writeout has completed.  Or
> >> something else.
> >
> > I've refreshed my memory wrt this bug and I believe the bug is really on
> > the side of suspend-utils (uswsusp or however it is called). They are low
> > level system tools, they ask the kernel to freeze all processes
> > (SNAPSHOT_FREEZE ioctl), and then they rely on buffered writeback (which is
> > relatively heavyweight infrastructure) to work. That is wrong in my
> > opinion.
> >
> > I can see Johannes was suggesting in comment 11 to use O_SYNC in
> > suspend-utils which worked but was too slow. Indeed O_SYNC is rather big
> > hammer but using O_DIRECT should be what they need and get better
> > performance - no additional buffering in the kernel, no dirty throttling,
> > etc. They only need their buffer & device offsets sector aligned - they
> > seem to be even page aligned in suspend-utils so they should be fine. And
> > if the performance still sucks (currently they appear to do mostly random
> > 4k writes so it probably would for rotating disks), they could use AIO DIO
> > to get multiple pages in flight (as many as they dare to allocate buffers)
> > and then the IO scheduler will reorder things as well as it can and they
> > should get reasonable performance.
> >
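To illustrate the O_DIRECT suggestion above, here is a rough sketch in
Python (suspend-utils itself is C, so this only shows the constraints, not
a patch). It assumes the page size is a multiple of the device's logical
sector size, and the names aligned_buffer and write_direct are made up for
illustration. An anonymous mmap is used because it is page aligned, which
covers the buffer-alignment requirement; the offset is checked explicitly.

```python
import mmap
import os

# Assumption: one page is a multiple of the device's logical sector size.
BLOCK = mmap.PAGESIZE


def aligned_buffer(payload: bytes) -> mmap.mmap:
    """Copy payload into anonymous page-aligned memory, zero-padded to whole pages."""
    pages = max(1, -(-len(payload) // BLOCK))  # ceiling division, at least one page
    buf = mmap.mmap(-1, pages * BLOCK)
    buf[:len(payload)] = payload
    return buf


def write_direct(path: str, offset: int, payload: bytes) -> int:
    """Write payload at offset, bypassing the page cache (no dirty throttling).

    Returns the number of bytes written, which includes the zero padding
    and may be short; a real tool would loop until everything is written.
    """
    if offset % BLOCK:
        raise ValueError("offset must be block aligned for O_DIRECT")
    buf = aligned_buffer(payload)
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

Not every file system accepts O_DIRECT (the open can fail with EINVAL), so
this is best tried against a scratch block device or a file on a disk-backed
file system. Getting several pages in flight, as suggested above, would
need AIO or multiple writers on top of this.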
> > Is there someone who works on suspend-utils these days? Because the repo
> > I've found on kernel.org seems to be long dead (last commit in 2012).
> >
> >                                                               Honza
> >
>
> Whether it's suspend-utils (or uswsusp) or not could be answered quickly
> by uninstalling this package and using the kernel methods instead.
>
>