On 03.04.19 at 11:34, Jan Kara wrote:
> On Tue 02-04-19 16:25:00, Andrew Morton wrote:
>>
>> I cc'ed a bunch of people from bugzilla.
>>
>> Folks, please please please remember to reply via emailed
>> reply-to-all. Don't use the bugzilla interface!
>>
>> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx> wrote:
>>
>>> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
>>>> On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
>>>>> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
>>>>>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
>>>>>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
>>>>>>>> Hi Oliver,
>>>>>>>>
>>>>>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
>>>>>>>>> attached.
>>>>>>>>>
>>>>>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail.
>>>>>>>>> Probably more efficient when one of you guys looks directly.
>>>>>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the
>>>>>>>> bdi_wq workqueue as it should:
>>>>>>>>
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
>>>>>>>> but the worker wakeup doesn't actually do anything:
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule
>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle
>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread
>>>>>>>>
>>>>>>>> My suspicion is that this fails because the bdi_wq is frozen at this
>>>>>>>> point and so the flush work never runs until resume, whereas before my
>>>>>>>> patch the effective dirty limit was high enough so that image could be
>>>>>>>> written in one go without being throttled; followed by an fsync() that
>>>>>>>> then writes the pages in the context of the unfrozen s2disk.
>>>>>>>>
>>>>>>>> Does this make sense? Rafael? Tejun?
>>>>>>> Well, it does seem to make sense to me.
>>>>>> From what I see, this is a deadlock in the userspace suspend model and
>>>>>> just happened to work by chance in the past.
>>>>> Well, it had been working for quite a while, so it was a rather large
>>>>> opportunity window it seems. :-)
>>>> No doubt about that, and I feel bad that it broke. But it's still a
>>>> deadlock that can't reasonably be accommodated from dirty throttling.
>>>>
>>>> It can't just put the flushers to sleep and then issue a large amount
>>>> of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
>>>> the messenger, this bug needs to be addressed, not get papered over.
>>>>
>>>>>> Can we patch suspend-utils as follows?
>>>>> Perhaps we can. Let's ask the new maintainer.
>>>>>
>>>>> Rodolfo, do you think you can apply the patch below to suspend-utils?
>>>>>
>>>>>> Alternatively, suspend-utils
>>>>>> could clear the dirty limits before it starts writing and restore them
>>>>>> post-resume.
>>>>> That (and the patch too) doesn't seem to address the problem with existing
>>>>> suspend-utils binaries, however.
>>>> It's userspace that freezes the system before issuing buffered IO, so
>>>> my conclusion was that the bug is in there. This is arguable. I also
>>>> wouldn't be opposed to a patch that sets the dirty limits to infinity
>>>> from the ioctl that freezes the system or creates the image.
>>>
>>> OK, that sounds like a workable plan.
>>>
>>> How do I set those limits to infinity?
>>
>> Five years have passed and people are still hitting this.
>>
>> Killian described the workaround in comment 14 at
>> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
>>
>> People can use this workaround by hand or in scripts. But we
>> really should find a proper solution. Maybe special-case the freezing
>> of the flusher threads until all the writeout has completed. Or
>> something else.
>
> I've refreshed my memory wrt this bug and I believe the bug is really on
> the side of suspend-utils (uswsusp or however it is called). They are low
> level system tools, they ask the kernel to freeze all processes
> (SNAPSHOT_FREEZE ioctl), and then they rely on buffered writeback (which is
> relatively heavyweight infrastructure) to work. That is wrong in my
> opinion.
>
> I can see Johannes was suggesting in comment 11 to use O_SYNC in
> suspend-utils, which worked but was too slow. Indeed O_SYNC is a rather big
> hammer, but using O_DIRECT should be what they need and get better
> performance - no additional buffering in the kernel, no dirty throttling,
> etc. They only need their buffer & device offsets sector aligned - they
> seem to be even page aligned in suspend-utils so they should be fine. And
> if the performance still sucks (currently they appear to do mostly random
> 4k writes so it probably would for rotating disks), they could use AIO DIO
> to get multiple pages in flight (as many as they dare to allocate buffers)
> and then the IO scheduler will reorder things as well as it can and they
> should get reasonable performance.
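
To make the O_DIRECT suggestion above a bit more concrete, here is a minimal sketch in C. It is not taken from suspend-utils: the function names, the IMG_BLOCK constant and the 4096-byte block size are assumptions chosen for illustration (the requirement stated above is only that buffer and device offsets are sector aligned). It shows an aligned buffer being written at an aligned offset with pwrite(); on a descriptor opened with O_DIRECT such a write bypasses the page cache, so it should not depend on the frozen flusher threads or on dirty throttling.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption for this sketch: one 4 KiB page per write. */
#define IMG_BLOCK 4096

/* True if an address or file offset is a multiple of IMG_BLOCK. */
int is_block_aligned(uint64_t value)
{
    return value % IMG_BLOCK == 0;
}

/* Allocate one zeroed buffer whose address satisfies O_DIRECT alignment. */
void *alloc_block(void)
{
    void *buf = NULL;

    if (posix_memalign(&buf, IMG_BLOCK, IMG_BLOCK) != 0)
        return NULL;
    memset(buf, 0, IMG_BLOCK);
    return buf;
}

/* Hypothetical helper: open the image target for unbuffered writes. */
int open_image_direct(const char *path)
{
    return open(path, O_WRONLY | O_DIRECT);
}

/*
 * Write exactly one block at the given offset.
 * Returns 0 on success, -1 with errno set on failure.
 * Misaligned buffers or offsets are rejected up front with EINVAL.
 */
int write_block_direct(int fd, const void *buf, off_t offset)
{
    ssize_t n;

    assert(buf != NULL);
    if (offset < 0 || !is_block_aligned((uintptr_t)buf) ||
        !is_block_aligned((uint64_t)offset)) {
        errno = EINVAL;
        return -1;
    }

    /* Retry only when interrupted by a signal. */
    do {
        n = pwrite(fd, buf, IMG_BLOCK, offset);
    } while (n < 0 && errno == EINTR);

    if (n == (ssize_t)IMG_BLOCK)
        return 0;
    if (n >= 0)
        errno = EIO;         /* short write: treat as an error here */
    return -1;
}
```

A caller would open the target with open_image_direct(), allocate one buffer per page it wants in flight with alloc_block(), and call write_block_direct() for each image page. The AIO DIO variant mentioned above would submit several such aligned writes at once instead of waiting for each one to complete.
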
>
> Is there someone who works on suspend-utils these days? Because the repo
> I've found on kernel.org seems to be long dead (last commit in 2012).
>
> Honza
>
Whether suspend-utils (or uswsusp) is really at fault could be checked quickly by uninstalling this package and using the kernel's own hibernation methods instead.