Re: is hibernation usable?

On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
>
> On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
> > >
> > > I think this is the right group for the memory issues.
> > >
> > > I suspect that the problem with failed allocations (ENOMEM) boils down
> > > to the unreliability of the page allocator.  In my experience, under
> > > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > > can fail even when in theory they should succeed.  (I wish I were
> > > wrong and that someone would convincingly correct me.)
> >
> > What is vm.swappiness set to on your system? A fellow Fedora
> > contributor who has consistently reproduced what you describe
> > discovered he has vm.swappiness=0; if it's set to even 1, the
> > problem no longer happens. And this is not a documented
> > consequence of using a value of 0.
>
> I am using the default value of 60.
>
> A zero value should cause all file pages to be discarded before any
> anonymous pages are swapped.  I wonder if the fellow Fedora
> contributor's workload has a lot of file pages, so that discarding
> them is enough for the image allocator to succeed. In that case "sync;
> echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
> the same result.  (By the way, in my experiments I do that just before
> hibernating.)
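
For reference, a minimal sketch of the knobs being discussed above;
the values are illustrative, and vm.swappiness=1 is just what cleared
the other reporter's case, not a general recommendation:

    # check the current value; the default is 60
    cat /proc/sys/vm/swappiness

    # change it at runtime (a sysctl.d drop-in would make it persistent)
    sysctl vm.swappiness=1

    # flush dirty pages and drop clean page cache just before
    # attempting hibernation; drop_caches only discards clean file
    # pages, it does not swap anything out
    sync
    echo 1 > /proc/sys/vm/drop_caches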

Unfortunately I can't reproduce the graceful failure you describe
myself. I either get a successful hibernation/resume or some kind of
non-deterministic and fatal failure to enter hibernation - and any
dmesg/journal that might contain evidence of the failure is lost. I've
had better success with qemu-kvm testing, but even there I see a
failure to complete hibernation entry about 1/4 of the time (with a
ridiculously small sample size). I can't tell whether the failure
happens during page out, hibernation image creation, or hibernation
image write out - but the result is a black screen (virt-manager
console) and the VM never shuts down or reboots; it just hangs,
spinning at ~400% CPU (even though it's only assigned 3 CPUs).
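
For reference, a minimal sketch of the standard hibernation entry
points and the knob controlling the target image size (assuming swap
is already configured; nothing here is specific to qemu-kvm):

    # via systemd/logind
    systemctl hibernate

    # direct kernel interface; exercises page out, image creation,
    # and image write-out to swap
    echo disk > /sys/power/state

    # target upper bound (bytes) for the hibernation image; writing 0
    # asks the kernel to make the image as small as possible
    cat /sys/power/image_size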

It's sufficiently unreliable that I can't really consider it supported
or supportable.

Microsoft and Apple have put more emphasis lately on S0 low power
idle, faster booting, and application state saving. In Windows 10,
hiberfil.sys holds a limited environment, essentially that of the
login window (no user environment state is saved in it), and is used
both for resuming from S4 and for fast boot. A separate file,
pagefile.sys, is used for paging, so a use case that depends on
significant page out can never prevent hibernation from succeeding.
It's also Secure Boot compatible, whereas hibernation on Linux x86_64
is not.

Between kernel, ACPI, and firmware bugs, it's going to take a lot
more effort to make hibernation reliable and trustworthy for the
general case. Or it should just be abandoned; it seems to be mostly
that way already.

-- 
Chris Murphy



