Fedora 33 System-Wide Change proposal: swap on zram

https://fedoraproject.org/wiki/Changes/SwapOnZRAM

== Summary ==

Swap is useful, except when it's slow. zram is a RAM drive that uses
compression. Create swap-on-zram during start-up, and no longer create
swap partitions by default.


== Owner ==
* Name: [[User:chrismurphy| Chris Murphy]]
* Email: chrismurphy@xxxxxxxxxxxxxxxxx

== Detailed Description ==

==== zram Basic function ====

The zram† device, typically <span style=color:brown>/dev/zram0</span>,
has its size set at creation time during early boot by zram-generator†,
according to its configuration file. The memory used is not
preallocated; it is allocated and freed dynamically, on demand.
Assuming a 2:1 compression ratio, a full <span
style=color:brown>/dev/zram0</span> uses about half as much memory as
its size.

The <span style=color:brown>/dev/zram0</span> device behaves like any
other block device. It can be formatted with a file system, or with
mkswap, which is the intention of this change proposal.

The system will use RAM normally until it's full, and then start
paging out to swap-on-zram, the same as conventional swap-on-drive.
Because of compression, the zram driver allocates memory at roughly
half the rate of page-outs (assuming a 2:1 ratio). But there is no
free lunch: swap-on-zram is not as effective at page eviction as
swap-on-drive, since the net eviction rate is ~50% instead of 100%.
It is, however, at least an order of magnitude faster than drive-based
swap.
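
A minimal Python sketch of that eviction arithmetic, assuming the RAM cost of a swapped-out page is simply its size divided by the compression ratio. The helper name `net_eviction_fraction` is hypothetical, and the model ignores zram's metadata overhead and how the kernel actually stores incompressible pages.

```python
def net_eviction_fraction(compression_ratio: float) -> float:
    """Share of paged-out memory that is actually freed when it goes to zram.

    Paging out N bytes frees N bytes of anonymous memory, but the zram
    device then allocates about N / compression_ratio bytes to hold them.
    zram's own metadata overhead is ignored here.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


# Swap-on-drive frees 100% of what it pages out; zram frees less.
for ratio in (1.0, 2.0, 3.0, 4.0):
    print(f"{ratio:g}:1 -> {net_eviction_fraction(ratio):.0%} net eviction")
```

At 2:1 this gives the ~50% figure above; the 1:1 row shows why poorly compressible workloads gain little.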

zram has about 0.1% overhead, or ~1 MiB per GiB of device size. If the
workload never touches swap, this overhead is the sole cost. In
practice, on an unused device, the feature owner has observed ~0.04%
overhead.
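
A quick sketch of the idle overhead estimate, assuming the cost scales linearly with the device size as the kernel zram documentation describes. `idle_overhead_mib` is a hypothetical helper; `overhead_fraction` can be set to 0.0004 to reproduce the ~0.04% figure observed above.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def idle_overhead_mib(disksize_bytes: int, overhead_fraction: float = 0.001) -> float:
    """Approximate RAM held by an unused zram device: ~0.1% of its disksize."""
    return disksize_bytes * overhead_fraction / MIB


# The proposed 4 GiB default device costs roughly 4 MiB when idle.
print(f"{idle_overhead_mib(4 * GIB):.1f} MiB")
```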

Example: A system has 16 GiB RAM. With the proposed defaults (50% of
RAM, capped at 4 GiB), the <span style=color:brown>/dev/zram0</span>
device will be 4 GiB. If the workload completely fills up swap with
4 GiB of anonymous pages, what happens? The <span
style=color:red>zramctl</span> command will display the true
compression ratio. If a 2:1 ratio is actually achieved, the 4 GiB of
swap data is compressed to 2 GiB. Therefore 2 GiB is the actual RAM
usage, and is also the net effective eviction: 4 GiB of anonymous
pages are evicted, but are then compressed and pinned into 2 GiB of
RAM, for a net memory savings of 2 GiB.
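
The worked example above, restated as a small Python calculation. `zram_ram_usage` is a hypothetical helper, and the 2:1 ratio is the assumption from the text, not a measured value.

```python
GIB = 1024 ** 3


def zram_ram_usage(swapped_bytes: int, compression_ratio: float) -> int:
    """RAM consumed by `swapped_bytes` of anonymous pages stored in zram."""
    return int(swapped_bytes / compression_ratio)


swapped = 4 * GIB                     # /dev/zram0 completely full
used = zram_ram_usage(swapped, 2.0)   # assumed 2:1 compression
saved = swapped - used                # net memory freed by eviction
print(used // GIB, "GiB used,", saved // GIB, "GiB saved")
```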

†</br >
[https://www.kernel.org/doc/Documentation/blockdev/zram.txt kernel.org zram.txt]

[https://github.com/systemd/zram-generator GitHub zram-generator project]


==== Overview of the Feature ====

Using swap is a good idea†, but no one likes it when it's slow.
Anaconda and Fedora IoT have been using swap-on-zram by default for
years. This builds on their prior effort.


There are three components to the change:

# Install the systemd rust-zram-generator† package. This does not enable
swap-on-zram; it only makes the generator available.</br >
# Install a default zram-generator configuration. When present,
swap-on-zram is set up during startup.</br >
# Do not create a swap partition/LV with default installations.

This proposal aims to apply all three, for all Fedora editions and
spins, by default.

It further aims to apply the first two to upgrades and custom installations.

It might be useful to only make the generator available (1), should an
edition/spin wish to opt out, or as a fallback if applying the feature
to upgrades fails to withstand scrutiny.

†</br >
The article has a tl;dr section at the top, but reading the whole
article is highly recommended. [https://chrisdown.name/2018/01/02/in-defence-of-swap.html
In defence of swap: common misconceptions]


==== Default zram device configuration: ====

During startup, create a zram device <span
style=color:brown>/dev/zram0</span>, with a size equal to 50% of RAM,
but capped† at 4 GiB, and with a higher-than-typical swap priority†.

These values seem reasonably conservative, and are based on prior work
in Fedora. Anaconda sizes swap-on-drive to 50% of RAM in the
no-hibernation case, which is common outside x86. Fedora IoT's
implementation also sets the swap-on-zram size to 50% of RAM.
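
A sketch of the proposed default sizing rule (50% of RAM, capped at 4 GiB). `default_zram_size` is a hypothetical helper that only restates the proposal; it is not zram-generator code, and the cap itself depends on RFE #10 above.

```python
GIB = 1024 ** 3


def default_zram_size(ram_bytes: int, fraction: float = 0.5,
                      cap_bytes: int = 4 * GIB) -> int:
    """Proposed default zram0 size: a fraction of RAM, but never above the cap."""
    return min(int(ram_bytes * fraction), cap_bytes)


for ram_gib in (2, 4, 8, 16, 64):
    size_gib = default_zram_size(ram_gib * GIB) / GIB
    print(f"{ram_gib} GiB RAM -> {size_gib:g} GiB zram0")
```

For 2, 4, 8, 16 and 64 GiB of RAM this yields 1, 2, 4, 4 and 4 GiB, so the cap only takes effect on systems with more than 8 GiB of RAM.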

†</br >
[https://github.com/systemd/zram-generator/issues/10 RFE: should be
able to set a cap on zram device size #10]

[https://github.com/systemd/zram-generator/issues/8 RFE: should set priority #8]


==== Default installer behavior ====

The installer is currently responsible for creating a swap-on-drive
device. This will be dropped. The zram-generator plus its configuration
file will trigger the setup and activation of swap-on-zram. This means
hibernation isn't possible with a default installation, even on systems
that could support it.

Please see [https://pagure.io/fedora-workstation/blob/master/f/hibernationstatus.md
Supporting hibernation in Workstation edition] for much more detailed
information, including why it's increasingly likely hibernation isn't
possible anyway, and a path to improving hibernation support.


==== Custom/Advanced partitioning installer behavior ====

The user can add swap using Custom partitioning at install time. This
is swap-on-drive, and the installer will also include the <span
style=color:red> resume=UUID </span> kernel parameter for this swap
device. No change in behavior here.

Since swap-on-zram is still enabled by default, there will be two
swap devices: swap-on-zram and swap-on-drive. The swap-on-zram device
will have higher priority, and thus be favored over drive-based swap.
The kernel is smart enough to know it can't hibernate to a zram
device, and will instead use drive-based swap.


==== How can it be disabled? ====

Immediately:</br >
<span style=color:red>swapoff /dev/zram0</span>

Permanently:</br >
<span style=color:red>rm /etc/systemd/zram-generator.conf</span>


== Feedback ==

==== You're enabling it on upgrades? ====

That's the current plan. As a technical matter, the feature owner is
confident this feature will improve the experience of all users
regardless of configuration. As a non-technical matter, it's
recognized that (a) ''hey pal, you're messing with my customizations,
not cool!'' and (b) ''swap always stinks, I don't care if it has a 'Z'
in the name!'' may need more convincing.

There are possible risks.

* Workloads that expect full use of memory, and depend on 100% page
eviction. These may run slower if they really need full use of memory,
but some memory is used for the zram device instead. Such workloads
might favor zswap.

* Workloads with poorly compressible pages. In the worst case, this
means unnecessary work merely moving pages around.

* Workloads with memory full, and hibernation. Hibernation is already
stressful to the memory-management subsystem and prone to bailing out in
such cases. The swap-on-zram will be favored for evictions in the
attempt to free memory to create the hibernation image. It could
increase instances of hibernation entry failure. This isn't a crash,
it just means the attempt doesn't succeed, and the system resumes
operation instead of hibernating.

While these risks are possible, their probability is difficult to
estimate. But they are a significant consideration in choosing the
conservative default zram size. Users can easily increase the zram
size as needed for their use case, simply by editing <span
style=color:red>/etc/systemd/zram-generator.conf</span>; the change
takes effect at the next boot.

==== Why systemd zram-generator? ====

It's the most upstream implementation to date, and it is fast and
lightweight. The zram-generator uses existing systemd infrastructure
to set up the zram block device, format it as swap, and activate it
with swapon, all during early boot. It's very similar in behavior to
fstab-generator, gpt-auto-generator, and cryptsetup-generator†.

Converging on one implementation avoids user confusion. And while the
alternatives are nice and work fine, a systemd generator is
particularly well suited for this use case compared to a systemd
service unit.†

Also, it's a reference implementation of a systemd generator written in Rust.

†</br >
[https://www.freedesktop.org/software/systemd/man/systemd.generator.html
freedesktop.org About systemd generators.]</br >
[https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/TCY534JPIMZ3OXM5Q5E2ZH5PSAKQNGP7/
devel@ ''Re: swap-on-zram by default'' Zbigniew Jędrzejewski-Szmek,
systemd zram-generator author/maintainer]


==== Why not a bigger zram device? ====

The main idea of being conservative is to address concerns about
upgrades. It's possible some workloads will have less compressible
data. Hence, <span style=color:brown>/dev/zram0</span> is not sized to
100% of RAM at this time. Even a <span
style=color:brown>/dev/zram0</span> of 200% RAM is not unreasonable
''if'' the compression ratio is at least 2:1. However, it's possible a
system can get "stuck" in a kind of swap thrashing similar to
conventional swap-on-drive, except it's CPU and memory bound rather
than IO bound. The feature owner thinks it's better to just OOM than
to get overly aggressive with the zram device size.
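
To make the sizing trade-off concrete, this sketch computes how much physical RAM a completely full device would occupy, given its size as a fraction of RAM and an assumed compression ratio. The helper is hypothetical and ignores zram's own overhead.

```python
def full_device_ram_fraction(size_fraction_of_ram: float,
                             compression_ratio: float) -> float:
    """Share of physical RAM a completely full zram device occupies."""
    return size_fraction_of_ram / compression_ratio


for size in (0.5, 1.0, 2.0):
    for ratio in (2.0, 1.5):
        share = full_device_ram_fraction(size, ratio)
        print(f"zram0 = {size:.0%} of RAM at {ratio:g}:1 -> {share:.0%} of RAM when full")
```

At 2:1, a device sized to 200% of RAM would consume all of RAM when full; at 1.5:1 it would need more RAM than exists, which is why 200% is only reasonable if at least 2:1 is achieved.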

Conversely, it's possible to be too conservative with the size,
resulting in more OOM kills. If applying the feature to upgrades is
rejected, it's probably reasonable to increase the cap to ~8 GiB. Of
course, more feedback and testing is needed, and will be taken into
consideration.

Note that the kernel zram doc says an excessively sized zram device
does come with overhead. Users can easily increase the size
post-install, a capability they don't easily have with swap-on-drive.
The goal for Fedora 33 is a default that's useful and safe for the
vast majority of use cases.


==== Why not zswap? ====

Zswap† is a similar idea, but with a totally different implementation.
It is swap-specific, uses a compressed RAM cache, and requires an
existing conventional swap device. It might be true that certain
workloads are better suited to zswap. But swap-on-zram depends only on
volatile storage, which is simpler and more secure. By contrast, zswap
"spills over" into swap-on-drive and will leak user data if that swap
device isn't encrypted. zswap support is a valid future feature, either
as a new generator or by extending zram-generator to support it via
the configuration file. Maybe the generator could favor zswap when
swap-on-drive already exists, and fall back to swap-on-zram otherwise?

†</br >
[https://www.kernel.org/doc/Documentation/vm/zswap.txt kernel.org zswap.txt]


== Benefit to Fedora ==

* significantly improves system responsiveness, especially when swap
is under pressure;
* more secure: user data that leaks into swap stays on volatile media;
* without swap-on-drive, there's better utilization of a limited
resource: the benefit of swap without the drive space consumption;
* complements ongoing resource control work, including earlyoom;
* further reduces the time to out-of-memory kill, when workloads exceed limits;
* improves performance for both "no swap" and "existing swap" setups.



== Scope ==

* Proposal owners:
** add zram-generator package to comps and kickstarts as appropriate
** obsolete zram package (used by Fedora IoT)
** means of per edition/spin configurations, if needed
** coordinate a test day

* Other developers:
**Anaconda are agreeable to deprecating their built-in implementation
in favor of swap-on-zram
**RFEs for zram-generator: users are not worse off if they don't
happen, but help implementing them is welcome and much
appreciated.</br >
[https://github.com/systemd/zram-generator/issues/10 RFE: should be
able to set a cap on zram device size #10]</br >
[https://github.com/systemd/zram-generator/issues/8 RFE: should set priority #8]

* Release engineering: [https://pagure.io/releng/issues #9495]

* Policies and guidelines: N/A

* Trademark approval: N/A


== Upgrade/compatibility impact ==

Add Supplements:fedora-release-common to zram-generator to pull it in
on upgrades.

Existing systems without swap will have swap-on-zram enabled.

Existing systems with swap-on-drive will also have swap-on-zram
enabled (two swap devices), with higher priority for the zram device.
Existing swap-on-drive will not be removed.

The 'zram' package contains zram-swap.service and associated bash
scripts, and is currently used by Fedora IoT and ARM spins. It will be
obsoleted to avoid conflicting or duplicate swap-on-zram
implementations.


== How To Test ==

Any hardware. Any version of Fedora.

# dnf install zram-generator
# cp /usr/share/doc/zram-generator/zram-generator.conf.example /etc/systemd/zram-generator.conf
# Edit the configuration
# Reboot
# Check that swap is on a zram device: zramctl, swapon
# Detailed check: journalctl -b -o short-monotonic | grep 'swap\|zram'
# Check that priority is higher than existing swap if two or more are listed.
## (Enhancement is needed for this.)
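
The priority check in the last step can be scripted against /proc/swaps, whose columns are Filename, Type, Size, Used and Priority. This is a sketch: `parse_proc_swaps` and `zram_has_top_priority` are hypothetical helpers, and the sample text below is illustrative, with made-up priority values (zram-generator setting a priority is still RFE #8).

```python
from typing import List, NamedTuple


class SwapEntry(NamedTuple):
    filename: str
    kind: str        # "partition" or "file"
    size_kib: int
    used_kib: int
    priority: int


def parse_proc_swaps(text: str) -> List[SwapEntry]:
    """Parse /proc/swaps text: one header line, then one line per swap device."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        name, kind, size, used, prio = fields[:5]
        entries.append(SwapEntry(name, kind, int(size), int(used), int(prio)))
    return entries


def zram_has_top_priority(entries: List[SwapEntry]) -> bool:
    """True if some zram device outranks every non-zram swap device."""
    zram = [e.priority for e in entries if e.filename.startswith("/dev/zram")]
    other = [e.priority for e in entries if not e.filename.startswith("/dev/zram")]
    return bool(zram) and all(max(zram) > p for p in other)


# Illustrative contents only; the priority values are made up.
SAMPLE = """\
Filename      Type       Size     Used  Priority
/dev/zram0    partition  4194300  0     100
/dev/sda2     partition  8388604  0     -2
"""

print(zram_has_top_priority(parse_proc_swaps(SAMPLE)))
```

On a running system, pass the contents of /proc/swaps instead of `SAMPLE`, e.g. `open("/proc/swaps").read()`.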

Suggested configuration file values:</br >
<span style=color:red>[zram0]</span></br >
<span style=color:red>memory-limit = none</span></br >
<span style=color:red>zram-fraction = 0.5</span></br >
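
The suggested values can be sanity-checked with a short Python sketch that reads the file with configparser and estimates the resulting device size. It assumes, based on zram-generator's documentation of this era and not confirmed by this proposal, that memory-limit is a RAM size in MiB above which no device is created, and that none disables that check. `zram_size_mib` is a hypothetical helper, not part of zram-generator.

```python
import configparser
from typing import Optional

CONF = """\
[zram0]
memory-limit = none
zram-fraction = 0.5
"""


def zram_size_mib(conf_text: str, ram_mib: int, device: str = "zram0") -> Optional[int]:
    """Estimate the zram device size (MiB) a config would produce on a machine.

    Assumption: memory-limit is a RAM size in MiB above which no device is
    created, and "none" disables that check.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(conf_text)
    section = cfg[device]
    limit = section.get("memory-limit", "none")
    if limit != "none" and ram_mib > int(limit):
        return None  # more RAM than the limit: no device (assumed semantics)
    return int(ram_mib * section.getfloat("zram-fraction"))


# A 16 GiB machine; this sketch applies no 4 GiB cap (that is RFE #10).
print(zram_size_mib(CONF, 16 * 1024))
```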

Feel free to run your usual workloads more aggressively or in
parallel. Suspend-to-RAM and suspend-to-drive are expected to continue
to work too (or at least hit all the same bugs as without zram being
used).

Also, you can see the actual compression ratio achieved with the
following command:</br >
<span style=color:red> zramctl </span>
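
The same ratio can be computed from /sys/block/zram0/mm_stat, whose first three fields are, per the kernel zram documentation, orig_data_size, compr_data_size and mem_used_total, all in bytes. In this sketch `compression_stats` is a hypothetical helper and the sample line is made up, not a real measurement.

```python
from typing import Tuple


def compression_stats(mm_stat: str) -> Tuple[float, float]:
    """Return (compression ratio, effective ratio) from a zram mm_stat line.

    The effective ratio divides by mem_used_total, which also counts
    allocator fragmentation and metadata, so it is the more honest figure.
    """
    orig, compr, mem_used = (int(x) for x in mm_stat.split()[:3])
    ratio = orig / compr if compr else float("nan")
    effective = orig / mem_used if mem_used else float("nan")
    return ratio, effective


# Made-up sample line; on a real system read /sys/block/zram0/mm_stat.
sample = "4294967296 2147483648 2214592512 0 2214592512 0 0 0"
ratio, effective = compression_stats(sample)
print(f"compression {ratio:.2f}:1, including overhead {effective:.2f}:1")
```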


==== Test Day ====

[https://pagure.io/fedora-qa/issue/632 QA: SwapOnzram Test Day] to
discover edge cases, and tweak the default configuration if necessary
to establish a good one-size-fits-all approach.


== User Experience ==

The user should not notice anything displeasing. If their usual
workload causes them to dread swap thrashing, they'll be surprised
that thrashing doesn't happen. The user might get curious if they
don't find a swap entry in /etc/fstab, or if they run 'swapon' and see
swap pointing to <span style=color:brown>/dev/zram0</span> instead of
a drive partition or LV.


== Dependencies ==

N/A


== Contingency Plan ==

* Contingency mechanism: Don't ship the generator = big hammer, but
easy. Preferable to ship the generator, but only selectively ship
configuration files = scalpel, pretty easy.
* Contingency deadline: Beta freeze
* Blocks release? No.
* Blocks product? No.


== Documentation ==

Consider adding a hint in an /etc/fstab comment? There is no man page
for this, and the documentation is also minimal, besides what's in
this feature proposal. It's an open question how the user should get
more information on how to configure and tweak it. But then, they
don't have that for swap today either. There's just institutional
knowledge.

Hence, a strong test day, with a lot of participants and press
coverage of the feature, might help spread institutional knowledge of
the coming changes.

Ideas welcome.


== Release Notes ==
Pending feedback and test day.


-- 
Ben Cotton
He / Him / His
Senior Program Manager, Fedora & CentOS Stream
Red Hat
TZ=America/Indiana/Indianapolis
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx



