Re: Fedora 33 System-Wide Change proposal: swap on zram

On Thursday, June 4, 2020 1:30:07 PM MST Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/SwapOnZRAM
> 
> == Summary ==
> 
> Swap is useful, except when it's slow. zram is a RAM drive that uses
> compression. Create a swap-on-zram during start-up. And no longer use
> swap partitions by default.
> 
> 
> == Owner ==
> * Name: [[User:chrismurphy| Chris Murphy]]
> * Email: chrismurphy@xxxxxxxxxxxxxxxxx
> 
> == Detailed Description ==
> 
> ==== zram Basic function ====
> 
> The zram† device, typically <span style=color:brown>/dev/zram0</span>,
> has a size set at create time during early boot, by zram-generator†
> per its configuration file. The memory used is not preallocated. It's
> dynamically allocated and deallocated, on demand. Due to compression,
> a full <span style=color:brown>/dev/zram0</span> uses about half as
> much memory as its size, assuming a 2:1 compression ratio.
> 
> The <span style=color:brown>/dev/zram0</span> behaves like any other
> block device. It can be formatted with a file system, or mkswap, which
> is the intention with this change proposal.
> 
> The system will use RAM normally up until it's full, and then start
> paging out to swap-on-zram, same as a conventional swap-on-drive. The
> zram driver starts to allocate memory at roughly 1/2 the rate of page
> outs, due to compression. But, there is no free lunch. This means
> swap-on-zram is not as effective at page eviction as swap-on-drive:
> the net eviction rate is ~50% instead of 100%. But it is at least an
> order of magnitude faster than drive-based swap.
> 
> zram has about 0.1% overhead or ~1MiB/1GiB. If the workload never
> touches swap, this overhead is the sole cost. In practice when not
> used at all, feature owner has experienced ~0.04% overhead.
> 
> Example: A system has 16 GiB RAM. The proposed defaults suggest the
> <span style=color:brown>/dev/zram0</span> device will be 4 GiB. If the
> workload completely fills up swap with 4 GiB of anonymous pages,
> what's happened? The <span style=color:red>zramctl</span> command will
> display the true compression ratio. If 2:1 is really obtained, it
> means 4GiB swap data is compressed to 2GiB. Therefore 2GiB is the
> actual RAM usage, and is also the net effective eviction. i.e. 4 GiB
> anonymous pages are evicted, but are then compressed and pinned into 2
> GiB RAM, for a net memory savings of 2 GiB.
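
As a rough illustration of the arithmetic in the example above (a sketch only, not part of the proposal; the function name is made up here, and the 2:1 ratio is the assumed figure from the text, while the real ratio is whatever zramctl reports):

```python
GIB = 1024 ** 3

def zram_net_savings(swapped_bytes, compression_ratio):
    """Return (RAM held by zram, net memory freed), both in bytes.

    swapped_bytes: uncompressed size of the pages moved into swap-on-zram.
    compression_ratio: e.g. 2.0 for 2:1 (an assumption; check zramctl).
    """
    ram_used = swapped_bytes / compression_ratio
    return ram_used, swapped_bytes - ram_used

# The 16 GiB RAM example: a 4 GiB zram device filled completely at 2:1.
used, freed = zram_net_savings(4 * GIB, 2.0)
print(f"RAM held by zram: {used / GIB:.1f} GiB, net freed: {freed / GIB:.1f} GiB")
# prints "RAM held by zram: 2.0 GiB, net freed: 2.0 GiB"
```

With a poorer ratio, say 1.25:1, the same 4 GiB of evicted pages would pin 3.2 GiB of RAM and free only 0.8 GiB, which is the "no free lunch" point made earlier.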
> 
> †</br >
> [https://www.kernel.org/doc/Documentation/blockdev/zram.txt kernel.org
> zram.txt]
>
> [https://github.com/systemd/zram-generator Github zram-generator project]
> 
> 
> ==== Overview of the Feature ====
> 
> Using swap is a good idea†, but no one likes it when it's slow.
> Anaconda and Fedora IoT have been using swap-on-zram by default for
> years. This builds on their prior effort.
> 
> 
> There are three components to the change:
> 
> # Install systemd rust-zram-generator† package. This does not enable
> swap-on-zram, it only makes the generator available.</br >
> # Install a default zram-generator configuration. When present,
> swap-on-zram is set-up during startup.</br >
> # Do not create swap partition/LV with default installations.
> 
> This proposal aims to apply all three, for all Fedora editions and
> spins, by default.
> 
> It further aims to apply the first two, for upgrades and custom
> installations.
>
> It might be useful to only make the generator available (1), should an
> edition/spin wish to opt out, or as a fallback if applying the feature
> to upgrades fails to withstand scrutiny.
> 
> †</br >
> There is a tl;dr section at the top. Highly recommend reading the
> whole article. [https://chrisdown.name/2018/01/02/in-defence-of-swap.html
> In defence of swap: common misconceptions]
> 
> 
> ==== Default zram device configuration: ====
> 
> During startup, create a zram device <span
> style=color:brown>/dev/zram0</span>, with a size equal to 50% RAM, but
> capped† to 4 GiB, and with a higher than typical swap priority†.
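
The sizing rule just described can be sketched as follows. This is only an illustration of "50% of RAM, capped at 4 GiB"; the function and parameter names are hypothetical and are not zram-generator configuration keys (the cap itself is one of the RFEs referenced below).

```python
GIB = 1024 ** 3

def default_zram_size(ram_bytes, fraction=0.5, cap_bytes=4 * GIB):
    """Proposed default: a fraction of RAM, but never more than the cap."""
    return min(int(ram_bytes * fraction), cap_bytes)

for ram_gib in (2, 4, 8, 16, 64):
    size = default_zram_size(ram_gib * GIB)
    print(f"{ram_gib:>3} GiB RAM -> {size / GIB:.0f} GiB zram device")
```

Under this rule every system with 8 GiB of RAM or more ends up with the same 4 GiB device; only smaller systems get exactly half their RAM.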
> 
> These values seem reasonably conservative, and are based on prior work
> in Fedora. Anaconda sets swap-on-drive sized to 50% RAM in the no
> hibernation case, common outside x86. Fedora IoT's implementation also
> sets swap-on-zram size to 50% RAM.
> 
> †</br >
> [https://github.com/systemd/zram-generator/issues/10 RFE: should be
> able to set a cap on zram device size #10]
> 
> [https://github.com/systemd/zram-generator/issues/8 RFE: should set priority
> #8]
>
> 
> ==== Default installer behavior ====
> 
> The installer is currently responsible for creating a swap-on-drive
> device. This will be dropped. The zram-generator + configuration file
> will trigger the setup and activation of swap-on-zram. This means
> hibernation isn't possible, even on systems that could support it.
> 
> Please see
> [https://pagure.io/fedora-workstation/blob/master/f/hibernationstatus.md
> Supporting hibernation in Workstation edition] for much more detailed
> information, including why it's increasingly likely hibernation isn't
> possible anyway, and a path to improving hibernation support.
> 
> 
> ==== Custom/Advanced partitioning installer behavior ====
> 
> The user can add swap using Custom partitioning at install time. This
> is swap-on-drive. And the installer will also include the <span
> style=color:red> resume=UUID </span> kernel parameter for this swap
> device. No change in behavior here.
> 
> Since swap-on-zram is still enabled by default, there will be two
> swaps: swap-on-zram, and swap-on-drive. The swap-on-zram will have
> higher priority, thus being favored over drive based swap. The kernel
> is smart enough to know it can't hibernate to a zram device, and will
> instead use drive based swap.
> 
> 
> ==== How can it be disabled? ====
> 
> Immediately:</br >
> <span style=color:red>swapoff /dev/zram0</span>
> 
> Permanently:</br >
> <span style=color:red>rm /etc/systemd/zram-generator.conf</span>
> 
> 
> == Feedback ==
> 
> ==== You're enabling it on upgrades? ====
> 
> That's the current plan. As a technical matter, feature owner is
> confident this feature will improve the experience of all users
> regardless of configuration. As a non-technical matter, it's
> recognized that (a) ''hey pal, you're messing with my customizations,
> not cool!'' and (b) ''swap always stinks, I don't care if it has a 'Z'
> in the name!'' may need more convincing.
> 
> There are possible risks.
> 
> * Workloads that expect full use of memory, and depend on 100% page
> eviction. These may run slower if they really need full use of memory,
> but some memory is used for the zram device instead. Such workloads
> might favor zswap.
> 
> * Workloads with low compressible pages. In the worst case, this means
> unnecessary work merely moving pages around.
> 
> * Workloads with memory full, and hibernation. Hibernation is already
> stressful to memory-management subsystem and prone to bailing out in
> such cases. The swap-on-zram will be favored for evictions in the
> attempt to free memory to create the hibernation image. It could
> increase instances of hibernation entry failure. This isn't a crash,
> it just means the attempt doesn't succeed, and the system resumes
> operation instead of hibernating.
> 
> While these risks are possible, it's difficult to estimate their
> probability. But this is a significant consideration in the
> conservative default zram size.
> Users can easily increase zram size as needed for their use case,
> simply by editing <span
> style=color:red>/etc/systemd/zram-generator.conf</span> and the change
> takes effect at next boot.
> 
> ==== Why systemd zram-generator? ====
> 
> It's the most upstream implementation to date, is fast and
> lightweight. The zram-generator uses existing systemd infrastructure
> to set up the zram block device, format it as swap, and swapon - all
> during early boot. It's very similar in behavior to fstab-generator,
> gpt-auto-generator, and cryptsetup-generator†.
> 
> Converging on one implementation avoids user confusion. And while the
> alternatives are nice and work fine, a systemd generator is
> particularly well suited for this use case compared to a systemd
> service unit.†
> 
> Also, it's a reference implementation of a systemd generator written in
> Rust.
>
> †</br >
> [https://www.freedesktop.org/software/systemd/man/systemd.generator.html
> freedesktop.org About systemd generators.]</br >
> [https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/TCY534JPIMZ3OXM5Q5E2ZH5PSAKQNGP7/
> devel@ ''Re: swap-on-zram by default'' Zbigniew Jędrzejewski-Szmek,
> systemd zram-generator author/maintainer]
> 
> 
> ==== Why not a bigger zram device? ====
> 
> The main idea of being conservative is to address concerns about
> upgrades. It's possible some workloads will have less compressible
> data. Hence, not going with <span style=color:brown>/dev/zram0</span>
> sized to 100% of RAM at this time. Even a <span
> style=color:brown>/dev/zram0</span> of 200% RAM is not unreasonable
> *if* the compression ratio is at least 2:1. However, it's possible a
> system can get "stuck" in a kind of swap thrashing similar to
> conventional swap-on-drive, except it's CPU and memory bound, rather
> than IO bound. Feature owner thinks it's better to just OOM, instead
> of getting overly aggressive with the zram device size.
> 
> Conversely it's possible to be too conservative with the size, and
> result in more instances of OOM kill. If applying the feature to
> upgrades is rejected, it's probably reasonable to increase the cap to
> ~8GiB. Of course more feedback and testing is needed, and it will be
> taken into consideration.
> 
> Note that the kernel zram doc says an excessively sized zram device
> does come with overhead. Users can increase the size easily
> post-install, a capability they don't easily have with swap-on-drive.
> The goal for Fedora 33 is a default that's useful and safe for the
> vast majority of use cases.
> 
> 
> ==== Why not zswap? ====
> 
> Zswap† is a similar idea, but with a totally different implementation.
> It is swap specific, uses a RAM cache, and requires a conventional
> swap partition existing already. It might be true certain workloads
> are better suited for using zswap. But swap-on-zram depends only on
> volatile storage. This is simpler and it's more secure. Whereas zswap
> "spills over" into swap-on-drive and will leak user data if that swap
> device isn't encrypted. Some workloads may do better with zswap, and
> it's a valid future feature for a new generator, or possibly an
> extension of zram-generator to support it via the configuration file.
> Maybe the generator could favor zswap when swap-on-drive already
> exists, and fall back to swap-on-zram otherwise?
> 
> †</br >
> [https://www.kernel.org/doc/Documentation/vm/zswap.txt kernel.org
> zswap.txt]
>
> 
> == Benefit to Fedora ==
> 
> * significantly improves system responsiveness, especially when swap
> is under pressure;
> * more secure, user data leaks into swap are on volatile media;
> * without swap-on-drive, there's better utilization of a limited
> resource: benefit of swap without the drive space consumption;
> * complements on-going resource control work, including earlyoom;
> * further reduces the time to out-of-memory kill, when workloads exceed
> limits;
> * improves performance for both "no swap" and "existing swap"
> setups.
> 
> 
> == Scope ==
> 
> * Proposal owners:
> ** add zram-generator package to comps and kickstarts as appropriate
> ** obsolete zram package (used by Fedora IoT)
> ** means of per edition/spin configurations, if needed
> ** coordinate a test day
> 
> * Other developers:
> ** Anaconda are agreeable to deprecating their built-in implementation
> in favor of swap-on-zram
> ** RFEs for zram-generator: users are not worse off if these don't
> happen. This is an open request for help to make them possible; it's
> much appreciated.</br >
> [https://github.com/systemd/zram-generator/issues/10 RFE: should be
> able to set a cap on zram device size #10]</br >
> [https://github.com/systemd/zram-generator/issues/8 RFE: should set priority
> #8]
>
> * Release engineering: [https://pagure.io/releng/issues #9495]
> 
> * Policies and guidelines: N/A
> 
> * Trademark approval: N/A
> 
> 
> == Upgrade/compatibility impact ==
> 
> Add Supplements:fedora-release-common to zram-generator to pull it in
> on upgrades.
> 
> Existing systems without swap will have swap-on-zram enabled.
> 
> Existing systems with swap-on-drive will also have swap-on-zram
> enabled (two swap devices), with higher priority for the zram device.
> Existing swap-on-drive will not be removed.
> 
> 'zram' package contains zram-swap.service and associated bash scripts,
> and is currently used by Fedora IoT and ARM spins. It will be
> obsoleted to avoid conflicting/duplicative swap-on-zram
> implementations.
> 
> 
> == How To Test ==
> 
> Any hardware. Any version of Fedora.
> 
> # dnf install zram-generator
> # cp /usr/share/doc/zram-generator/zram-generator.conf.example
> /etc/systemd/zram-generator.conf
> # Edit the configuration
> # Reboot
> # Check that swap is on a zram device: zramctl, swapon
> # Detailed check: journalctl -b -o short-monotonic | grep 'swap\|zram'
> # Check that priority is higher than existing swap if two or more are
> listed. ## (Enhancement is needed for this.)
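
For the last step in the list above, a minimal sketch of one way to check the priorities is to parse the table the kernel exposes in /proc/swaps. The helper names are hypothetical, and the sample text is illustrative only (the device names, sizes, and priorities are assumptions, not output captured from a real system).

```python
def parse_swaps(text):
    """Map each swap device name to its priority, from /proc/swaps-style text."""
    priorities = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            # Columns: Filename, Type, Size, Used, Priority
            priorities[fields[0]] = int(fields[4])
    return priorities

def zram_has_top_priority(priorities):
    """True if every zram swap outranks every non-zram swap."""
    zram = [p for name, p in priorities.items() if name.startswith("/dev/zram")]
    other = [p for name, p in priorities.items() if not name.startswith("/dev/zram")]
    return bool(zram) and (not other or min(zram) > max(other))

# Illustrative sample; on a real system use open("/proc/swaps").read() instead.
SAMPLE = """Filename Type Size Used Priority
/dev/zram0 partition 4194300 0 100
/dev/sda3 partition 8388604 0 -2
"""

print(zram_has_top_priority(parse_swaps(SAMPLE)))  # prints "True"
```

If the function returns False with both swaps active, the drive-based swap would be used first, which defeats the purpose of the change.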
> 
> Suggested configuration file values:</br >
> <span style=color:red>[zram0]</span></br >
> <span style=color:red>memory-limit = none</span></br >
> <span style=color:red>zram-fraction = 0.5</span></br >
> 
> Feel free to run your usual workloads more aggressively or in
> parallel. Suspend-to-RAM and suspend-to-drive are expected to continue
> to work too (or at least hit all the same bugs as without zram being
> used).
> 
> Also, you can see the actual compression ratio achieved with the
> following command:</br >
> <span style=color:red> zramctl </span>
> 
> 
> ==== Test Day ====
> 
> [https://pagure.io/fedora-qa/issue/632 QA: SwapOnzram Test Day] to
> discover edge cases, and tweak the default configuration if necessary
> to establish a good one-size-fits-all approach.
> 
> 
> == User Experience ==
> 
> The user won't notice anything displeasing. If their usual workload
> causes them to dread swap thrashing, they'll be surprised that
> thrashing doesn't happen. The user might get curious if they don't
> find a swap entry in /etc/fstab. Or if they run 'swapon' and see swap
> pointing to <span style=color:brown>/dev/zram0</span> instead of a
> drive partition or LV.
> 
> 
> == Dependencies ==
> 
> N/A
> 
> 
> == Contingency Plan ==
> 
> * Contingency mechanism: Don't ship the generator = big hammer, but
> easy. Preferable to ship the generator, but only selectively ship
> configuration files = scalpel, pretty easy.
> * Contingency deadline: Beta freeze
> * Blocks release? No.
> * Blocks product? No.
> 
> 
> == Documentation ==
> 
> Consider adding a hint in an /etc/fstab comment? There is no man page
> for this, and the documentation is also minimal, besides what's in
> this feature proposal. It's an open question how the user should get
> more information on how to configure and tweak it. But then, they
> don't have that for swap today either. There's just institutional
> knowledge.
> 
> Hence, a strong test day, with a lot of people and press coverage of
> the feature, might help spread the word for institutional knowledge
> changes coming.
> 
> Ideas welcome.
> 
> 
> == Release Notes ==
> Pending feedback and test day.

Jesus Christ, this actually got approved. It's time to fork Fedora. This is 
really getting out of hand.

-- 
John M. Harris, Jr.

_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx



