On Thursday, June 4, 2020 1:30:07 PM MST Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/SwapOnZRAM
>
> == Summary ==
>
> Swap is useful, except when it's slow. zram is a RAM drive that uses
> compression. Create a swap-on-zram during start-up. And no longer use
> swap partitions by default.
>
>
> == Owner ==
> * Name: [[User:chrismurphy| Chris Murphy]]
> * Email: chrismurphy@xxxxxxxxxxxxxxxxx
>
> == Detailed Description ==
>
> ==== zram Basic function ====
>
> The zram† device, typically <span style=color:brown>/dev/zram0</span>,
> has a size set at creation time during early boot, by zram-generator†
> per its configuration file. The memory used is not preallocated. It's
> dynamically allocated and deallocated, on demand. Assuming 2:1
> compression, a full <span style=color:brown>/dev/zram0</span> uses
> about half as much memory as its size.
>
> The <span style=color:brown>/dev/zram0</span> behaves like any other
> block device. It can be formatted with a file system, or mkswap, which
> is the intention with this change proposal.
>
> The system will use RAM normally up until it's full, and then start
> paging out to swap-on-zram, same as a conventional swap-on-drive. The
> zram driver starts to allocate memory at roughly 1/2 the rate of page
> outs, due to compression. But, there is no free lunch. This means
> swap-on-zram is not as effective at page eviction as swap-on-drive:
> the eviction rate is ~50% instead of 100%. But it is at least an order
> of magnitude faster than drive based swap.
>
> zram has about 0.1% overhead or ~1 MiB/1 GiB. If the workload never
> touches swap, this overhead is the sole cost. In practice when not
> used at all, the feature owner has experienced ~0.04% overhead.
>
> Example: A system has 16 GiB RAM. The proposed defaults suggest the
> <span style=color:brown>/dev/zram0</span> device will be 4 GiB. If the
> workload completely fills up swap with 4 GiB of anonymous pages,
> what's happened? The <span style=color:red>zramctl</span> command will
> display the true compression ratio. If 2:1 is really obtained, it
> means 4 GiB swap data is compressed to 2 GiB. Therefore 2 GiB is the
> actual RAM usage, and is also the net effective eviction. i.e. 4 GiB
> anonymous pages are evicted, but are then compressed and pinned into 2
> GiB RAM, for a net memory savings of 2 GiB.
>
> †</br >
> [https://www.kernel.org/doc/Documentation/blockdev/zram.txt kernel.org
> zram.txt]
> [https://github.com/systemd/zram-generator GitHub zram-generator project]
>
>
> ==== Overview of the Feature ====
>
> Using swap is a good idea†, but no one likes it when it's slow.
> Anaconda and Fedora IoT have been using swap-on-zram by default for
> years. This builds on their prior effort.
>
>
> There are three components to the change:
>
> # Install systemd rust-zram-generator† package. This does not enable
> swap-on-zram, it only makes the generator available.</br >
> # Install a default zram-generator configuration. When present,
> swap-on-zram is set up during startup.</br >
> # Do not create swap partition/LV with default installations.
>
> This proposal aims to apply all three, for all Fedora editions and
> spins, by default.
>
> It further aims to apply the first two, for upgrades and custom
> installations.
> It might be useful to only make the generator available (1), should an
> edition/spin wish to opt out, or as a fallback if applying the feature
> to upgrades fails to withstand scrutiny.
>
> †</br >
> There is a tl;dr section at the top. Highly recommend reading the
> whole article. [https://chrisdown.name/2018/01/02/in-defence-of-swap.html
> In defence of swap: common misconceptions]
>
>
> ==== Default zram device configuration: ====
>
> During startup, create a zram device
> <span style=color:brown>/dev/zram0</span>, with a size equal to 50% RAM, but
> capped† to 4 GiB, and with a higher than typical swap priority†.
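
For illustration, the sizing rule and the 16 GiB example quoted above can be worked through in a few lines of Python. This is only a sketch: the function names and parameters are made up for this example and are not part of zram-generator, and the 2:1 ratio is the proposal's assumption, not a measured value.

```python
GIB = 1024 ** 3


def default_zram_size(ram_bytes, fraction=0.5, cap_bytes=4 * GIB):
    """Device size under the proposed defaults: a fraction of RAM, capped.

    Hypothetical helper; fraction and cap mirror the 50% / 4 GiB defaults
    described in the proposal.
    """
    return min(int(ram_bytes * fraction), cap_bytes)


def net_memory_saved(swapped_bytes, compression_ratio=2.0):
    """RAM actually freed when swapped_bytes are paged out to zram.

    The compressed copy stays in RAM, so only the difference is saved.
    The 2:1 default is an assumption; real ratios vary by workload.
    """
    ram_used = swapped_bytes / compression_ratio
    return swapped_bytes - ram_used


size = default_zram_size(16 * GIB)   # 50% would be 8 GiB, so the cap applies
saved = net_memory_saved(size)       # device completely full at 2:1
print(size / GIB, saved / GIB)       # prints: 4.0 2.0
```

With a less compressible workload, say 1.25:1, the same full 4 GiB device would free only 0.8 GiB, which is the "low compressible pages" risk the proposal mentions.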
>
> These values seem reasonably conservative, and are based on prior work
> in Fedora. Anaconda sets swap-on-drive sized to 50% RAM in the no
> hibernation case, common outside x86. Fedora IoT's implementation also
> sets swap-on-zram size to 50% RAM.
>
> †</br >
> [https://github.com/systemd/zram-generator/issues/10 RFE: should be
> able to set a cap on zram device size #10]
>
> [https://github.com/systemd/zram-generator/issues/8 RFE: should set priority
> #8]
>
> ==== Default installer behavior ====
>
> The installer is currently responsible for creating a swap-on-drive
> device. This will be dropped. The zram-generator + configuration file
> will trigger the setup and activation of swap-on-zram. This means
> hibernation isn't possible, even on systems that could support it.
>
> Please see
> [https://pagure.io/fedora-workstation/blob/master/f/hibernationstatus.md
> Supporting hibernation in Workstation edition] for much more detailed
> information, including why it's increasingly likely hibernation isn't
> possible anyway, and a path to improving hibernation support.
>
>
> ==== Custom/Advanced partitioning installer behavior ====
>
> The user can add swap using Custom partitioning at install time. This
> is swap-on-drive. And the installer will also include the
> <span style=color:red> resume=UUID </span> kernel parameter for this swap
> device. No change in behavior here.
>
> Since swap-on-zram is still enabled by default, there will be two
> swaps: swap-on-zram, and swap-on-drive. The swap-on-zram will have
> higher priority, thus being favored over drive based swap. The kernel
> is smart enough to know it can't hibernate to a zram device, and will
> instead use drive based swap.
>
>
> ==== How can it be disabled? ====
>
> Immediately:</br >
> <span style=color:red>swapoff /dev/zram0</span>
>
> Permanently:</br >
> <span style=color:red>rm /etc/systemd/zram-generator.conf</span>
>
>
> == Feedback ==
>
> ==== You're enabling it on upgrades? ====
>
> That's the current plan. As a technical matter, the feature owner is
> confident this feature will improve the experience of all users
> regardless of configuration. As a non-technical matter, it's
> recognized that users thinking (a) ''hey pal, you're messing with my
> customizations, not cool!'' or (b) ''swap always stinks, I don't care
> if it has a 'Z' in the name!'' may need more convincing.
>
> There are possible risks.
>
> * Workloads that expect full use of memory, and depend on 100% page
> eviction. These may run slower if they really need full use of memory,
> but some memory is used for the zram device instead. Such workloads
> might favor zswap.
>
> * Workloads with poorly compressible pages. In the worst case, this means
> unnecessary work merely moving pages around.
>
> * Workloads with memory full, and hibernation. Hibernation is already
> stressful to the memory-management subsystem and prone to bailing out in
> such cases. The swap-on-zram will be favored for evictions in the
> attempt to free memory to create the hibernation image. It could
> increase instances of hibernation entry failure. This isn't a crash,
> it just means the attempt doesn't succeed, and the system resumes
> operation instead of hibernating.
>
> While these risks are possible, it's difficult to estimate their
> probability. But this is a significant consideration in the
> conservative default zram size. Users can easily increase zram size as
> needed for their use case, simply by editing
> <span style=color:red>/etc/systemd/zram-generator.conf</span> and the change
> takes effect at next boot.
>
> ==== Why systemd zram-generator? ====
>
> It's the most upstream implementation to date, and is fast and
> lightweight. The zram-generator uses existing systemd infrastructure
> to set up the zram block device, format it as swap, and swapon - all
> during early boot. It's very similar in behavior to fstab-generator,
> gpt-auto-generator, and cryptsetup-generator†.
>
> Converging on one implementation avoids user confusion. And while the
> alternatives are nice and work fine, a systemd generator is
> particularly well suited for this use case compared to a systemd
> service unit.†
>
> Also, it's a reference implementation of a system generator written in
> Rust.
> †</br >
> [https://www.freedesktop.org/software/systemd/man/systemd.generator.html
> freedesktop.org About systemd generators.]</br >
> [https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/TCY534JPIMZ3OXM5Q5E2ZH5PSAKQNGP7/
> devel@ ''Re: swap-on-zram by
> default'' Zbigniew Jędrzejewski-Szmek, systemd zram-generator
> author/maintainer]
>
>
> ==== Why not a bigger zram device? ====
>
> The main idea of being conservative is to address concerns about
> upgrades. It's possible some workloads will have less compressible
> data. Hence, not going with <span style=color:brown>/dev/zram0</span>
> sized to 100% of RAM at this time. Even a
> <span style=color:brown>/dev/zram0</span> of 200% RAM is not unreasonable
> *if* the compression ratio is at least 2:1. However, it's possible a
> system can get "stuck" in a kind of swap thrashing similar to
> conventional swap-on-drive, except it's CPU and memory bound, rather
> than IO bound. Feature owner thinks it's better to just OOM, instead
> of getting overly aggressive with the zram device size.
>
> Conversely it's possible to be too conservative with the size, and
> result in more instances of OOM kill. If applying the feature to
> upgrades is rejected, it's probably reasonable to increase the cap to
> ~8 GiB. Of course more feedback and testing is needed, and it will be
> taken into consideration.
>
> Note that the kernel zram doc says an excessively sized zram device
> does come with overhead. Users can increase the size easily
> post-install, a capability they don't easily have with swap-on-drive.
> The goal for Fedora 33 is a default that's useful and safe for the
> vast majority of use cases.
>
>
> ==== Why not zswap? ====
>
> Zswap† is a similar idea, but with a totally different implementation.
> It is swap specific, uses a RAM cache, and requires a conventional
> swap partition existing already. It might be true certain workloads
> are better suited for using zswap. But swap-on-zram depends only on
> volatile storage. This is simpler and it's more secure. Whereas zswap
> "spills over" into swap-on-drive and will leak user data if that swap
> device isn't encrypted. Some workloads may do better with zswap, and
> it's a valid future feature for a new generator, or possibly extend
> zram-generator to support it via the configuration file. Maybe the
> generator could favor zswap when swap-on-drive already exists; and
> fall back to swap-on-zram?
>
> †</br >
> [https://www.kernel.org/doc/Documentation/vm/zswap.txt kernel.org
> zswap.txt]
>
> == Benefit to Fedora ==
>
> * significantly improves system responsiveness, especially when swap
> is under pressure;
> * more secure, user data leaks into swap are on volatile media;
> * without swap-on-drive, there's better utilization of a limited
> resource: benefit of swap without the drive space consumption;
> * complements on-going resource control work, including earlyoom;
> * further reduces the time to out-of-memory kill, when workloads exceed
> limits;
> * improves performance for both "no swap" and "existing swap"
> setups;
>
>
> == Scope ==
>
> * Proposal owners:
> ** add zram-generator package to comps and kickstarts as appropriate
> ** obsolete zram package (used by Fedora IoT)
> ** means of per edition/spin configurations, if needed
> ** coordinate a test day
>
> * Other developers:
> ** Anaconda are agreeable to deprecating their built-in implementation
> in favor of swap-on-zram
> ** RFEs for zram-generator: users are not worse off if they don't
> happen. Open request for help, to make it possible. It's much
> appreciated.</br >
> [https://github.com/systemd/zram-generator/issues/10 RFE: should be
> able to set a cap on zram device size #10]</br >
> [https://github.com/systemd/zram-generator/issues/8 RFE: should set priority
> #8]
> * Release engineering: [https://pagure.io/releng/issues #9495]
>
> * Policies and guidelines: N/A
>
> * Trademark approval: N/A
>
>
> == Upgrade/compatibility impact ==
>
> Add Supplements:fedora-release-common to zram-generator to pull it in
> on upgrades.
>
> Existing systems without swap will have swap-on-zram enabled.
>
> Existing systems with swap-on-drive will also have swap-on-zram
> enabled (two swap devices), with higher priority for the zram device.
> Existing swap-on-drive will not be removed.
>
> 'zram' package contains zram-swap.service and associated bash scripts,
> and is currently used by Fedora IoT and ARM spins. It will be
> obsoleted to avoid conflicting/duplicative swap-on-zram
> implementations.
>
>
> == How To Test ==
>
> Any hardware. Any version of Fedora.
>
> # dnf install zram-generator
> # cp /usr/share/doc/zram-generator/zram-generator.conf.example /etc/systemd/zram-generator.conf
> # Edit the configuration
> # Reboot
> # Check that swap is on a zram device: zramctl, swapon
> # Detailed check: journalctl -b -o short-monotonic | grep 'swap\|zram'
> # Check that priority is higher than existing swap if two or more are
> listed. ## (Enhancement is needed for this.)
>
> Suggested configuration file values:</br >
> <span style=color:red>[zram0]</span></br >
> <span style=color:red>memory-limit = none</span></br >
> <span style=color:red>zram-fraction = 0.5</span></br >
>
> Feel free to run your usual workloads more aggressively or in
> parallel. Suspend-to-RAM and suspend-to-drive are expected to continue
> to work too (or at least hit all the same bugs as without zram being
> used).
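
The priority check in the test steps above could be scripted roughly as follows. This is a sketch, not part of the proposal: it assumes the usual /proc/swaps layout (a header row, then Filename, Type, Size, Used, Priority columns), the helper names are hypothetical, and the sample text is invented for illustration instead of being captured from a real system.

```python
# Illustrative sample only; on a real system, read the text from /proc/swaps.
SAMPLE_SWAPS = """\
Filename                                Type            Size            Used            Priority
/dev/zram0                              partition       4194300         0               100
/dev/dm-1                               partition       8388604         0               -2
"""


def parse_swaps(text):
    """Return a list of (device, priority) tuples from /proc/swaps-style text."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            entries.append((fields[0], int(fields[4])))
    return entries


def zram_has_top_priority(text):
    """True if every zram swap outranks every non-zram swap.

    Returns False when no zram swap is active at all.
    """
    entries = parse_swaps(text)
    zram = [prio for dev, prio in entries if dev.startswith("/dev/zram")]
    other = [prio for dev, prio in entries if not dev.startswith("/dev/zram")]
    if not zram:
        return False
    return all(min(zram) > prio for prio in other)


print(zram_has_top_priority(SAMPLE_SWAPS))  # prints: True
```

If only a zram device is listed, the check passes trivially, which matches the "if two or more are listed" wording of the test step.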
>
> Also, you can see the actual compression ratio achieved with the
> following command:</br >
> <span style=color:red> zramctl </span>
>
>
> ==== Test Day ====
>
> [https://pagure.io/fedora-qa/issue/632 QA: SwapOnzram Test Day] to
> discover edge cases, and tweak the default configuration if necessary
> to establish a good one-size-fits-all approach.
>
>
> == User Experience ==
>
> The user won't notice anything displeasing. If their usual workload
> causes them to dread swap thrashing, they'll be surprised that
> thrashing doesn't happen. The user might get curious if they don't
> find a swap entry in /etc/fstab. Or if they 'swapon' and see swap
> pointing to <span style=color:brown>/dev/zram0</span> instead of a
> drive partition or LV.
>
>
> == Dependencies ==
>
> N/A
>
>
> == Contingency Plan ==
>
> * Contingency mechanism: Don't ship the generator = big hammer, but
> easy. Preferable to ship the generator, but only selectively ship
> configuration files = scalpel, pretty easy.
> * Contingency deadline: Beta freeze
> * Blocks release? No.
> * Blocks product? No.
>
>
> == Documentation ==
>
> Consider adding a hint in an /etc/fstab comment? There is no man page
> for this, and the documentation is also minimal, besides what's in
> this feature proposal. It's an open question how the user should get
> more information on how to configure and tweak it. But then, they
> don't have that for swap today either. There's just institutional
> knowledge.
>
> Hence, a strong test day, with a lot of people and press coverage of
> the feature, might help spread the word for institutional knowledge
> changes coming.
>
> Ideas welcome.
>
>
> == Release Notes ==
> Pending feedback and test day.

Jesus Christ, this actually got approved. It's time to fork Fedora. This is
really getting out of hand.

--
John M. Harris, Jr.