On Sun, Jun 7, 2020 at 2:48 PM David Kaufmann <astra@xxxxxxxx> wrote:
>
> On Sat, Jun 06, 2020 at 05:36:15PM -0600, Chris Murphy wrote:
> > To me this sounds like too much dependency on swap.
>
> That's not what I meant, I wanted to emphasize the different values of
> disk storage vs. RAM. As said in another email it doesn't matter at all
> if there is 0% or 90% of disk swap usage, while RAM usage can be quite
> essential. (This is in case swapped out stuff stays swapped out.)

Inactive pages that are evicted long term are a workload that I think
would benefit from zswap instead. In that case you get the benefit of
the memory cache for recently used anonymous pages that would otherwise
result in "swap thrashing", while the least recently used pages are
moved to disk-based swap.

The inherent difficulty with optimizations is finding a generic approach
that helps most use cases. Is this a 100% winner? I doubt it. Is it an
80% winner across all of Fedora? I think it's at least that, but sure, I
can't prove it empirically. There's quite a lot of evidence it's sane,
considering all the use cases it's already been used in.

> > What people hate is slow swap.
>
> This is not generally true, only if RAM gets so tight that applications
> start competing for swap.
> This is why I've proposed test cases testing exactly that, as for
> the case of persistent swap I'd expect the outcome to be a clear win
> for disk swap. (Although this can in some cases also be seen as a bug,
> as this would be applications not really using the allocated space.)

I don't follow this. Where are the proposed test cases? And in what case
are you saying disk swap is a clear win? I would consider such an
example an optimization for that specific edge case rather than a
generic solution. We've had disk swap as the generic solution for a
while, and it's causing grief for folks where there is memory
competition among applications: pages get evicted and then paged back in
not long after, which causes the swap thrashing effect. Arguably those
folks need more memory for their workload, but that's in effect what the
feature provides. It gives them more bandwidth for frequently used
anonymous pages being paged in and out via compressed memory rather than
significantly slower disk swap. Is this free? It's free in the sense
that there's no out-of-pocket cost for more RAM. Instead it spends some
CPU to make extra room in existing memory, avoiding the explosively high
latency of disk swap and the bad experience that comes with it.

> > For sure there is an impact on CPU. This exchanges IO bound work for
> > CPU and memory bound work. But it's pretty lightweight compression.
> >
> > And again, whatever is defined as "too much" CPU hit for the workload
> > necessarily translates into either "suffer the cost of IO bound
> > swap-on-drive, or suffer the cost of more memory." There is no free
> > lunch. It really isn't magic.
>
> Yes, that seems obvious to me. What would be interesting is the point
> where one is significantly slower than the other one.
> The theoretical test case is writing data to memory and reading it
> again. For this case I'm assuming 8G RAM as total memory.
>
> Until about 95% mem usage I'd expect the disk swap case to win, as it
> should behave the same as no swap (with matching swappiness values).

Why would disk-based swap win? In this example, where there have been no
page-outs, the zram device isn't using any memory. Again, it is not a
preallocation.
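You can check this directly. Here's a minimal Python sketch that reads
the mm_stat file documented in the kernel's zram admin guide; it assumes
the swap device is zram0, and it's an illustration rather than a
polished tool:

  #!/usr/bin/env python3
  # Compare a zram device's advertised size with the RAM it actually
  # occupies. Paths are from the kernel zram docs; adjust "zram0" to
  # match your device.
  from pathlib import Path

  dev = Path("/sys/block/zram0")
  disksize = int((dev / "disksize").read_text())  # device cap, bytes
  # mm_stat fields: orig_data_size, compr_data_size, mem_used_total, ...
  orig, compr, used = (int(f) for f in
                       (dev / "mm_stat").read_text().split()[:3])

  gib = 1024 ** 3
  print(f"device size (cap)    : {disksize / gib:.2f} GiB")
  print(f"uncompressed pages   : {orig / gib:.2f} GiB")
  print(f"RAM actually consumed: {used / gib:.2f} GiB")
  if compr:
      print(f"compression ratio    : {orig / compr:.2f}:1")

Right after boot, before anything pages out, the usage numbers read as
zero no matter how big the device is.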
> At 150% memory usage assuming a 2:1 compression ratio this would mean:
> - disk swap:
>   has to write 4G to disk initially, and for reading swap another 4G
>   (12G total traffic - 4G initial, 4G swapping out and 4G swapping in)
> - zram, assuming 4G zram swap:
>   has to write 8G to zram initially, and for reading the data swap 16G
>   (24G total traffic - 8G initial, 8G swapping out and 8G swapping in)

Swap contains anonymous pages, so I'm not sure what you mean by
"initial". Whether these pages come from the network, are typed in, or
come from persistent storage, it's a wash between disk and zram swap, so
it can be ignored.

I also don't understand your math, or how you start with a 4G zram swap
but end up with 8G. I think you're confused. The 4GiB cap is the device
size. The actual amount of RAM it uses will be less, due to compression.
The zram device size is not the amount of memory used. And in no case is
there a preallocation of memory unless the zram device is used. It is
easy to get confused, by the way; that was my default state for days
upon first stumbling on this.

> It would be good to see actual numbers for this, so far I've only
> seen praise on how good the compression ratio is. (Plus the anecdotal
> references from a few people)

There are a lot of use cases in the real world: Chrome, Android, Fedora
IoT, the Fedora ARM spins, and nearly all of the openQA VMs doing
Anaconda installation tests take advantage of it.

> But this should also be tested with actual CPUs and disks.

I've been doing exactly that for a year on four separate systems. I am
not a scientific sample, but this is what I'm able to do.

> zram is
> obviously faster, but at which point is the overhead from compression,
> the reduced unswapped memory and the doubled number of swapping
> operations starting to be smaller than the overhead from SSD
> read/write speed?

I have definitely seen behavior that sounds like this. That's the case
of:

  8G RAM + 8G swap on zram (i.e. sized to 100% of RAM)
versus
  8G RAM + 8G swap on SSD

and then compiling webkitgtk using the ninja default, which on my system
is 10 jobs. The second setup always becomes completely unresponsive, and
I do a forced power off at 30 minutes. (I have a few cases of 4+ hour
compiles, none finished, some OOM. I have many, as in over 100, cases of
forced power off varying from 1-30 minutes.)

The first setup, with zram, more often than not ends with OOM inside of
30 minutes. I'd have to dig up handwritten logs to see if there's any
pattern in how long it takes; I think it's around 10 minutes, but human
memory isn't exactly reliable, so take this with a grain of salt. A
smaller number of times, the system is in a CPU+memory based swap
thrash. Approximately as you describe, it's probably just wedged, making
very slow progress, because perhaps up to 1/3 or 1/2 of RAM is being
used for the zram device, and the compile flat out wants more memory
than is available. This task only succeeds with ~12+G of disk-based
swap, which is just not realistic. It's a clearly overcommitted and thus
contrived test, but I love it and hate it at the same time.

More realistic is to not use the defaults and set the number of jobs
manually to 6. In this case, zram-based swap consistently beats
disk-based swap. That makes sense, because pretty much all of the
inactive pages are going to be needed at some point by the compile, or
they are dropped. Following the compile there aren't a lot of inactive
pages left, and I'm not sure they're even related to the compile at all.
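To put rough numbers on that "1/3 or 1/2 of RAM" guess, here's a
back-of-the-envelope Python sketch. The 2:1 ratio is an assumption
carried over from your example; real ratios vary by workload:

  # Rough zram accounting: RAM held is (uncompressed data swapped out)
  # divided by the compression ratio, and zero until pages actually go
  # out. The device size caps the uncompressed data, not the RAM used.
  def zram_ram_used(swapped_out_gib, ratio=2.0, device_gib=8):
      swapped = min(swapped_out_gib, device_gib)
      return swapped / ratio

  for swapped in (0, 4, 8):  # GiB of anonymous pages paged out
      print(f"{swapped} GiB swapped out -> "
            f"~{zram_ram_used(swapped):.1f} GiB of RAM held by zram")

So a completely full 8G device on the 8G machine holds roughly 4G of
compressed data, which is where the "half of RAM" wedge in the ninja
test comes from. And at zero page-outs it holds nothing at all.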
> Is this almost immediately the case or is this only closely before
> being OOM anyway?
> The "too much CPU" limit would be the actual wall-clock time test
> programs take without hitting OOM. If a program using 120% memory
> takes 90 seconds to complete its run with swap, and 60 seconds with
> zram swap, that would be an improvement. If it's 120 seconds the most
> likely issue is "too much CPU used for compression or swapping".

Sure, and I think any person is going to notice this kind of latency
without even wall-clock timing it. But anyway, I time my compiles using
the time command.

> > There are worse things than OOM. Stuck with a totally unresponsive
> > system and no OOM on the way. Hence earlyoom. And on-going resource
> > control work with cgroupsv2 isolation.
>
> This is true for boxes where the offending processes are not under
> manual control, where it's better that any exploding program is
> terminated as soon as possible.

Even under manual control we've got examples of the GUI becoming
completely stuck. There are long threads on devel@ based on this
Workstation working group issue, with the same name, so just search the
archives for "interactivity" or maybe "webkitgtk":

#98 Better interactivity in low-memory situations
https://pagure.io/fedora-workstation/issue/98

> It's exactly the other way round for manually controlled processes, as
> a slowdown before getting to OOM is sometimes enough to be able to
> decide what to free up/terminate before the OOM killer just goes in
> brute force.
> That doesn't work too well nowadays, as quite often the swap on disk
> fills too fast on SSDs before I've got time to kill something.

earlyoom will kill in such a case even if you can't. It's configurable
and intentionally simplistic, based on free memory and free swap
percentages.

--
Chris Murphy
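P.S. Since earlyoom keeps coming up: the core condition is simple enough
to sketch. This is an illustrative approximation in Python, not
earlyoom's actual code (which is C), and the 10% thresholds are my
reading of its documented defaults; all of it is configurable:

  # Approximation of earlyoom's trigger: act only when BOTH available
  # RAM and free swap fall below their minimum percentages.
  def meminfo():
      info = {}
      with open("/proc/meminfo") as f:
          for line in f:
              key, value = line.split(":")
              info[key] = int(value.split()[0])  # values are in kiB
      return info

  def should_act(info, mem_min_pct=10, swap_min_pct=10):
      mem_pct = 100 * info["MemAvailable"] / info["MemTotal"]
      swap_total = info["SwapTotal"]
      swap_pct = (100 * info["SwapFree"] / swap_total) if swap_total else 0
      return mem_pct <= mem_min_pct and swap_pct <= swap_min_pct

  print("earlyoom would act:", should_act(meminfo()))

Note that with no swap at all, the swap condition is trivially met, so
only the available-memory threshold matters.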