On Wed, Dec 20, 2023 at 6:50 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Dec 20, 2023 at 12:59:15AM -0800, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 9:15 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > >
> > > On Mon, Dec 18, 2023 at 01:52:23PM -0800, Yosry Ahmed wrote:
> > > > > > Taking a step back from all the memory.swap.tiers vs.
> > > > > > memory.zswap.writeback discussions, I think there may be a more
> > > > > > fundamental problem here. If the zswap store failure is recurrent,
> > > > > > pages can keep going back to the LRUs and then get sent back to
> > > > > > zswap eventually, only to be rejected again. For example, this can
> > > > > > happen if zswap is above the acceptance threshold, but could be
> > > > > > even worse if it's the allocator rejecting the page due to not
> > > > > > compressing well enough. In the latter case, the page can keep
> > > > > > going back and forth between zswap and the LRUs indefinitely.
> > > > > >
> > > > > > You probably did not run into this as you're using zsmalloc, but it
> > > > > > can happen with zbud AFAICT. Even with zsmalloc, a less problematic
> > > > > > version can happen if zswap is above its acceptance threshold.
> > > > > >
> > > > > > This can cause thrashing and ineffective reclaim. We have an
> > > > > > internal implementation where we mark incompressible pages and put
> > > > > > them on the unevictable LRU when we don't have a backing swapfile
> > > > > > (i.e. ghost swapfiles), and something similar may work if writeback
> > > > > > is disabled. We need to scan such incompressible pages periodically
> > > > > > though to remove them from the unevictable LRU if they have been
> > > > > > dirtied.
> > > > >
> > > > > I'm not sure this is an actual problem.
> > > > >
> > > > > When pages get rejected, they rotate to the furthest point from the
> > > > > reclaimer - the head of the active list. We only get to them again
> > > > > after we have scanned everything else.
> > > > >
> > > > > If all that's left on the LRU is unzswappable, then you'd assume that
> > > > > remainder isn't very large, and thus not a significant part of the
> > > > > overall scan work. Because if it is, then there is a serious problem
> > > > > with the zswap configuration.
> > > > >
> > > > > There might be possible optimizations to determine how permanent a
> > > > > rejection is, but I'm not sure the effort is called for just
> > > > > yet. Rejections are already failure cases that screw up the LRU
> > > > > ordering, and healthy setups shouldn't have a lot of those. I don't
> > > > > think this patch adds any sort of new complications to this picture.
> > > >
> > > > We have workloads where a significant amount (maybe 20%? 30%? not sure
> > > > tbh) of the memory is incompressible. Zswap is still a very viable
> > > > option for those workloads once those pages are taken out of the
> > > > picture. If those pages remain on the LRUs, they will introduce a
> > > > regression in reclaim efficiency.
> > > >
> > > > With the upstream code today, those pages go directly to the backing
> > > > store, which isn't ideal in terms of LRU ordering, but this patch
> > > > makes them stay on the LRUs, which can be harmful. I don't think we
> > > > can just assume it is okay. Whether we make those pages unevictable or
> > > > store them uncompressed in zswap, I think taking them out of the LRUs
> > > > (until they are redirtied) is the right thing to do.
> > >
> > > This is how it works with zram as well, though, and it has plenty of
> > > happy users.
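(To make the above concrete: by "taking them out of the LRUs" I mean
something along the lines of the sketch below. This is just an
illustration, not our internal implementation; folio_set_incompressible()
is a hypothetical flag helper, and the actual LRU motion is hand-waved.)

	static void zswap_park_incompressible(struct folio *folio)
	{
		/*
		 * Hypothetical flag so that a periodic scan can find
		 * these folios and put them back on the regular LRUs
		 * once they have been redirtied.
		 */
		folio_set_incompressible(folio);

		/*
		 * PG_unevictable makes reclaim skip the folio. The
		 * actual move onto the unevictable LRU list (under the
		 * lruvec lock, mirroring what mlock does) is omitted
		 * here.
		 */
		folio_set_unevictable(folio);
	}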
> >
> > I am not sure I understand. Zram does not reject pages that do not
> > compress well, right? IIUC it acts as a block device, so it cannot
> > reject pages. I feel like I am missing something.
>
> zram_write_page() can fail for various reasons - compression failure,
> zsmalloc failure, the memory limit. This results in !!bio->bi_status,
> __end_swap_bio_write redirtying the page, and vmscan rotating it.
>
> The effect is actually more pronounced with zram, because the pages
> don't get activated and thus cycle faster.
>
> What you're raising doesn't seem to be a dealbreaker in practice.

For the workloads using zram, yes, they are exclusively using zsmalloc,
which can store incompressible pages anyway.

> > If we already want to support taking pages away from the LRUs when
> > they are rejected by zswap (e.g. Nhat's earlier proposal), doesn't it
> > make sense to do that first, so that this patch can be useful for all
> > workloads?
>
> No.
>
> Why should users who can benefit now wait for a hypothetical future
> optimization that isn't relevant to them? And, by the looks of it, one
> that is only relevant to a small set of specialized cases?
>
> And the optimization - should anybody actually care to write it - can
> be transparently done on top later, so that's no reason to change the
> merge order, either.

We can agree to disagree here; I am not trying to block this anyway.
But let's at least document this in the commit message/docs/code
(wherever it makes sense): that pages which recurrently fail to be
stored (e.g. incompressible memory) may keep going back to zswap only
to get rejected again, so workloads prone to this may observe some
reclaim inefficiency.
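(For reference, the failure path mentioned above - !!bio->bi_status on a
failed swap write - is handled roughly like this in mm/page_io.c. This
is paraphrased from memory, not a verbatim copy; the ratelimited
write-error warning is omitted:)

	static void __end_swap_bio_write(struct bio *bio)
	{
		struct folio *folio = bio_first_folio_all(bio);

		if (bio->bi_status) {
			/*
			 * The write failed: redirty the folio so its
			 * data is not lost; vmscan will then see a
			 * dirty folio and rotate it on the LRU.
			 */
			folio_mark_dirty(folio);
			/*
			 * Drop PG_reclaim so writeback completion does
			 * not move the folio to the LRU tail for
			 * immediate reclaim.
			 */
			folio_clear_reclaim(folio);
		}
		folio_end_writeback(folio);
	}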