On Wed, Jan 3, 2024 at 6:12 AM Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
>
> > That's around 2.7% increase in real time, no? Admittedly, this
> > micro-benchmark is too small to conclude either way, but the data
> > doesn't seem to be in your favor.
> >
> > I'm a bit concerned about the overhead here, given that with this
> > patch we will drain the per-cpu batch on every written-back entry.
> > That's quite a high frequency, especially since we're moving towards
> > more writeback (either with the new zswap shrinker, or your time
> > threshold-based writeback mechanism). For instance, there seems to be
> > some (local/per-cpu) locking involved, no? Could there be some form of
> > lock contention there (especially since with the new shrinker, you
> > can have a lot of concurrent writeback contexts)?
> >
> > Furthermore, note that a writeback from zswap to swap saves less
> > memory than a writeback from memory to swap, so the effect of the
> > extra overhead will be even more pronounced. That is, the amount of
> > extra work done (from this change) to save one unit of memory would be
> > even larger than if we call lru_add_drain() every time we swap out a
> > page (from memory -> swap). And I'm pretty sure we don't call
> > lru_add_drain() every time we swap out a page - I believe we call
> > lru_add_drain() every time we perform a shrink action, e.g. in
> > shrink_inactive_list(). That's much coarser in granularity than here.
> >
> > Also, IIUC, the more often we perform lru_add_drain(), the less
> > batching effect we will obtain. IOW, the overhead of maintaining the
> > batch will become higher relative to the performance gains from
> > batching.
> >
> > Maybe I'm missing something - could you walk me through how
> > lru_add_drain() is fine here, from this POV? Thanks!
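
[Editor's note: the granularity contrast in the quoted concern above can be sketched roughly as follows. This is kernel-style pseudocode, not compilable outside a kernel tree and not code from the patch under discussion; `more_entries_to_write_back()` and `writeback_one_entry_no_drain()` are hypothetical helpers standing in for the writeback loop.]

```c
/*
 * Sketch only: contrasts draining the per-CPU LRU batch on every
 * written-back entry with draining once per shrink pass, which is
 * closer to what shrink_inactive_list() effectively does.
 */

/* Per-entry draining -- the overhead concern raised above: */
static int writeback_entry_sketch(struct zswap_entry *entry)
{
	/* ... decompress, submit the page for swap writeback ... */
	lru_add_drain();	/* runs once per written-back entry */
	return 0;
}

/* Per-pass draining -- coarser granularity: */
static void shrink_pass_sketch(void)
{
	lru_add_drain();	/* drain once, before the whole pass */
	while (more_entries_to_write_back())	/* hypothetical predicate */
		writeback_one_entry_no_drain();	/* hypothetical helper */
}
```

The difference is purely call frequency: with N written-back entries, the first shape pays the drain (and its per-CPU locking) N times, the second once per pass.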
> >
> > > After writeback, we perform the following steps to release the memory again:
> > >   echo 1g > memory.reclaim
> > >
> > > Base:
> > >            total    used             reclaim           total    used
> > > Mem:        38Gi   2.5Gi    ---->     38Gi   1.5Gi
> > > Swap:      5.0Gi   1.0Gi    ---->      5Gi   1.5Gi
> > > used memory -1Gi, swap +0.5Gi
> > > This means that half of the pages failed to move to the tail of the LRU
> > > list, so we need to release an additional 0.5Gi of anon pages to swap space.
> > >
> > > With this patch:
> > >            total    used             reclaim           total    used
> > > Mem:        38Gi   2.6Gi    ---->     38Gi   1.6Gi
> > > Swap:      5.0Gi   1.0Gi    ---->      5Gi     1Gi
> > >
> > > used memory -1Gi, swap +0Gi
> > > This means that we release all the pages which were added to the tail of
> > > the LRU list in zswap_writeback_entry() and folio_rotate_reclaimable().
> > >
> >
> > OTOH, this suggests that we're onto something. Swap usage seems to
> > decrease quite a bit. Sounds like a real problem that this patch is
> > tackling.
> > (Please add this benchmark result to future changelogs. It'll help
> > demonstrate the problem.)
>
> Yes.
>
> > I'm inclined to ack this patch, but it'd be nice if you can assuage my
> > concerns above (with some justification and/or a larger benchmark).
>
> OK, thanks.
>
> > (Or perhaps, we have to drain, but less frequently/higher up the stack?)
>
> I've reviewed the code again and have no idea. It would be better if
> you have any suggestions.

Hmm, originally I was thinking of doing an (unconditional) lru_add_drain()
outside of zswap_writeback_entry() - once in shrink_worker() and/or
zswap_shrinker_scan(), before we write back any of the entries. Not sure if
it would work/help here though - I haven't tested that idea yet.

> New test:
> This patch adds the execution of folio_rotate_reclaimable() (not executed
> without this patch) and lru_add_drain(), including per-CPU lock competition.
> I bind a new task to allocate memory and use the same batch lock to compete
> with the target process, on the same CPU.
> Context:
> 1: stress --vm 1 --vm-bytes 1g (bound to cpu0)
> 2: stress --vm 1 --vm-bytes 5g --vm-hang 0 (bound to cpu0)
> 3: reclaim pages, and write back 5G of zswap entries on cpu0 and node 0.
>
> Average time of five tests:
>
>     Base      patch     patch + compete
>     4.947     5.0676    5.1336
>               +2.4%     +3.7%
>
> "compete" means: a new stress run on cpu0 to compete with the writeback
> process.
>
>  PID  USER  %CPU %MEM  TIME+    COMMAND                   P
> 1367  root  49.5  0.0  1:09.17  bash (writeback)          0
> 1737  root  49.5  2.2  0:27.46  stress (use percpu lock)  0
>
> Around 2.4% increase in real time, including the execution of
> folio_rotate_reclaimable() (not executed without this patch) and
> lru_add_drain(), but no lock contention.

Hmm, looks like the regression is still there, no?

> Around 1.3% additional increase in real time with lock contention on the
> same CPU.
>
> There is another option here: do not move the page to the tail of the
> inactive list after end_writeback, and delete the following code in
> zswap_writeback_entry(), which did not work properly. But then the pages
> will not be released first.
>
> /* move it to the tail of the inactive list after end_writeback */
> SetPageReclaim(page);

Or only SetPageReclaim() on pages that are on the LRU?

> Thanks,
> Zhongkun
>
> > Thanks,
> > Nhat
>
> Thanks for your time Nhat and Andrew. Happy New Year!
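
[Editor's note: the "only SetPageReclaim on pages on the LRU" suggestion near the end of the thread could look roughly like the following untested, kernel-style sketch; it is not a patch from this thread, and the placement inside zswap_writeback_entry() is an assumption.]

```c
/*
 * Sketch only, untested: request rotation-on-writeback-completion
 * only when the page is actually on an LRU list, since
 * folio_rotate_reclaimable() can only rotate pages that sit on an
 * LRU. Assumed to live in zswap_writeback_entry() before the I/O
 * is submitted.
 */
if (PageLRU(page)) {
	/* move it to the tail of the inactive list after end_writeback */
	SetPageReclaim(page);
}
```

Whether this avoids the "pages will not be released first" problem mentioned above would still need to be measured.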