On Wed, Nov 27, 2024 at 5:52 PM Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/11/27 09:20), Barry Song wrote:
> [..]
> > > 390    12736
> > > 395    13056
> > > 404    13632
> > > 410    14016
> > > 415    14336
> > > 418    14528
> > > 447    16384
> > >
> > > E.g. 13632 and 13056 are more than 500 bytes apart.
> > >
> > > > swap-out time(ms)    68711    49908
> > > > swap-in time(ms)     30687    20685
> > > > compression ratio    20.49%   16.9%
> > >
> > > These are not the only numbers to focus on, really important metrics
> > > are: zsmalloc pages-used and zsmalloc max-pages-used. Then we can
> > > calculate the pool memory usage ratio (the size of compressed data vs
> > > the number of pages zsmalloc pool allocated to keep them).
> >
> > To address this, we plan to collect more data and get back to you
> > afterwards. From my understanding, we still have an opportunity
> > to refine the CHAIN SIZE?
>
> Do you mean changing the value? It's configurable.
>
> > Essentially, each small object might cause some waste within the
> > original PAGE_SIZE. Now, with 4 * PAGE_SIZE, there could be a
> > single instance of waste. If we can manage the ratio, this could be
> > optimized?
>
> All size classes work the same and we merge size-classes with equal
> characteristics. So in the example above
>
> 395    13056
> 404    13632
>
> size-classes #396-403 are merged with size-class #404. And #404 size-class
> splits zspage into 13632-byte chunks, any smaller objects (e.g. an object
> from size-class #396 (which can be just one byte larger than #395
> objects)) takes that entire chunk and the rest of the space in the chunk
> is just padding.
>
> CHAIN_SIZE is how we find the optimal balance. The larger the zspage
> the more likely we squeeze some space for extra objects, which otherwise
> would have been just a waste. With large CHAIN_SIZE we also change
> characteristics of many size classes so we merge less classes and have
> more clusters. The price, on the other hand, is more physical 0-order
> pages per zspage, which can be painful. On all the tests I ran 8 or 10
> worked best.

Thanks very much for the explanation. We’ll gather more data on this
and follow up with you.

>
> [..]
> > > another option might be to just use a faster algorithm and then utilize
> > > post-processing (re-compression with zstd or writeback) for memory
> > > savings?
> >
> > The concern lies in power consumption
>
> But the power consumption concern is also in "decompress just one middle
> page from very large object" case, and size-classes de-fragmentation

That's why we have "[patch 4/4] mm: fall back to four small folios if
mTHP allocation fails" to address the issue of "decompressing just one
middle page from a very large object." I assume that recompression and
writeback should also focus on large objects if the original compression
involves multiple pages?

> which requires moving around lots of objects in order to form more full
> zspage and release empty zspages. There are concerns everywhere, how

I assume the cost of defragmentation is M * N, where:
* M is the number of objects,
* N is the size of the objects.

With large objects, M is reduced to 1/4 of the original number of
objects. Although N increases, the overall M * N becomes slightly
smaller than before, as N is just under 4 times the size of the
original objects?

> many of them are measured and analyzed and either ruled out or confirmed
> is another question.
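To put a rough number on the M * N estimate above, here is a
back-of-the-envelope sketch. The object count is purely hypothetical,
and the object sizes are derived from the compression ratios quoted
earlier in this thread (20.49% for 4KB blocks vs 16.9% for 16KB
blocks), so please treat it as an illustration rather than a
measurement:

/*
 * Back-of-the-envelope sketch of the "defragmentation cost ~ M * N"
 * argument.  The object count M is purely hypothetical; the object
 * sizes are derived from the compression ratios quoted earlier in
 * the thread (20.49% for 4KB blocks, 16.9% for 16KB blocks).
 */
#include <stdio.h>

int main(void)
{
	unsigned long long m = 1000000;			/* hypothetical number of 4KB objects */
	unsigned long long n = 4096 * 2049 / 10000;	/* ~20.49% of 4KB ~= 839 bytes */

	/* cost model: defragmentation copies roughly M objects of N bytes each */
	unsigned long long cost_4k = m * n;

	/* 16KB objects: M/4 objects, each ~16.9% of 16KB ~= 2768 bytes */
	unsigned long long m_16k = m / 4;
	unsigned long long n_16k = 4 * 4096 * 169 / 1000;
	unsigned long long cost_16k = m_16k * n_16k;

	printf("4KB objects : %llu bytes copied\n", cost_4k);
	printf("16KB objects: %llu bytes copied (~%.0f%% of the 4KB case)\n",
	       cost_16k, 100.0 * cost_16k / cost_4k);
	return 0;
}

Under these assumptions the large-object case copies roughly 80% of
the bytes the 4KB case does, but we still need to confirm this against
real pool data.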
In phone scenarios, if recompression uses zstd and the original
compression is based on lz4 with 4KB blocks, the cost to obtain
zstd-compressed objects would be:

* A: Compression of 4 × 4KB using lz4
* B: Decompression of 4 × 4KB using lz4
* C: Compression of 4 × 4KB using zstd

By leveraging the speed advantages of mTHP swap and zstd's large-block
compression, the cost becomes:

D: Compression of 16KB using zstd

Since D is significantly smaller than C alone (D < C), and A and B are
additional costs on top of C, it follows that:

D < A + B + C ?
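To make the comparison above concrete, here is a minimal sketch. The
cost values in it are placeholders, not measurements; the code only
encodes the argument that once D < C, the direct large-block path is
cheaper regardless of how cheap lz4 (A, B) is:

/*
 * Minimal sketch of the A/B/C/D comparison above.  All cost values
 * are placeholders to be replaced with measured numbers; only the
 * inequality structure matters.
 */
#include <stdio.h>
#include <stdbool.h>

struct recompress_cost {
	double lz4_compress_4x4k;	/* A: lz4 compression of 4 * 4KB    */
	double lz4_decompress_4x4k;	/* B: lz4 decompression of 4 * 4KB  */
	double zstd_compress_4x4k;	/* C: zstd compression of 4 * 4KB   */
	double zstd_compress_16k;	/* D: zstd compression of one 16KB  */
};

/*
 * Recompression path: lz4 at swap-out, then decompress and re-compress
 * with zstd later => A + B + C.
 * mTHP path: compress the 16KB folio with zstd once => D.
 * Since A and B are positive, D < C already implies D < A + B + C.
 */
static bool large_block_wins(const struct recompress_cost *c)
{
	double recompress_path = c->lz4_compress_4x4k +
				 c->lz4_decompress_4x4k +
				 c->zstd_compress_4x4k;
	double mthp_path = c->zstd_compress_16k;

	return mthp_path < recompress_path;
}

int main(void)
{
	/* placeholder unit costs, to be replaced with measured numbers */
	struct recompress_cost c = {
		.lz4_compress_4x4k	= 1.0,
		.lz4_decompress_4x4k	= 0.5,
		.zstd_compress_4x4k	= 3.0,
		.zstd_compress_16k	= 2.0,	/* assumes D < C */
	};

	printf("large-block zstd cheaper: %s\n",
	       large_block_wins(&c) ? "yes" : "no");
	return 0;
}

We will of course back this with real measurements on the phone
workloads.

Thanks
Barry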