On Mon, Nov 18, 2019 at 8:04 PM Jörn Engel <joern@xxxxxxxxxxxxxxx> wrote:
>
> On Sun, Nov 17, 2019 at 08:53:32PM +0200, vitaly.wool@xxxxxxxxxxxx wrote:
> > From: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> >
> > The current zswap implementation uses red-black trees to store
> > entries and to perform lookups. Although this algorithm obviously
> > has complexity of O(log N) it still takes a while to complete
> > lookup (or, even more for replacement) of an entry, when the amount
> > of entries is huge (100K+).
> >
> > B-trees are known to handle such cases more efficiently (i. e. also
> > with O(log N) complexity but with way lower coefficient) so trying
> > zswap with B-trees was worth a shot.
> >
> > The implementation of B-trees that is currently present in Linux
> > kernel isn't really doing things in the best possible way (i. e. it
> > has recursion) but the testing I've run still shows a very
> > significant performance increase.
> >
> > The usage pattern of B-tree here is not exactly following the
> > guidelines but it is due to the fact that pgoff_t may be both 32
> > and 64 bits long.
> >
> > Tested on qemu-kvm (-smp 2 -m 1024) with zswap in the following
> > configuration:
> > * zpool: z3fold
> > * max_pool_percent: 100
> > and the swap size of 1G.
> >
> > Test command:
> > $ stress-ng --io 4 --vm 4 --vm-bytes 1000M --timeout 300s --metrics
> >
> > This, averaged over 20 runs on qemu-kvm (-smp 2 -m 1024) gives the
> > following io bogo ops:
> > * original: 73778.8
> > * btree: 393999
>
> Impressive results.  Was your test done with a 32bit guest?  If yes, I
> would assume results for a 64bit guest to drop to about 330k.

No, it's on a 64 bit virtual machine. I take it this improvement is
partially due to the zswap_insert_or_replace function, which requires
fewer lookups than the initial implementation, but it's the btree API
that made it possible.
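
To illustrate the point about lookups, here is a minimal userspace
sketch. It is neither the zswap code nor the kernel btree API: the
toy_* names, the sorted-array index and the three-step baseline are
all assumptions made up for illustration. It only counts how many
ordered searches a separate lookup/erase/insert sequence needs
compared to a single-descent insert-or-replace:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 64

/* Hypothetical ordered index (sorted array); stands in for a tree. */
struct toy_map {
	unsigned long keys[TOY_CAP];
	void *vals[TOY_CAP];
	size_t len;
	unsigned long descents;	/* number of ordered searches performed */
};

/* Binary search for the first slot with keys[slot] >= key. */
static size_t toy_search(struct toy_map *m, unsigned long key)
{
	size_t lo = 0, hi = m->len;

	m->descents++;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (m->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Open a gap at slot i and store the new pair there. */
static void toy_put_at(struct toy_map *m, size_t i, unsigned long key, void *val)
{
	assert(m->len < TOY_CAP);
	memmove(&m->keys[i + 1], &m->keys[i], (m->len - i) * sizeof(m->keys[0]));
	memmove(&m->vals[i + 1], &m->vals[i], (m->len - i) * sizeof(m->vals[0]));
	m->keys[i] = key;
	m->vals[i] = val;
	m->len++;
}

static void *toy_lookup(struct toy_map *m, unsigned long key)
{
	size_t i = toy_search(m, key);

	return (i < m->len && m->keys[i] == key) ? m->vals[i] : NULL;
}

static void toy_erase(struct toy_map *m, unsigned long key)
{
	size_t i = toy_search(m, key);

	if (i == m->len || m->keys[i] != key)
		return;
	memmove(&m->keys[i], &m->keys[i + 1], (m->len - i - 1) * sizeof(m->keys[0]));
	memmove(&m->vals[i], &m->vals[i + 1], (m->len - i - 1) * sizeof(m->vals[0]));
	m->len--;
}

/* Assumed baseline: up to three separate searches per replacement. */
static void *toy_replace_3step(struct toy_map *m, unsigned long key, void *val)
{
	void *old = toy_lookup(m, key);

	if (old)
		toy_erase(m, key);
	toy_put_at(m, toy_search(m, key), key, val);
	return old;
}

/* One search: overwrite in place if the key exists, insert otherwise. */
static void *toy_insert_or_replace(struct toy_map *m, unsigned long key, void *val)
{
	size_t i = toy_search(m, key);

	if (i < m->len && m->keys[i] == key) {
		void *old = m->vals[i];

		m->vals[i] = val;
		return old;	/* caller releases the old entry */
	}
	toy_put_at(m, i, key, val);
	return NULL;
}
```

Both variants return the previous value so the caller can free it; the
only difference the sketch is meant to show is the number of searches
per replacement.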

> > +	if (sizeof(pgoff_t) == 8)
> > +		btree_pgofft_geo = &btree_geo64;
> > +	else
> > +		btree_pgofft_geo = &btree_geo32;
> > +
>
> You could abuse the fact that pgoff_t is the same size as unsigned long
> and use the "l" suffix variant.  But apart from the obvious abuse, the
> "l" variant hasn't been used before and the implementation appears to be
> buggy.
>
> So no complaints about your use of the interface.

Thanks! I would then keep it as is and have a task for myself to try
out and possibly debug the "l" suffix variant later on.

~Vitaly