On (25/01/28 01:36), Yosry Ahmed wrote:
> > Yes, for (a) mentioned above.
> >
> > > I guess in the WO case the buffer is not needed and we can just pass
> > > NULL?
> >
> > Yes.
>
> Perhaps we want to document this and enforce it (make sure that the
> NULL-ness of the buffer matches the access type).

Right.

> > But, and it's a big but. And it's (b) from the above. I wasn't brave
> > enough to just drop the (b) optimization and replace it with memcpy(),
> > especially when we work with relatively large objects (say size-class
> > 3600 bytes and above). This certainly would not make battery powered
> > devices happier. Maybe in zswap the page is only read once (is that
> > correct?), but in zram a page can be read multiple times (e.g. when
> > zram is used as a raw block-dev, or has a mounted fs on it), which
> > means multiple extra memcpy()-s.
>
> In zswap, because we use the crypto_acomp API, when we cannot sleep with
> the object mapped (which is true for zsmalloc), we just copy the
> compressed object into a preallocated buffer anyway. So having a
> zs_obj_load() interface would move that copy inside zsmalloc.

Yeah, I saw zpool_can_sleep_mapped() and had the same thought.  zram,
as of now, doesn't support algos that can/need to schedule internally
for whatever reason - kmalloc, mutex, H/W wait, etc.

> With your series, zswap can drop the memcpy and save some cycles on the
> compress side. I didn't realize that zram does not perform any copies
> on the read/decompress side.
>
> Maybe the load interface can still provide a buffer to avoid the copy
> where possible? I suspect with that we don't need the state and can
> just pass a pointer. We'd need another call to potentially unmap, so
> maybe load_start/load_end, or read_start/read_end.
>
> Something like:
>
> zs_obj_read_start(.., buf)
> {
>     if (contained in one page)
>         return kmapped obj
>     else
>         memcpy to buf
>         return buf
> }
>
> zs_obj_read_end(.., buf)
> {
>     if (contained in one page)
>         kunmap
> }
>
> The interface is more straightforward and we can drop the map flags
> entirely, unless I missed something here. Unfortunately you'd still
> need the locking changes in zsmalloc to make zram reads fully
> preemptible.

Agreed, the interface part is less of a problem, the atomicity of
zsmalloc is a much bigger issue.  We, technically, only need to mark
a zspage as "being used, don't free" so that zs_free()/compaction/
migration don't mess with it, but this is only "technically".  In
practice we then have

    CPU0                            CPU1

    zs_map_object
    set READ bit                    migrate
    schedule                        pool rwlock
                                    size class spin-lock
                                    wait for READ bit to clear
                                    ...
                                    set WRITE bit
    clear READ bit

and the whole thing collapses like a house of cards.  I wasn't able to
trigger a watchdog in my tests, but the pattern is there and that's
enough.  Maybe we can teach compaction and migration to try-WRITE and
bail out if the page is locked, but I don't know.

> I am not suggesting that we have to go this way, just throwing out
> ideas.

Sure, load+store is still an option.

While that zs_map_object() optimization is nice, it may cut both ways
[in the zram case].  On one hand, we save a memcpy() [but only for
certain objects]; on the other hand, we keep the page locked for the
entire decompression, which can take quite a while (e.g. when the
algorithm is configured with a very high compression level):

    CPU0                            CPU1

    zs_map_object
    read lock page rwlock
                                    write lock page rwlock
                                    spin
    decompress()
    ...                             spin a lot
    read unlock page rwlock

Maybe copy-in is just an okay thing to do.  Let me try to measure.
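
Just to spell out what I'd be comparing (a sketch only:
zs_obj_read_start()/zs_obj_read_end() are the calls you proposed
above, decompress() stands in for the real zcomp call, and the
per-CPU zstrm->buffer is an assumption):

    /* map-based read: zspage stays locked across decompression */
    src = zs_obj_read_start(pool, handle, comp_len, zstrm->buffer);
    decompress(zstrm, src, comp_len, dst);
    zs_obj_read_end(pool, handle, src);

    /* copy-in read: zspage is locked only for the memcpy() */
    src = zs_obj_read_start(pool, handle, comp_len, zstrm->buffer);
    if (src != zstrm->buffer)
        memcpy(zstrm->buffer, src, comp_len);
    zs_obj_read_end(pool, handle, src);
    decompress(zstrm, zstrm->buffer, comp_len, dst);

The second variant pays an extra memcpy() for objects that fit in one
page, but decompress() then runs with no zsmalloc locks held.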

> BTW, are we positive that the locking changes made in this series are
> not introducing regressions?

Cannot claim that with confidence.  Our workloads don't match, we don't
even use zsmalloc in the same way :)  Here be dragons.