On 2021/1/15 6:41, Vitaly Wool wrote:
On Thu, Jan 14, 2021 at 10:09 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
On Thu, Jan 14, 2021 at 11:53 AM Vitaly Wool <vitaly.wool@xxxxxxxxxxxx> wrote:
On Thu, Jan 14, 2021 at 8:21 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
On Thu, Jan 14, 2021 at 11:05 AM Vitaly Wool <vitaly.wool@xxxxxxxxxxxx> wrote:
On Thu, Jan 14, 2021 at 7:56 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
On Thu, Jan 14, 2021 at 07:40:50PM +0100, Vitaly Wool wrote:
On Thu, Jan 14, 2021 at 7:29 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
On Fri, Dec 25, 2020 at 07:02:50PM +0800, Tian Tao wrote:
Add a flag to zpool, named "can_sleep_mapped", and have it set to true
for zbud/z3fold and false for zsmalloc. Then zswap can take the current
path if the flag is true; if it's false, copy the data from src to a
temporary buffer, then unmap the handle, take the mutex, and process
the buffer instead of src, so that a sleeping function is never called
from atomic context.
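
For illustration, the intended load-side flow looks roughly like this
(a sketch only, not the literal patch body; the helper name
zpool_can_sleep_mapped() is an assumption based on the flag described
above):

	u8 *src, *tmp = NULL;

	if (!zpool_can_sleep_mapped(entry->pool)) {
		tmp = kmalloc(entry->length, GFP_ATOMIC);
		if (!tmp)
			return -ENOMEM;
	}

	src = zpool_map_handle(entry->pool, entry->handle, ZPOOL_MM_RO);
	if (tmp) {
		/* copy out and unmap before taking the sleeping mutex */
		memcpy(tmp, src, entry->length);
		zpool_unmap_handle(entry->pool, entry->handle);
		src = tmp;
	}

	mutex_lock(acomp_ctx->mutex);	/* safe to sleep now */
	/* ... decompress from src into the page ... */
	mutex_unlock(acomp_ctx->mutex);

	if (tmp)
		kfree(tmp);
	else
		zpool_unmap_handle(entry->pool, entry->handle);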
Signed-off-by: Tian Tao <tiantao6@xxxxxxxxxxxxx>
---
include/linux/zpool.h | 3 +++
mm/zpool.c | 13 +++++++++++++
mm/zswap.c | 50 +++++++++++++++++++++++++++++++++++++++++++++-----
3 files changed, 61 insertions(+), 5 deletions(-)
diff --git a/include/linux/zpool.h b/include/linux/zpool.h
index 51bf430..e899701 100644
--- a/include/linux/zpool.h
+++ b/include/linux/zpool.h
@@ -73,6 +73,7 @@ u64 zpool_get_total_size(struct zpool *pool);
* @malloc: allocate mem from a pool.
* @free: free mem from a pool.
* @shrink: shrink the pool.
+ * @sleep_mapped: whether zpool driver can sleep during map.
I don't think it's a good idea. It just breaks the zpool abstraction
in that it exposes internal implementation details to the user to work
around an issue zswap recently introduced. It also conflicts with
zpool_map_handle's semantics.
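
To spell out the semantics in question: with zsmalloc, zs_map_object()
takes a per-CPU mapping area, so the section between map and unmap is
effectively atomic, roughly:

	src = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
	/* zsmalloc pins a per-CPU mapping area here, so nothing in
	 * this section may sleep */
	...
	zpool_unmap_handle(pool, handle);

Advertising "you may sleep while mapped" contradicts that contract.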
Rather than introducing another break in zpool due to the recently
introduced zswap feature, zswap could introduce
CONFIG_ZSWAP_HW_COMPRESSOR. Once that is configured, zsmalloc could be
disabled. And with CONFIG_ZSWAP_HW_COMPRESSOR disabled, zswap wouldn't
need to make any bounce-buffer copy, so no existing zsmalloc user
would see a performance regression.
I believe it won't help that much -- the new compressor API presumes
that the caller may sleep during compression, and that will be an
accident waiting to happen as long as we use it and keep the handle
mapped in the zsmalloc case.
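
To make the sleeping point concrete: the acomp flow has to wait for
the request to complete, and that wait may schedule (a sketch along
the lines of the zswap code, not verbatim):

	struct scatterlist input, output;

	sg_init_one(&input, src, entry->length);
	sg_init_table(&output, 1);
	sg_set_page(&output, page, PAGE_SIZE, 0);
	acomp_request_set_params(req, &input, &output,
				 entry->length, PAGE_SIZE);
	/* crypto_wait_req() may end up in wait_for_completion() for
	 * async (e.g. hardware) transforms, i.e. it may sleep --
	 * which is illegal while a zsmalloc handle is mapped */
	ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);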
Or maybe I interpreted you wrong and you are suggesting re-introducing
calls to the old API under this #ifdef, is that the case?
Yup. zswap could abstract that part under an #ifdef to keep the old
behavior.
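Something like this, say (an untested sketch, just to show the shape):

#ifdef CONFIG_ZSWAP_HW_COMPRESSOR
	/* acomp: may be served asynchronously by hardware, so the
	 * caller must be allowed to sleep */
	ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);
#else
	/* legacy crypto_comp: synchronous software decompression,
	 * fine under an atomic zpool mapping */
	ret = crypto_comp_decompress(tfm, src, entry->length,
				     dst, &dlen);
#endif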
We can reconsider this option when zsmalloc implements a reclaim
callback. So far it's obviously too much of a mess for such a weak
reason.
Sorry, I don't understand the link between zsmalloc implementing a
shrink callback and this patch.
There is none. There is a link between taking on all the burden of
reviving zsmalloc for zswap at the cost of extra zswap complexity, and
zsmalloc not being fully zswap-compatible.
The ultimate zswap goal is to cache the hottest swapped-out pages in a
compressed format. zsmalloc doesn't implement a reclaim callback, and
therefore zswap can *not* fulfill its goal, since old pages are there
to stay and it can't accept new, hotter ones. So let me make it clear:
the zswap/zsmalloc combo is a legacy corner case.
This is the first time I am hearing that the zswap/zsmalloc combo is a
legacy configuration. We use zswap in production and intentionally
size the zswap pool so that we never have to go to the backing device.
So the absence of a reclaim callback is not an issue for us. Please
let me know if the zswap/zsmalloc combo is officially on its way to
deprecation.
No, the zswap/zsmalloc combo is not on its way to deprecation. I
generally would not advise using it, but your particular case does
make sense (although using frontswap/zswap without a backing device
*is* a corner case).
This patch is adding overhead for all zswap+zsmalloc users
irrespective of the availability of hardware. If we want to add
support for new hardware, please add it without impacting the current
users.
No, it's not like that. The zswap+zsmalloc combination is currently
already broken
By broken do you mean the missing reclaim callback?
No. I mean deadlocks in the -rt kernel and scheduling-while-atomic
bugs in the mainline. The missing reclaim callback goes into the "not
fully compatible" category in my world.
and this patch implements a way to work around that. The users are
already impacted, and that is of course a problem.
Are you talking about rt users or was there some other report?
I don't think the issue is specific to the -rt series. There may (and
will) still be "scheduling while atomic" bugs occurring, just not as
often.
The workaround is not perfect but looks good enough for me.
I would like to see a page fault perf experiment for the non-hardware
case.
I second that. @tiantao (H), would it be possible to provide one?
No problem, but can you tell me how to gather the page fault data you
want?
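
One way to get such numbers (a rough userspace sketch; the size, the
memcg limit and the event list below are assumptions, not a prescribed
methodology):

	#include <stdio.h>
	#include <sys/mman.h>
	#include <time.h>

	#define SZ (2UL << 30)	/* pick something above the memcg limit
				 * so the first pass spills into zswap */

	int main(void)
	{
		char *buf = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		struct timespec t0, t1;
		size_t i;

		if (buf == MAP_FAILED)
			return 1;

		/* first pass: populate; older pages get swapped out
		 * to zswap along the way */
		for (i = 0; i < SZ; i += 4096)
			buf[i] = 1;

		/* second pass: swapped-out pages fault back in via
		 * zswap; its runtime is dominated by those faults */
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < SZ; i += 4096)
			buf[i]++;
		clock_gettime(CLOCK_MONOTONIC, &t1);

		printf("refault pass: %.3f s\n",
		       (t1.tv_sec - t0.tv_sec) +
		       (t1.tv_nsec - t0.tv_nsec) / 1e9);
		return 0;
	}

Run it in a memcg with memory.max below SZ and zswap backed by
zsmalloc, then compare the second-pass time (and perf stat -e
major-faults,page-faults) with and without this patch applied.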
Also, correct me if I'm wrong, but from what I recall, the acomp API
does one redundant copy less than the old comp API, so zsmalloc should
be back to square one even with the buffer implemented in this patch.
The other backends should do a little better, and if so, that's the
upside of not taking too many spinlocks.
Best regards,
Vitaly