On 2025/2/20 17:07, Baolin Wang wrote:
On 2025/2/20 00:10, Zi Yan wrote:
On 19 Feb 2025, at 5:04, Baolin Wang wrote:
Hi Zi,
Sorry for the late reply due to being busy with other things:)
Thank you for taking a look at the patches. :)
On 2025/2/19 07:54, Zi Yan wrote:
During shmem_split_large_entry(), large swap entries are covering n
slots
and an order-0 folio needs to be inserted.
Instead of splitting all n slots, only the 1 slot covered by the folio
need to be split and the remaining n-1 shadow entries can be
retained with
orders ranging from 0 to n-1. This method only requires
(n/XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) *
(n/XA_CHUNK_SHIFT) new xa_nodes, compared to the original
xas_split_alloc() + xas_split() one.
For example, to split an order-9 large swap entry (assuming
XA_CHUNK_SHIFT
is 6), 1 xa_node is needed instead of 8.
xas_try_split_min_order() is used to reduce the number of calls to
xas_try_split() during split.
For shmem swapin, if we cannot swap in the whole large folio by
skipping the swap cache, we will split the large swap entry stored in
the shmem mapping into order-0 swap entries, rather than splitting it
into other orders of swap entries. This is because the next time we
swap in a shmem folio through shmem_swapin_cluster(), it will still
be an order 0 folio.
Right. But the swapin is one folio at a time, right?
shmem_split_large_entry()
Yes, now we always swapin an order-0 folio from the async swap device at
a time. However, for sync swap device, we will skip the swapcache and
swapin the whole large folio by commit 1dd44c0af4fa, so it will not call
shmem_split_large_entry() in this case.
should split the large swap entry and give you a slot to store the
order-0 folio.
For example, with an order-9 large swap entry, to swap in first
order-0 folio,
the large swap entry will become order-0, order-0, order-1, order-2,…
order-8,
after the split. Then the first order-0 swap entry can be used.
Then, when a second order-0 is swapped in, the second order-0 can be
used.
When the last order-0 is swapped in, the order-8 would be split to
order-7,order-6,…,order-1,order-0, order-0, and the last order-0 will
be used.
Yes, understood. However, for the sequential swapin scenarios, where
originally only one split operation is needed. However, your approach
increases the number of split operations. Of course, I understand that
in non-sequential swapin scenarios, your patch will save some xarray
memory. It might be necessary to evaluate whether the increased split
operations will have a significant impact on the performance of
sequential swapin?
Maybe the swapin assumes after shmem_split_large_entry(), all swap
entries
are order-0, which can lead to issues. There should be some check like
if the swap entry order > folio_order, shmem_split_large_entry() should
be used.
Moreover I did a quick test with swapping in order 6 shmem folios,
however, my test hung, and the console was continuously filled with
the following information. It seems there are some issues with shmem
swapin handling. Anyway, I need more time to debug and test.
To swap in order-6 folios, shmem_split_large_entry() does not allocate
any new xa_node, since XA_CHUNK_SHIFT is 6. It is weird to see OOM
error below. Let me know if there is anything I can help.
I encountered some issues while testing order 4 and order 6 swapin with
your patches. And I roughly reviewed the patch, and it seems that the
new swap entry stored in the shmem mapping was not correctly updated
after the split.
The following logic is to reset the swap entry after split, and I assume
that the large swap entry is always split to order 0 before. As your
patch suggests, if a non-uniform split is used, then the logic for
resetting the swap entry needs to be changed? Please correct me if I
missed something.
/*
* Re-set the swap entry after splitting, and the swap
* offset of the original large entry must be continuous.
*/
for (i = 0; i < 1 << order; i++) {
pgoff_t aligned_index = round_down(index, 1 << order);
swp_entry_t tmp;
tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
__xa_store(&mapping->i_pages, aligned_index + i,
swp_to_radix_entry(tmp), 0);
}
In addition, after your patch, the shmem_split_large_entry() seems
always return 0 even though it splits a large swap entry, but we still
need re-calculate the swap entry value after splitting, otherwise it may
return errors due to shmem_confirm_swap() validation failure.
/*
* If the large swap entry has already been split, it is
* necessary to recalculate the new swap entry based on
* the old order alignment.
*/
if (split_order > 0) {
pgoff_t offset = index - round_down(index, 1 << split_order);
swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}