Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"

On 06.03.24 07:05, Barry Song wrote:
On Wed, Mar 6, 2024 at 4:00 PM Chris Li <chrisl@xxxxxxxxxx> wrote:

On Tue, Mar 5, 2024 at 5:15 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
Another limitation I would like to address is that swap_writepage() can
only write out IO in one contiguous chunk and is not able to perform
non-contiguous IO. When the swapfile is close to full, the unused entries
are likely to be spread across different locations. It would be nice to
be able to read and write a large folio using discontiguous disk IO
locations.
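
For reference, the current bdev write-out path boils down to roughly the
following (heavily paraphrased, not the exact mm/page_io.c code): a single
bio, a single starting sector derived from the folio's swap entry, and the
whole folio attached to it, which is why the swap slots backing a large
folio have to be contiguous today:

        struct bio *bio;

        bio = bio_alloc(sis->bdev, 1,
                        REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc),
                        GFP_NOIO);
        /* one starting sector for the whole folio */
        bio->bi_iter.bi_sector = swap_page_sector(&folio->page);
        bio_add_folio(bio, folio, folio_size(folio), 0);
        folio_start_writeback(folio);
        folio_unlock(folio);
        submit_bio(bio);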

I don't think it will be too difficult for swap_writepage() to write
out a large folio that has discontiguous swap offsets. Taking zRAM as
an example, as long as the bio is organized correctly, zram should be
able to write out a large folio one subpage at a time.

Yes.


static void zram_bio_write(struct zram *zram, struct bio *bio)
{
         unsigned long start_time = bio_start_io_acct(bio);
         struct bvec_iter iter = bio->bi_iter;

         do {
                 u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
                 u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
                                 SECTOR_SHIFT;
                 struct bio_vec bv = bio_iter_iovec(bio, iter);

                 bv.bv_len = min_t(u32, bv.bv_len, PAGE_SIZE - offset);

                 if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
                         atomic64_inc(&zram->stats.failed_writes);
                         bio->bi_status = BLK_STS_IOERR;
                         break;
                 }

                 zram_slot_lock(zram, index);
                 zram_accessed(zram, index);
                 zram_slot_unlock(zram, index);

                 bio_advance_iter_single(bio, &iter, bv.bv_len);
         } while (iter.bi_size);

         bio_end_io_acct(bio, start_time);
         bio_endio(bio);
}

Right now, add_to_swap() lacks a way to record a discontiguous offset
for each subpage; all we have is folio->swap.

I wonder if we can somehow make this page granularity: each subpage
could carry its own offset, something like a per-page page->swap. Then
swap_writepage() could issue IO to multiple discontiguous offsets, and
add_to_swap() would be allowed to obtain nr_pages different swap
offsets and fill one into each subpage.
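
Just to make the idea concrete, here is a very rough sketch, assuming a
hypothetical swap_entry_for_subpage() helper that returns the (possibly
discontiguous) entry allocated for subpage i; it ignores swap extents and
error handling, and submits one bio per subpage because a single bio can
only describe one contiguous range of sectors:

static void swap_write_folio_discontig(struct folio *folio,
                                       struct writeback_control *wbc)
{
        struct swap_info_struct *sis = swp_swap_info(folio->swap);
        long i;

        for (i = 0; i < folio_nr_pages(folio); i++) {
                /* hypothetical helper, does not exist today */
                swp_entry_t entry = swap_entry_for_subpage(folio, i);
                struct bio *bio;

                bio = bio_alloc(sis->bdev, 1,
                                REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc),
                                GFP_NOIO);
                /* simplified: ignores the swap extent / swap-over-file mapping */
                bio->bi_iter.bi_sector = swp_offset(entry) <<
                                         (PAGE_SHIFT - SECTOR_SHIFT);
                bio_add_folio(bio, folio, PAGE_SIZE, i * PAGE_SIZE);
                /* reuse the existing write completion in mm/page_io.c */
                bio->bi_end_io = end_swap_bio_write;
                submit_bio(bio);
        }
}

Subpages whose offsets happen to be adjacent could of course be merged
into one larger bio instead of getting one bio each.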

The key question is where to store the subpage offsets. They can't be
stored in the tail pages' page->swap, because some tail pages' page
structs are just a mapping of the head page's page struct. I am afraid
this mapping has to be stored in the swap backend. That is the idea:
have the swap backend keep track of an array of each subpage's swap
location, looked up by the head swap offset.
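
Something along these lines, purely as an illustration (nothing like
this exists in the swap code today): the backend would keep, per large
folio, the real per-subpage offsets, keyed by the head offset that
folio->swap already records:

struct swap_subpage_map {
        pgoff_t         head_offset;      /* offset stored in folio->swap */
        unsigned int    nr_pages;         /* subpages in the large folio */
        pgoff_t         subpage_offset[]; /* real, possibly discontiguous offsets */
};

swap_writepage() would look the array up by head_offset when building
the IO, and the array would be freed once all of the entries are
released.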

I assume "some tail page's page struct are just mapping of the head
page's page struct" is only true of hugeTLB larger than PMD-mapped
hugeTLB (for example 2MB) for this moment? more widely mTHP
less than PMD-mapped size will still have all tail page struct?

We just successfully stopped using subpages to store swap offsets, and
even accidentally fixed a bug that had been lurking for years. I am
confident that we don't want to go back. The current direction is to
move as much information as we can out of the subpages, so if we can
find ways to avoid messing with subpages, that would be great.

--
Cheers,

David / dhildenb




