Chris Li <chrisl@xxxxxxxxxx> writes: > On Wed, Aug 7, 2024 at 6:14 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote: >> >> Chris Li <chrisl@xxxxxxxxxx> writes: >> [snip] >> > >> > /* >> > @@ -553,6 +568,19 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si, >> > if (tmp == SWAP_NEXT_INVALID) { >> > if (!list_empty(&si->free_clusters)) { >> > ci = list_first_entry(&si->free_clusters, struct swap_cluster_info, list); >> > + list_del(&ci->list); >> > + spin_lock(&ci->lock); >> > + ci->order = order; >> > + ci->flags = 0; >> > + spin_unlock(&ci->lock); >> > + tmp = cluster_index(si, ci) * SWAPFILE_CLUSTER; >> > + } else if (!list_empty(&si->nonfull_clusters[order])) { >> > + ci = list_first_entry(&si->nonfull_clusters[order], >> > + struct swap_cluster_info, list); >> > + list_del(&ci->list); >> > + spin_lock(&ci->lock); >> > + ci->flags = 0; >> > + spin_unlock(&ci->lock); >> > tmp = cluster_index(si, ci) * SWAPFILE_CLUSTER; >> > } else if (!list_empty(&si->discard_clusters)) { >> >> We should check discard_clusters before nonfull clusters. > > And the reason behind that is? > > I see the discard_cluster can take a long time. It will take a > synchronous wait for the issuing the discard command. Why not just use > the nonfull list and return immediately. When the discard command > finished. It will show up in the free list anyway. I think that you are right. We don't need to wait for discard here. > BTW, what is your take on my previous analysis of the current SSD > prefer write new cluster can wear out the SSD faster? No. I don't agree with you on that. However, my knowledge on SSD wearing out algorithm is quite limited. > I think it might be useful to provide users an option to choose to > write a non full list first. The trade off is more friendly to SSD > wear out than preferring to write new blocks. If you keep doing the > swap long enough, there will be no new free cluster anyway. It depends on workloads. Some workloads may demonstrate better spatial locality. > The example I give in this email: > > https://lore.kernel.org/linux-mm/CACePvbXGBNC9WzzL4s2uB2UciOkV6nb4bKKkc5TBZP6QuHS_aQ@xxxxxxxxxxxxxx/ > > Chris >> >> > /* >> > @@ -967,6 +995,7 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx) >> > ci = lock_cluster(si, offset); >> > memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER); >> > ci->count = 0; >> > + ci->order = 0; >> > ci->flags = 0; >> > free_cluster(si, ci); >> > unlock_cluster(ci); >> > @@ -2922,6 +2951,9 @@ static int setup_swap_map_and_extents(struct swap_info_struct *p, >> > INIT_LIST_HEAD(&p->free_clusters); >> > INIT_LIST_HEAD(&p->discard_clusters); >> > >> > + for (i = 0; i < SWAP_NR_ORDERS; i++) >> > + INIT_LIST_HEAD(&p->nonfull_clusters[i]); >> > + >> > for (i = 0; i < swap_header->info.nr_badpages; i++) { >> > unsigned int page_nr = swap_header->info.badpages[i]; >> > if (page_nr == 0 || page_nr > swap_header->info.last_page) -- Best Regards, Huang, Ying