Hi Ying,

Thanks for taking a look!

On Tue, Mar 26, 2024 at 01:47:45PM +0800, Huang, Ying wrote:
> Johannes Weiner <hannes@xxxxxxxxxxx> writes:
> > +static struct swap_cluster_info *setup_clusters(struct swap_info_struct *p,
> > +						unsigned char *swap_map)
> > +{
> > +	unsigned long nr_clusters = DIV_ROUND_UP(p->max, SWAPFILE_CLUSTER);
> > +	unsigned long col = p->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS;
> > +	struct swap_cluster_info *cluster_info;
> > +	unsigned long i, j, k, idx;
> > +	int cpu, err = -ENOMEM;
> > +
> > +	cluster_info = kvcalloc(nr_clusters, sizeof(*cluster_info), GFP_KERNEL);
> >  	if (!cluster_info)
> > -		return nr_extents;
> > +		goto err;
> > +
> > +	for (i = 0; i < nr_clusters; i++)
> > +		spin_lock_init(&cluster_info[i].lock);
> >
> > +	p->cluster_next_cpu = alloc_percpu(unsigned int);
> > +	if (!p->cluster_next_cpu)
> > +		goto err_free;
> > +
> > +	/* Random start position to help with wear leveling */
> > +	for_each_possible_cpu(cpu)
> > +		per_cpu(*p->cluster_next_cpu, cpu) =
> > +			get_random_u32_inclusive(1, p->highest_bit);
> > +
> > +	p->percpu_cluster = alloc_percpu(struct percpu_cluster);
> > +	if (!p->percpu_cluster)
> > +		goto err_free;
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		struct percpu_cluster *cluster;
> > +
> > +		cluster = per_cpu_ptr(p->percpu_cluster, cpu);
> > +		cluster_set_null(&cluster->index);
> > +	}
> > +
> > +	/*
> > +	 * Mark unusable pages as unavailable. The clusters aren't
> > +	 * marked free yet, so no list operations are involved yet.
> > +	 */
> > +	for (i = 0; i < round_up(p->max, SWAPFILE_CLUSTER); i++)
> > +		if (i >= p->max || swap_map[i] == SWAP_MAP_BAD)
> > +			inc_cluster_info_page(p, cluster_info, i);
>
> If p->max is large, it seems better to use an loop like below?
>
> 	for (i = 0; i < swap_header->info.nr_badpages; i++) {
> 		/* check i and inc_cluster_info_page() */
> 	}
>
> in most cases, swap_header->info.nr_badpages should be much smaller than
> p->max.

Yes, it's a little crappy.
I've tried to not duplicate the smarts from
setup_swap_map_and_extents() to avoid bugs if they go out of sync.
Consulting the map directly is a bit more robust. Right now it's the
badpages, but also the header at map[0], that need to be marked.

But you're right, this could be slow with big files. I can send an
update and add a comment to keep the functions in sync.
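
For illustration only, here is a rough userspace sketch of the two
approaches side by side. It is not the actual update: CLUSTER_PAGES,
MAP_BAD, mark_unusable() and the count_by_*() helpers are made-up
stand-ins for SWAPFILE_CLUSTER, SWAP_MAP_BAD and
inc_cluster_info_page(), and it assumes the bad page list has no
duplicate entries and that max is at least 1. The point is that the
list-based walk has to cover three things to match the map scan: the
header slot, the bad pages below max, and the tail of the last cluster
past max.

```c
#include <assert.h>
#include <string.h>

#define CLUSTER_PAGES 8UL	/* hypothetical stand-in for SWAPFILE_CLUSTER */
#define MAP_BAD 0x3f		/* hypothetical stand-in for SWAP_MAP_BAD */

static unsigned long round_up_cluster(unsigned long n)
{
	return (n + CLUSTER_PAGES - 1) / CLUSTER_PAGES * CLUSTER_PAGES;
}

/* Stand-in for inc_cluster_info_page(): charge one page to its cluster. */
static void mark_unusable(unsigned int *counts, unsigned long page)
{
	counts[page / CLUSTER_PAGES]++;
}

/* Map scan, as in the quoted patch: visit every slot, O(max). */
static void count_by_map_scan(unsigned int *counts, const unsigned char *map,
			      unsigned long max)
{
	unsigned long i;

	for (i = 0; i < round_up_cluster(max); i++)
		if (i >= max || map[i] == MAP_BAD)
			mark_unusable(counts, i);
}

/* List walk: O(nr_bad + CLUSTER_PAGES), but must mirror the map setup. */
static void count_by_badpages(unsigned int *counts, const unsigned int *bad,
			      unsigned int nr_bad, unsigned long max)
{
	unsigned long i;

	mark_unusable(counts, 0);		/* header page at map[0] */

	for (i = 0; i < nr_bad; i++)		/* assumes no duplicates */
		if (bad[i] != 0 && bad[i] < max)
			mark_unusable(counts, bad[i]);

	for (i = max; i < round_up_cluster(max); i++)	/* tail past max */
		mark_unusable(counts, i);
}

/* Both walks should produce identical per-cluster counts. */
static void self_check(void)
{
	unsigned char map[21] = { 0 };
	unsigned int bad[] = { 5, 17 };
	unsigned int a[3] = { 0 }, b[3] = { 0 };

	map[0] = map[5] = map[17] = MAP_BAD;
	count_by_map_scan(a, map, 21);
	count_by_badpages(b, bad, 2, 21);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

If the map setup ever starts marking something else as bad, the list
walk silently diverges from the scan, which is the sync hazard the
comment would need to call out.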