Re: [RFC PATCH] mm: swapfile: fix SSD detection with swapfile on btrfs

Johannes Weiner <hannes@xxxxxxxxxxx> · Tue, 26 Mar 2024 08:51:20 -0400

Hi Ying,

Thanks for taking a look!

On Tue, Mar 26, 2024 at 01:47:45PM +0800, Huang, Ying wrote:
> Johannes Weiner <hannes@xxxxxxxxxxx> writes:
> > +static struct swap_cluster_info *setup_clusters(struct swap_info_struct *p,
> > +						unsigned char *swap_map)
> > +{
> > +	unsigned long nr_clusters = DIV_ROUND_UP(p->max, SWAPFILE_CLUSTER);
> > +	unsigned long col = p->cluster_next / SWAPFILE_CLUSTER % SWAP_CLUSTER_COLS;
> > +	struct swap_cluster_info *cluster_info;
> > +	unsigned long i, j, k, idx;
> > +	int cpu, err = -ENOMEM;
> > +
> > +	cluster_info = kvcalloc(nr_clusters, sizeof(*cluster_info), GFP_KERNEL);
> >  	if (!cluster_info)
> > -		return nr_extents;
> > +		goto err;
> > +
> > +	for (i = 0; i < nr_clusters; i++)
> > +		spin_lock_init(&cluster_info[i].lock);
> >  
> > +	p->cluster_next_cpu = alloc_percpu(unsigned int);
> > +	if (!p->cluster_next_cpu)
> > +		goto err_free;
> > +
> > +	/* Random start position to help with wear leveling */
> > +	for_each_possible_cpu(cpu)
> > +		per_cpu(*p->cluster_next_cpu, cpu) =
> > +			get_random_u32_inclusive(1, p->highest_bit);
> > +
> > +	p->percpu_cluster = alloc_percpu(struct percpu_cluster);
> > +	if (!p->percpu_cluster)
> > +		goto err_free;
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		struct percpu_cluster *cluster;
> > +
> > +		cluster = per_cpu_ptr(p->percpu_cluster, cpu);
> > +		cluster_set_null(&cluster->index);
> > +	}
> > +
> > +	/*
> > +	 * Mark unusable pages as unavailable. The clusters aren't
> > +	 * marked free yet, so no list operations are involved yet.
> > +	 */
> > +	for (i = 0; i < round_up(p->max, SWAPFILE_CLUSTER); i++)
> > +		if (i >= p->max || swap_map[i] == SWAP_MAP_BAD)
> > +			inc_cluster_info_page(p, cluster_info, i);
> 
> If p->max is large, it seems better to use a loop like below?
> 
>  	for (i = 0; i < swap_header->info.nr_badpages; i++) {
>                 /* check i and inc_cluster_info_page() */
>         }
> 
> in most cases, swap_header->info.nr_badpages should be much smaller than
> p->max.

Yes, it's a little crappy. I tried not to duplicate the smarts from
setup_swap_map_and_extents(), to avoid bugs if the two go out of
sync. Consulting the map directly is a bit more robust. Right now it's
the badpages, but also the header at map[0], that need to be marked.

But you're right, this could be slow with big files. I can send an
update and add a comment to keep the two functions in sync.
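
To make the tradeoff concrete, here is a rough user-space model of the
two approaches. It is not kernel code and not the eventual patch:
CLUSTER_SIZE and MAP_BAD stand in for SWAPFILE_CLUSTER and SWAP_MAP_BAD,
a plain counter array stands in for inc_cluster_info_page(), and all
function names are made up for illustration. It assumes max > 0 and a
badpages list without duplicates.

```c
#include <string.h>

#define CLUSTER_SIZE 256UL	/* stand-in for SWAPFILE_CLUSTER */
#define MAP_BAD 0x3f		/* stand-in for SWAP_MAP_BAD */

/* First slot index past the last (possibly partial) cluster. */
static unsigned long cluster_end(unsigned long max)
{
	return (max + CLUSTER_SIZE - 1) / CLUSTER_SIZE * CLUSTER_SIZE;
}

/* Approach 1: walk every slot and consult the map; O(max). */
static void count_by_scan(const unsigned char *map, unsigned long max,
			  unsigned int *count)
{
	unsigned long i;

	for (i = 0; i < cluster_end(max); i++)
		if (i >= max || map[i] == MAP_BAD)
			count[i / CLUSTER_SIZE]++;
}

/*
 * Approach 2: visit only the known-unusable slots; O(nr_bad + CLUSTER_SIZE).
 * This has to mirror whatever filled in the map: the header slot, the
 * validity checks on each bad page number, and the tail of the last
 * cluster. A duplicate entry in bad[] would be counted twice here but
 * only once by the scan, which is the kind of drift to watch for.
 */
static void count_by_list(const unsigned int *bad, unsigned int nr_bad,
			  unsigned long max, unsigned int *count)
{
	unsigned long i;

	count[0]++;				/* header at map[0] */
	for (i = 0; i < nr_bad; i++)
		if (bad[i] != 0 && bad[i] < max)
			count[bad[i] / CLUSTER_SIZE]++;
	for (i = max; i < cluster_end(max); i++)	/* slots past max */
		count[i / CLUSTER_SIZE]++;
}

/* Returns 1 if both approaches produce the same per-cluster counts. */
static int demo_counts_agree(void)
{
	const unsigned int bad[] = { 7, 300, 999 };
	const unsigned long max = 1000;	/* 4 clusters, last one partial */
	unsigned char map[1000] = { 0 };
	unsigned int a[4] = { 0 }, b[4] = { 0 };
	unsigned int i;

	map[0] = MAP_BAD;
	for (i = 0; i < 3; i++)
		map[bad[i]] = MAP_BAD;

	count_by_scan(map, max, a);
	count_by_list(bad, 3, max, b);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling demo_counts_agree() from a small main() checks that the two
walks agree on this input. The list-based version touches far fewer
slots, at the cost of repeating the header and range checks that the
map-based scan gets for free.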