Re: [PATCH v9 08/11] scsi: sd_zbc: emulate ZONE_APPEND commands

Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx> · Tue, 28 Apr 2020 12:09:25 +0000

On 28/04/2020 13:42, Hannes Reinecke wrote:
[...]
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 50fff0bf8c8e..6009311105ef 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -79,6 +79,12 @@ struct scsi_disk {
>>    	u32		zones_optimal_open;
>>    	u32		zones_optimal_nonseq;
>>    	u32		zones_max_open;
>> +	u32		*zones_wp_ofst;
>> +	spinlock_t	zones_wp_ofst_lock;
>> +	u32		*rev_wp_ofst;
>> +	struct mutex	rev_mutex;
>> +	struct work_struct zone_wp_ofst_work;
>> +	char		*zone_wp_update_buf;
>>    #endif
>>    	atomic_t	openers;
>>    	sector_t	capacity;	/* size in logical blocks */
> 
> 'zones_wp_ofst' ?
> 
> Please replace the cryptic 'ofst' with 'offset'; those three additional
> characters don't really make a difference ...

'zones_wp_ofst' was good to maintain the 80 chars limit and not end up 
with broken up lines, because we crossed the limit. I'll have a look if 
we can make it 'zones_wp_offset' without uglifying the code.

[...]

>> @@ -396,11 +633,67 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf,
>>    	return 0;
>>    }
>>    
>> +static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
>> +{
>> +	struct scsi_disk *sdkp = scsi_disk(disk);
>> +
>> +	swap(sdkp->zones_wp_ofst, sdkp->rev_wp_ofst);
>> +}
>> +
>> +static int sd_zbc_revalidate_zones(struct scsi_disk *sdkp,
>> +				   u32 zone_blocks,
>> +				   unsigned int nr_zones)
>> +{
>> +	struct gendisk *disk = sdkp->disk;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * Make sure revalidate zones are serialized to ensure exclusive
>> +	 * updates of the scsi disk data.
>> +	 */
>> +	mutex_lock(&sdkp->rev_mutex);
>> +
>> +	/*
>> +	 * Revalidate the disk zones to update the device request queue zone
>> +	 * bitmaps and the zone write pointer offset array. Do this only once
>> +	 * the device capacity is set on the second revalidate execution for
>> +	 * disk scan or if something changed when executing a normal revalidate.
>> +	 */
>> +	if (sdkp->first_scan) {
>> +		sdkp->zone_blocks = zone_blocks;
>> +		sdkp->nr_zones = nr_zones;
>> +		goto unlock;
>> +	}
>> +
>> +	if (sdkp->zone_blocks == zone_blocks &&
>> +	    sdkp->nr_zones == nr_zones &&
>> +	    disk->queue->nr_zones == nr_zones)
>> +		goto unlock;
>> +
>> +	sdkp->rev_wp_ofst = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>> +	if (!sdkp->rev_wp_ofst) {
>> +		ret = -ENOMEM;
>> +		goto unlock;
>> +	}
>> +
>> +	ret = blk_revalidate_disk_zones(disk, sd_zbc_revalidate_zones_cb);
>> +
>> +	kvfree(sdkp->rev_wp_ofst);
>> +	sdkp->rev_wp_ofst = NULL;
>> +
>> +unlock:
>> +	mutex_unlock(&sdkp->rev_mutex);
> 
> I don't really understand this.
> Passing a callback is fine if things happen asynchronously, and you
> wouldn't know from the calling context when that happened. Ok.
> But the above code definitely assumes that blk_revalidate_disk_zones()
> will be completed upon return, otherwise we'll get a nice crash in the
> callback function as the 'rev' pointer is invalid.
> But _if_ blk_revalidata_disk_zones() has completed upon return we might
> as well kill the callback, have the ->rev_wp_ofst a local variable ans
> simply the whole thing.

Sorry but I don't understand your comment. If in 
blk_revalidate_disk_zones() returns an error, all that happens is that 
we free the rev_wp_ofst pointer and return the error to the caller.

And looking at blk_revalidate_disk_zones() itself, I can't see a code 
path that calls the callback if something went wrong:

noio_flag = memalloc_noio_save(); 

ret = disk->fops->report_zones(disk, 0, UINT_MAX, 

                                blk_revalidate_zone_cb, &args); 

memalloc_noio_restore(noio_flag); 

/*
  * Install the new bitmaps and update nr_zones only once the queue is 

  * stopped and all I/Os are completed (i.e. a scheduler is not 

  * referencing the bitmaps).
  */
blk_mq_freeze_queue(q); 

if (ret >= 0) { 

         blk_queue_chunk_sectors(q, args.zone_sectors); 

         q->nr_zones = args.nr_zones; 

         swap(q->seq_zones_wlock, args.seq_zones_wlock); 

         swap(q->conv_zones_bitmap, args.conv_zones_bitmap); 

         if (update_driver_data)
                 update_driver_data(disk); 

         ret = 0; 

} else {
         pr_warn("%s: failed to revalidate zones\n", disk->disk_name); 

         blk_queue_free_zone_bitmaps(q);
}
blk_mq_unfreeze_queue(q);

And even *iff* the callback would be executed, we would have:
static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
{
         struct scsi_disk *sdkp = scsi_disk(disk); 

         swap(sdkp->zones_wp_ofst, sdkp->rev_wp_ofst);
}

I.e. we exchange some pointers. I can't see a possible crash here, we're 
not accessing anything.

Byte,
	Johannes