On 2020-04-28 8:09 a.m., Johannes Thumshirn wrote:
On 28/04/2020 13:42, Hannes Reinecke wrote:
[...]
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 50fff0bf8c8e..6009311105ef 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -79,6 +79,12 @@ struct scsi_disk {
u32 zones_optimal_open;
u32 zones_optimal_nonseq;
u32 zones_max_open;
+ u32 *zones_wp_ofst;
+ spinlock_t zones_wp_ofst_lock;
+ u32 *rev_wp_ofst;
+ struct mutex rev_mutex;
+ struct work_struct zone_wp_ofst_work;
+ char *zone_wp_update_buf;
#endif
atomic_t openers;
sector_t capacity; /* size in logical blocks */
'zones_wp_ofst' ?
Please replace the cryptic 'ofst' with 'offset'; those three additional
characters don't really make a difference ...
'zones_wp_ofst' was good to maintain the 80 chars limit and not end up
with broken up lines, because we crossed the limit. I'll have a look if
we can make it 'zones_wp_offset' without uglifying the code.
When stuck like that, I have started using "_o" as in "lba_o" for
LBA's octet offset :-)
Good rule of thumb: if you can't write "a = b + c" (a couple of tab
indentations in) then your variable names are too damn long.
Of course the Linux kernel dictating indent_to_8 and restricting lines
to punched card width doesn't help.
Doug Gilbert
[...]
@@ -396,11 +633,67 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf,
return 0;
}
+static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
+{
+ struct scsi_disk *sdkp = scsi_disk(disk);
+
+ swap(sdkp->zones_wp_ofst, sdkp->rev_wp_ofst);
+}
+
+static int sd_zbc_revalidate_zones(struct scsi_disk *sdkp,
+ u32 zone_blocks,
+ unsigned int nr_zones)
+{
+ struct gendisk *disk = sdkp->disk;
+ int ret = 0;
+
+ /*
+ * Make sure revalidate zones are serialized to ensure exclusive
+ * updates of the scsi disk data.
+ */
+ mutex_lock(&sdkp->rev_mutex);
+
+ /*
+ * Revalidate the disk zones to update the device request queue zone
+ * bitmaps and the zone write pointer offset array. Do this only once
+ * the device capacity is set on the second revalidate execution for
+ * disk scan or if something changed when executing a normal revalidate.
+ */
+ if (sdkp->first_scan) {
+ sdkp->zone_blocks = zone_blocks;
+ sdkp->nr_zones = nr_zones;
+ goto unlock;
+ }
+
+ if (sdkp->zone_blocks == zone_blocks &&
+ sdkp->nr_zones == nr_zones &&
+ disk->queue->nr_zones == nr_zones)
+ goto unlock;
+
+ sdkp->rev_wp_ofst = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
+ if (!sdkp->rev_wp_ofst) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+
+ ret = blk_revalidate_disk_zones(disk, sd_zbc_revalidate_zones_cb);
+
+ kvfree(sdkp->rev_wp_ofst);
+ sdkp->rev_wp_ofst = NULL;
+
+unlock:
+ mutex_unlock(&sdkp->rev_mutex);
I don't really understand this.
Passing a callback is fine if things happen asynchronously, and you
wouldn't know from the calling context when that happened. Ok.
But the above code definitely assumes that blk_revalidate_disk_zones()
will be completed upon return, otherwise we'll get a nice crash in the
callback function as the 'rev' pointer is invalid.
But _if_ blk_revalidata_disk_zones() has completed upon return we might
as well kill the callback, have the ->rev_wp_ofst a local variable ans
simply the whole thing.
Sorry but I don't understand your comment. If in
blk_revalidate_disk_zones() returns an error, all that happens is that
we free the rev_wp_ofst pointer and return the error to the caller.
And looking at blk_revalidate_disk_zones() itself, I can't see a code
path that calls the callback if something went wrong:
noio_flag = memalloc_noio_save();
ret = disk->fops->report_zones(disk, 0, UINT_MAX,
blk_revalidate_zone_cb, &args);
memalloc_noio_restore(noio_flag);
/*
* Install the new bitmaps and update nr_zones only once the queue is
* stopped and all I/Os are completed (i.e. a scheduler is not
* referencing the bitmaps).
*/
blk_mq_freeze_queue(q);
if (ret >= 0) {
blk_queue_chunk_sectors(q, args.zone_sectors);
q->nr_zones = args.nr_zones;
swap(q->seq_zones_wlock, args.seq_zones_wlock);
swap(q->conv_zones_bitmap, args.conv_zones_bitmap);
if (update_driver_data)
update_driver_data(disk);
ret = 0;
} else {
pr_warn("%s: failed to revalidate zones\n", disk->disk_name);
blk_queue_free_zone_bitmaps(q);
}
blk_mq_unfreeze_queue(q);
And even *iff* the callback would be executed, we would have:
static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
{
struct scsi_disk *sdkp = scsi_disk(disk);
swap(sdkp->zones_wp_ofst, sdkp->rev_wp_ofst);
}
I.e. we exchange some pointers. I can't see a possible crash here, we're
not accessing anything.
Byte,
Johannes