On 2021/08/10 17:24, Christoph Hellwig wrote: > On Mon, Jul 26, 2021 at 10:38:03AM +0900, Damien Le Moal wrote: >> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA) >> contain parameters describing the number of sets of contiguous LBAs that >> can be served independently by a single LUN multi-actuator disk. This >> patch provides the blk_queue_set_cranges() function allowing a device >> driver to signal to the block layer that a disk has multiple actuators, >> each one serving a contiguous range of sectors. To describe the set >> of sector ranges representing the different actuators of a device, the >> data type struct blk_cranges is introduced. >> >> For a device with multiple actuators, a struct blk_cranges is attached >> to the device request queue by the blk_queue_set_cranges() function. The >> function blk_alloc_cranges() is provided for drivers to allocate this >> structure. >> >> The blk_cranges structure contains kobjects (struct kobject) to register >> with sysfs the set of sector ranges defined by a device. On initial >> device scan, this registration is done from blk_register_queue() using >> the block layer internal function blk_register_cranges(). If a driver >> calls blk_queue_set_cranges() for a registered queue, e.g. when a device >> is revalidated, blk_queue_set_cranges() will execute >> blk_register_cranges() to update the queue sysfs attribute files. >> >> The sysfs file structure created starts from the cranges sub-directory >> and contains the start sector and number of sectors served by an >> actuator, with the information for each actuator grouped in one >> directory per actuator. E.g. for a dual actuator drive, we have: >> >> $ tree /sys/block/sdk/queue/cranges/ >> /sys/block/sdk/queue/cranges/ >> |-- 0 >> | |-- nr_sectors >> | `-- sector >> `-- 1 >> |-- nr_sectors >> `-- sector >> >> For a regular single actuator device, the cranges directory does not >> exist. >> >> Device revalidation may lead to changes to this structure and to the >> attribute values. When manipulated, the queue sysfs_lock and >> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and >> elevator sysfs queue sub-directories are protected. >> >> The code related to the management of cranges is added in the new >> file block/blk-cranges.c. >> >> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx> >> --- >> block/Makefile | 2 +- >> block/blk-cranges.c | 295 +++++++++++++++++++++++++++++++++++++++++ >> block/blk-sysfs.c | 13 ++ >> block/blk.h | 3 + >> include/linux/blkdev.h | 29 ++++ >> 5 files changed, 341 insertions(+), 1 deletion(-) >> create mode 100644 block/blk-cranges.c >> >> diff --git a/block/Makefile b/block/Makefile >> index bfbe4e13ca1e..e477e6ca9ea6 100644 >> --- a/block/Makefile >> +++ b/block/Makefile >> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \ >> blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \ >> blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \ >> genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \ >> - disk-events.o >> + disk-events.o blk-cranges.o >> >> obj-$(CONFIG_BOUNCE) += bounce.o >> obj-$(CONFIG_BLK_SCSI_REQUEST) += scsi_ioctl.o >> diff --git a/block/blk-cranges.c b/block/blk-cranges.c >> new file mode 100644 >> index 000000000000..deaa09e564f7 >> --- /dev/null >> +++ b/block/blk-cranges.c >> @@ -0,0 +1,295 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * Block device concurrent positioning ranges. >> + * >> + * Copyright (C) 2021 Western Digital Corporation or its Affiliates. >> + */ >> +#include <linux/kernel.h> >> +#include <linux/blkdev.h> >> +#include <linux/slab.h> >> +#include <linux/init.h> >> + >> +#include "blk.h" >> + >> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page) >> +{ >> + return sprintf(page, "%llu\n", cr->sector); >> +} >> + >> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page) >> +{ >> + return sprintf(page, "%llu\n", cr->nr_sectors); >> +} >> + >> +struct blk_crange_sysfs_entry { >> + struct attribute attr; >> + ssize_t (*show)(struct blk_crange *cr, char *page); >> +}; >> + >> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = { >> + .attr = { .name = "sector", .mode = 0444 }, >> + .show = blk_crange_sector_show, >> +}; >> + >> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = { >> + .attr = { .name = "nr_sectors", .mode = 0444 }, >> + .show = blk_crange_nr_sectors_show, >> +}; >> + >> +static struct attribute *blk_crange_attrs[] = { >> + &blk_crange_sector_entry.attr, >> + &blk_crange_nr_sectors_entry.attr, >> + NULL, >> +}; >> +ATTRIBUTE_GROUPS(blk_crange); >> + >> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj, >> + struct attribute *attr, char *page) >> +{ >> + struct blk_crange_sysfs_entry *entry; >> + struct blk_crange *cr; >> + struct request_queue *q; >> + ssize_t ret; >> + >> + entry = container_of(attr, struct blk_crange_sysfs_entry, attr); >> + cr = container_of(kobj, struct blk_crange, kobj); >> + q = cr->queue; > > Nit: I'd find this a little eaier to read if the variables were > initialized at declaration time: > > struct blk_crange_sysfs_entry *entry = > container_of(attr, struct blk_crange_sysfs_entry, attr); > struct blk_crange *cr = container_of(kobj, struct blk_crange, kobj); > struct request_queue *q = cr->queue; > > (or maybe even don't bother with the q local variable). OK. > >> +/* >> + * Dummy release function to make kobj happy. >> + */ >> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj) >> +{ >> +} > > How do we ensure the kobj is still alive while it is accessed? q->sysfs_lock ensures that. This mutex is taken whenever revalidate registers new ranges (see blk_queue_set_cranges below), and is taken also when the ranges are unregistered (on revalidate if the ranges changed and when the request queue is unregistered). And blk_crange_sysfs_show() takes that lock too. So the kobj cannot be freed while it is being accessed (the sysfs inode lock also prevents it since kobj_del() will take the inode lock). > >> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr) > > s/blk_queue/disk/ Hmmm... The argument is a gendisk, but it is the request queue that is modified. So following other functions like this, isn't blk_queue_ prefix better ? > >> mutex_lock(&q->sysfs_lock); >> + >> + ret = blk_register_cranges(disk, NULL); >> + if (ret) { >> + mutex_unlock(&q->sysfs_lock); >> + mutex_unlock(&q->sysfs_dir_lock); >> + kobject_del(&q->kobj); >> + blk_trace_remove_sysfs(dev); >> + kobject_put(&dev->kobj); >> + return ret; >> + } >> + >> if (q->elevator) { >> ret = elv_register_queue(q, false); >> if (ret) { >> + blk_unregister_cranges(disk); >> mutex_unlock(&q->sysfs_lock); >> mutex_unlock(&q->sysfs_dir_lock); >> kobject_del(&q->kobj); > > Please use a goto instead of duplicating the code. I had tried that but it made a mess. Will have another look at it. -- Damien Le Moal Western Digital Research