On Thu, May 28, 2020 at 01:50:02PM -0400, Douglas Gilbert wrote: > On 2020-05-28 12:36 p.m., Hannes Reinecke wrote: > > Use xarray for device lookup by target. LUNs below 256 are linear, > > and can be used directly as the index into the xarray. > > LUNs above 256 have a distinct LUN format, and are not necessarily > > linear. They'll be stored in indices above 256 in the xarray, with > > the next free index in the xarray. I don't understand why you think this is an improvement over just using an allocating XArray for all LUNs. It seems like more code, and doesn't actually save you any storage space ... ? > > +++ b/drivers/scsi/scsi.c > > @@ -601,13 +601,14 @@ static struct scsi_target *__scsi_target_lookup(struct Scsi_Host *shost, > > void starget_for_each_device(struct scsi_target *starget, void *data, > > void (*fn)(struct scsi_device *, void *)) > > { > > - struct Scsi_Host *shost = dev_to_shost(starget->dev.parent); > > struct scsi_device *sdev; > > + unsigned long lun_id = 0; > > - shost_for_each_device(sdev, shost) { > > - if ((sdev->channel == starget->channel) && > > - (sdev->id == starget->id)) > > - fn(sdev, data); > > + xa_for_each(&starget->devices, lun_id, sdev) { > > + if (scsi_device_get(sdev)) > > + continue; > > + fn(sdev, data); > > + scsi_device_put(sdev); > > } > > } I appreciate you're implementing the API that's currently in use, but I would recommend open-coding this. Looking at the functions called, lots of them are two-three liners, eg: static void __scsi_report_device_reset(struct scsi_device *sdev, void *data) { sdev->was_reset = 1; sdev->expecting_cc_ua = 1; } which would just be more readable: if (rtn == SUCCESS) { struct scsi_target *starget = scsi_target(scmd->device); struct scsi_device *sdev; unsigned long index; spin_lock_irqsave(host->host_lock, flags); xa_for_each(&starget->devices, index, sdev) { sdev->was_reset = 1; sdev->expecting_cc_ua = 1; } spin_unlock_irqrestore(host->host_lock, flags); } And then maybe you could make a convincing argument that the spin_lock there could be turned into an xa_lock since that will prevent sdevs from coming & going during the iteration of the device list. > > @@ -1621,7 +1623,18 @@ void scsi_sysfs_device_initialize(struct scsi_device *sdev) > > transport_setup_device(&sdev->sdev_gendev); > > spin_lock_irqsave(shost->host_lock, flags); > > - list_add_tail(&sdev->same_target_siblings, &starget->devices); > > + if (sdev->lun < 256) { > > + sdev->lun_idx = sdev->lun; > > + WARN_ON(xa_insert(&starget->devices, sdev->lun_idx, > > + sdev, GFP_KERNEL) < 0); > > You have interrupts masked again, so GFP_KERNEL is a non-no. Actually, I would say this is a great place to not use the host_lock. + int err; ... + if (sdev->lun < 256) { + sdev->lun_idx = sdev->lun; + err = xa_insert_irq(&starget->devices, sdev->lun_idx, sdev, + GFP_KERNEL); + } else { + err = xa_alloc_irq(&starget->devices, &sdev->lun_idx, sdev, + scsi_lun_limit, GFP_KERNEL); + } + ... maybe you want to convert scsi_sysfs_device_initialize() + to return an errno, although honestly this should never fail ... spin_lock_irqsave(shost->host_lock, flags); - list_add_tail(&sdev->same_target_siblings, &starget->devices); list_add_tail(&sdev->siblings, &shost->__devices); spin_unlock_irqrestore(shost->host_lock, flags); > > + unsigned int lun_idx; /* Index into target device xarray */ > > Xarray gives you _two_ choices for the type of an index :-) > It is either u32 (as used in xa_alloc() and its variants) _or_ > unsigned long (as used everywhere else in xarray where an index > is needed). So it's definitely 32 bits for xa_alloc() and either > 32 or 64 bits for the rest of xarray depending on the machine > architecture. > Matthew Wilcox may care to explain why this is ... Saving space in data structures, mostly. It's not really useful to have 4 billion things in an allocating XArray. Indeed, it's not really useful to have 4 billion of almost anything. So we have XArrays which are indexed by something sensible like page index in file, and they're nice and well-behaved (most people don't create files in excess of 16TB, and those who do don't expect amazing performance when seeking around in them at random because they've lost all caching effects). And then you have people who probably want one scsi device. Maybe a dozen. Possibly a thousand. Definitely not a million. If you get 4 billion scsi devices, something's gone very wrong. You've run out of drive letters, for a start (/dev/sdmzabcd anybody?) It doesn't cost us anything to just use unsigned long as the index of an XArray, due to how it's constructed. Except in this one place where we need to point to somewhere to store the index so that we can update the index within an sdev before anybody iterating the data structure under RCU can find the uninitialised index. This patch seems decent enough ... I just think the decision to optimise the layout for LUNs < 256 is a bit odd. But hey, your subsystem, not mine.