Re: [RFC v2 1/6] scsi: xarray hctl

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2020-05-25 1:40 p.m., Matthew Wilcox wrote:
On Mon, May 25, 2020 at 12:30:52PM -0400, Douglas Gilbert wrote:
On 2020-05-25 2:57 a.m., Hannes Reinecke wrote:
On 5/24/20 5:58 PM, Douglas Gilbert wrote:
+++ b/drivers/scsi/scsi.c
@@ -554,19 +554,32 @@ EXPORT_SYMBOL(scsi_device_put);
   struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
                          struct scsi_device *prev)
   {
-    struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
-    struct scsi_device *next = NULL;
-    unsigned long flags;
+    struct scsi_device *next = prev;
+    unsigned long flags, l_idx;
       spin_lock_irqsave(shost->host_lock, flags);
-    while (list->next != &shost->__devices) {
-        next = list_entry(list->next, struct scsi_device, siblings);
-        /* skip devices that we can't get a reference to */
-        if (!scsi_device_get(next))
-            break;
+    if (xa_empty(&shost->__devices)) {
           next = NULL;
-        list = list->next;
+        goto unlock;
       }

You don't need this stanza.  xa_find() will do the right thing (ie return
NULL) with an empty array.

Okay.

When using xarray to store a bitset (e.g. say the 9 states in:
   enum scsi_device_state   [include/scsi/scsi_device.h]
as 2**0, 2**1, 2**2 .... 2**8), is there a mask to indicate which bit
positions are used internally by xarray? Or are the full 32 bits available,
other than the special value 0 aka NULL?

+    do {
+        if (!next) {    /* get first element iff first iteration */
+            l_idx = 0;
+            next = xa_find(&shost->__devices, &l_idx,
+                       scsi_xa_limit_16b.max, XA_PRESENT);

You don't need to use a 16-bit limit here; the XArray code will stop after
the end of the array, and the only thing you might save is scanning the top
node unnecessarily far.  To get there, you'd have to have an index >= 2^12,
and the additional limit would mean that you'd scan the first 16 entries of
the top node instead of all 64.  Kind of pales into insignificance compared
to actually walking all 2^12 elements up to that point ;-)


If we go to using parts of the hctl tuple (host, channel, target_id, lun)
as xarray indexes, then large numbers can occur in unstable systems with
say the same group of targets going offline and then back online  when,
for example, a power supply trips out. The number of elements will still
be relatively small, say well less than 2**16.

@@ -391,8 +391,8 @@ static void scsi_single_lun_run(struct
scsi_device *current_sdev)
       spin_lock_irqsave(shost->host_lock, flags);
       if (starget->starget_sdev_user)
           goto out;
-    list_for_each_entry_safe(sdev, tmp, &starget->devices,
-            same_target_siblings) {
+    /* XARRAY: was list_for_each_entry_safe(); is this safe ? */
+    xa_for_each(&starget->devices, l_idx, sdev) {
           if (sdev == current_sdev)
               continue;
           if (scsi_device_get(sdev))

Frankly, I don't know why we're using the _safe variant here.
Typically the _safe variant is used to protect against removal
(ie if a list element is removed while iterating over the list),
but I can't really imagine why starting I/O on the device would
cause one entry to be deleted...
James?

... to add, xa_for_each() is safe against deletion while walking.
Or addition while walking.

@@ -234,8 +234,9 @@ static struct scsi_device
*scsi_alloc_sdev(struct scsi_target *starget,
       sdev->channel = starget->channel;
       mutex_init(&sdev->state_mutex);
       sdev->sdev_state = SDEV_CREATED;
-    INIT_LIST_HEAD(&sdev->siblings);
-    INIT_LIST_HEAD(&sdev->same_target_siblings);
+    /* XARRAY: INIT_LIST_HEAD()s here on list leaves, why ? */
+    sdev->sh_idx = -1;
+    sdev->starg_idx = -1;        /* for debugging */
       INIT_LIST_HEAD(&sdev->starved_entry);
       INIT_LIST_HEAD(&sdev->event_list);
       spin_lock_init(&sdev->list_lock);

It is good practice to call INIT_LIST_HEAD() on list leaves as then you
can call 'list_empty()' to figure out if they are part of a list or not.

But I'm wondering about the 'sh_idx' and the 'starg_idx' here.
We already have perfectly good indices with the 'lun' and 'id' entries
in struct scsi_device, which (incidentally) _need_ to be unique.

So why not use them?

Ah now we have something to discuss and perhaps Willy can help

At the target level we have two indexes:
    - channel (is 16 bits enough)
    - target (16 or 32 bits?)
This could be address to ways:
    1) introduce struct scsi_channel {int num_chans; scsi_target *t1p;}
    2) combine channnel and target into one 32 bit integer.

The xarray type of its index seems to be unsigned long (Willy,
is that correct?). That is the same size as a pointer, 32 bits
on a 32 bit machine, 64 bits on a 64 bit machine. That is
unfortunate, since we can represent the full 64 bit LUN as
a single index on a 32 bit machine. We could 'hide' that and
have two xarrays in scsi_device for 32 bit machines.

You aren't the first person to ask about having a 64-bit lookup on
32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
at LinuxCon in Prague.  I have always been reluctant to add such a thing because the XArray uses quite a naive data type underneath.  It works well with
dense arrays but becomes very wasteful of memory for sparse arrays.

My understanding of SCSI-world is that most devices have a single
LUN 0.  Most devices that have multiple LUNs number them sequentially.
Some devices have either an internal structure or essentially pick random
LUNs for the devices they expose.

Yes an no. There are now all sorts of weird patterns hiding in the
64 bit LUNs. See chapter 4.7 in sam6r05.pdf over at www.t10.org .
And arguably there is a 65th bit with the LU_CONG bit in an INQUIRY
standard response changing the interpretation of the 64 bit LUN.

So to start with, I took the easy approach, the one that most
resembles the what include/linux/list.h has been using. There
is also a (overly ?) complex locking scheme that I really did
not want to upset.

I think this is currently the best way to use the XArray.  I do have
a project in the works to replace the data structure implementing the
XArray API with a better one, but that project has been underway for
almost two years now, so I would have a hard time suggesting that you
wait for it to be ready.

Those num_chans, t1p members that I put in scsi_channel is just
to show the xarray lookup can be bypassed when num_chans=1
which is very likely. Same approach could be taken in scsi_target
to scsi_device.

If you do only have a single entry at 0, the XArray is very efficient.
It simplifies to a single pointer.  You don't need to optimise for that
case yourself ;-)

That is good to know.

@@ -326,6 +331,7 @@ static void scsi_target_dev_release(struct device *dev)
       struct device *parent = dev->parent;
       struct scsi_target *starget = to_scsi_target(dev);
+    xa_destroy(&starget->devices);

By the way, you don't have to call xa_destroy() if you know the array
is empty.  The IDR used to have that unfortunate requirement, and about a
third of users blissfully ignored it, leaking uncounted amounts of memory.

@@ -445,7 +455,15 @@ static struct scsi_target
*scsi_alloc_target(struct device *parent,
       if (found_target)
           goto found;
-    list_add_tail(&starget->siblings, &shost->__targets);
+    /* XARRAY: was list_add_tail(); may get hole in xarray or tail */
+    res = xa_alloc(&shost->__targets, &u_idx, starget, scsi_xa_limit_16b,
+               GFP_ATOMIC);
+    if (res < 0) {
+        spin_unlock_irqrestore(shost->host_lock, flags);
+        pr_err("%s: xa_alloc failure errno=%d\n", __func__, -res);
+        return NULL;
+    }
+    starget->sh_idx = u_idx;
       spin_unlock_irqrestore(shost->host_lock, flags);
       /* allocate and add */
       transport_setup_device(dev);

'scsi_xa_limit_16b' ?
What's that for?

See include/linux/xarray.h xa_limit_32b

This is a 16 bit equivalent. Not sure if that is too restrictive.
Anyway, the xa_alloc() call amongst others needs an upper and
lower bound.

I'm happy to have an xa_limit_16b in xarray.h.  I estimate it would have
about 5-10 users across the entire kernel.

No, that was just experimental.

And, again, if we were to use the target id as an index we could be
using 'xa_store' and the problem would be solved...

It may make sense to use an allocating XArray for LUNs and targets, but
use the channel ID directly.  It depends how structured these things
are in SCSI, and I'll admit I haven't been keeping up with how SCSI
numbers things on a modern system (I suspect 99.9% of SCSI devices
are USB keys with one host, one channel, one target, one LUN, but
that the 0.1% of SCSI devices on Linux systems are responsible for 80%
of the performance end of the market)

Attached is a UML (like) class diagram for folks to kick around. It handles
u32 values for the host, channel and target_id level and u64 at the lun
(leaf) level, via a bit of hackery, even on 32 bit machines. So the object
tree is built on the diagonal, with the multiplicity coming from xarray
members (attributes). The arrowed lines show inheritance, with
struct device being the base class aka superclass (one could argue
otherwise). The "transport_class_<n>"s are added as needed by SCSI
transports such as SAS and iSCSI.

And it uses native xarray indexes up to 2**32 - 1. So it meets all the
constraints we have currently.

Plus, based on the answers I get from above questions, in the
scsi_lu_low32 class I would like to add another xarray, that holds
bitsets rather than pointers. Each bitset holds amongst other things
the 9 scsi_device states. That way you could iterate through all
LU devices looking for an uncommon state _without_ dereferencing
pointers to any sdev_s that are bypassed. And if xarray is cache
efficient in storing its data, then it would be much faster. The
same thing (i.e. bitset xarray) could be done in scsi_channel, for
starget states.

Finally, for efficient xarray use, every object needs to be able to
get to its parent (or at least, its "collection") quickly as that is
where all xarray operations take place. And by efficient I do _not_
mean calling dev_to_shost(starget->dev.parent) when we built the
object hierarchy and _know_ where its "shost" is.

Doug Gilbert

Attachment: scsi_ml_cd1.pdf
Description: Adobe PDF document


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux