Re: scsi_alloc_target: parent of the target (need not be a scsi host)

Douglas Gilbert <dgilbert@xxxxxxxxxxxx> · Mon, 11 May 2020 14:31:26 -0400

On 2020-05-11 10:41 a.m., James Bottomley wrote:
On Sun, 2020-05-10 at 17:50 -0400, Douglas Gilbert wrote:
On 2020-05-10 2:52 p.m., James Bottomley wrote:
On Sun, 2020-05-10 at 14:32 -0400, Douglas Gilbert wrote:
This gem is in scsi_scan.c in the documentation of that
function's first argument. "need not be a scsi host" should read
"it damn well better be a scsi host" otherwise that function will
crash and burn!

It shouldn't: several transport classes, like SAS and FC have
intermediate devices between the host and the target and they all
work just fine using the non-host parent.  Since you don't give the
error this is just guesswork, but the host has to be somewhere in
the parent chain otherwise dev_to_shost(parent) will return NULL
... is that your problem?

May be that "need not be a scsi host" should be expanded to something
that suggests dev_to_shost(first_arg) can resolve to a non-NULL
pointer.

In my working tree, I've changed that to:

* scsi_alloc_target - allocate a new or find an existing target
 * @parent:     may point to the parent shost, or an intermediate object
 *              that dev_to_shost() can resolve to the parent shost
 * @channel:    target channel number (zero if no channels)
 * @id:         target id number

Hinting at what is actually happening.

Are there restrictions on those intermediate object(s)? Can there be
more than one? If so, can those intermediate objects only form a
linear chain or could it be more complex, an object subtree for
example?

I think you can read the current code as well as I can:

static inline struct Scsi_Host *dev_to_shost(struct device *dev)
{
	while
(!scsi_is_host_device(dev)) {
		if (!dev->parent)
			return NULL;
		dev =
dev->parent;
	}
	return container_of(dev, struct Scsi_Host,
shost_gendev);
}

The broad point is that this is open source, not some proprietary OS.
I'm not giving you an API set in stone and you have to figure out the
use cases, I'm telling you how the current API behaves and the reason
why it behaves this way.  If there's an expansion or change you need,
provided it supports the current uses and is maintainable, it can be
done.  What you have to tell me is what you're trying to do.

https://marc.info/?l=linux-scsi&m=158880168715701&w=2

One aspect of xarray operations is that they are performed on the
collection holder (a scsi_host::__targets in this case), not so much on
the collected objects where it has a smaller footprint than list_head.
So I would be inclined to add a scsi_target::parent_shost pointer and
have a trivial:
  static inline struct Scsi_Host *starg_to_shost(struct scsi_target *starg) {}
accessor.

I'm trying to work out why the function:
starget_for_each_device() in scsi.c does _not_ use that
collection right in front of it (i.e. scsi_target::devices).
Instead, it step up to the host level, and iterates over all
devices (LUs) on that host and only calls the given function for
those devices that match the channel and target numbers.
That is bizarrely wasteful if scsi_target::devices could be
iterated over instead.

Anyone know why this is?

Best guess would be it wasn't converted over when the target list
was introduced.

Okay, I'll change it and see what breaks.

A qualified success, so far. See below.

If you are not familiar with "mark"s in xarrays, they are binary
flags hidden inside the pointers that xarray holds to form a
container. With these flags (two available per xarray) one can make
iterations much more efficient with xa_for_each_marked() { }. The win
is that there is no need to dereference the pointers of collection
members that are _not_ marked. After playing with these in my rework
of the sg driver, I concluded that the best thing to mark was object
states. Unfortunately there are only 2 marks *** per xarray but
scsi_device, for example, has 9 states in scsi_device_state. Would
you like to hazard a guess of which two (or two groups), if any, are
iterated over often enough to make marking worthwhile?

Those two marks could be stretched to four, sort of, as scsi_device
objects are members of two xarrays in my xarray rework: all sdev
objects in a starget, and all sdev objects in a shost.

I've got to say that sounds like a solution looking for a problem.

The fast path of SCSI is pretty rigorously checked with high iops
devices these days.  The fact that this iterator is really lock heavy
but has never been fingered in any of the traces indicates that it's
probably not really a problem that needs solving.

https://www.kernel.org/doc/html/latest/core-api/xarray.html

Of xarray it says:
"It is more memory-efficient, parallelisable and cache friendly than
a doubly-linked list. It takes advantage of RCU to perform lookups
without locking."  M. Wilcox in the overview of that linked document.

The use of list_head objects to hold together the scsi mid-level
object tree is a mess, IMO. That makes it hard to maintain and possibly
inefficient. The fact that whoever added the scsi_target's collection
of scsi_device-s could not see the obvious way to iterate over that
collection is an example.

list_head based collections (as currently used) take spinlocks on
both modifying and non-modifying operations. An interesting aspect
of xarrays is that they have integrated locking and take cheaper
rcu locks for iterations which are mostly involved with non-modifying
operations. That integrated locking is both an advantage (it is
well instrumented in the case of xarray) and creates difficulties
when retrofitting xarrays in the place of list_head based collections
which have "roll your own" type locking.

Testing? Yes plenty of it. And things break, just not yet in the scsi
subsystem :-)  With the attached script, first my laptop goes into
thermal throttling, then a few minutes later systemd-udev seems
to lose it, spraying blocks of these messages in the log:
  xtwo70 systemd-udevd[9453]: 4:0:8:15: Failed to process device, ignoring: 
File exists

and leaking memory at the same time (based on that "age") in the kernel:
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88815929e000 (size 8192):
   comm "modprobe", pid 14214, jiffies 4297239717 (age 743.446s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   backtrace:
     [<000000003f0664f4>] 0xffffffffa0d7c202
     [<00000000f06ff621>] do_one_initcall+0x58/0x300
     [<00000000ee7e3b9e>] do_init_module+0x57/0x1f0
     [<000000000427a735>] load_module+0x2758/0x2930
     [<00000000b9da2d49>] __do_sys_finit_module+0xbf/0xe0
     [<000000003f60bec7>] do_syscall_64+0x4b/0x1c0
     [<000000007cbf2f8c>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
.... < many more of the same >

The script does complete and my laptop seems normal.
It is hard to verify correctness and any performance gains when systemd
and udev are eating most of the processor resources.

Doug Gilbert

Attachment:
tst_sdebug_modpr_rmmod.sh

Description: application/shellscript