On 2020-05-11 10:41 a.m., James Bottomley wrote:
On Sun, 2020-05-10 at 17:50 -0400, Douglas Gilbert wrote:On 2020-05-10 2:52 p.m., James Bottomley wrote:On Sun, 2020-05-10 at 14:32 -0400, Douglas Gilbert wrote:This gem is in scsi_scan.c in the documentation of that function's first argument. "need not be a scsi host" should read "it damn well better be a scsi host" otherwise that function will crash and burn!It shouldn't: several transport classes, like SAS and FC have intermediate devices between the host and the target and they all work just fine using the non-host parent. Since you don't give the error this is just guesswork, but the host has to be somewhere in the parent chain otherwise dev_to_shost(parent) will return NULL ... is that your problem?May be that "need not be a scsi host" should be expanded to something that suggests dev_to_shost(first_arg) can resolve to a non-NULL pointer.
In my working tree, I've changed that to: * scsi_alloc_target - allocate a new or find an existing target * @parent: may point to the parent shost, or an intermediate object * that dev_to_shost() can resolve to the parent shost * @channel: target channel number (zero if no channels) * @id: target id number Hinting at what is actually happening.
Are there restrictions on those intermediate object(s)? Can there be more than one? If so, can those intermediate objects only form a linear chain or could it be more complex, an object subtree for example?I think you can read the current code as well as I can: static inline struct Scsi_Host *dev_to_shost(struct device *dev) { while (!scsi_is_host_device(dev)) { if (!dev->parent) return NULL; dev = dev->parent; } return container_of(dev, struct Scsi_Host, shost_gendev); } The broad point is that this is open source, not some proprietary OS. I'm not giving you an API set in stone and you have to figure out the use cases, I'm telling you how the current API behaves and the reason why it behaves this way. If there's an expansion or change you need, provided it supports the current uses and is maintainable, it can be done. What you have to tell me is what you're trying to do.
https://marc.info/?l=linux-scsi&m=158880168715701&w=2 One aspect of xarray operations is that they are performed on the collection holder (a scsi_host::__targets in this case), not so much on the collected objects where it has a smaller footprint than list_head. So I would be inclined to add a scsi_target::parent_shost pointer and have a trivial: static inline struct Scsi_Host *starg_to_shost(struct scsi_target *starg) {} accessor.
I'm trying to work out why the function: starget_for_each_device() in scsi.c does _not_ use that collection right in front of it (i.e. scsi_target::devices). Instead, it step up to the host level, and iterates over all devices (LUs) on that host and only calls the given function for those devices that match the channel and target numbers. That is bizarrely wasteful if scsi_target::devices could be iterated over instead.Anyone know why this is?Best guess would be it wasn't converted over when the target list was introduced.Okay, I'll change it and see what breaks.
A qualified success, so far. See below.
If you are not familiar with "mark"s in xarrays, they are binary flags hidden inside the pointers that xarray holds to form a container. With these flags (two available per xarray) one can make iterations much more efficient with xa_for_each_marked() { }. The win is that there is no need to dereference the pointers of collection members that are _not_ marked. After playing with these in my rework of the sg driver, I concluded that the best thing to mark was object states. Unfortunately there are only 2 marks *** per xarray but scsi_device, for example, has 9 states in scsi_device_state. Would you like to hazard a guess of which two (or two groups), if any, are iterated over often enough to make marking worthwhile? Those two marks could be stretched to four, sort of, as scsi_device objects are members of two xarrays in my xarray rework: all sdev objects in a starget, and all sdev objects in a shost.I've got to say that sounds like a solution looking for a problem. The fast path of SCSI is pretty rigorously checked with high iops devices these days. The fact that this iterator is really lock heavy but has never been fingered in any of the traces indicates that it's probably not really a problem that needs solving.
https://www.kernel.org/doc/html/latest/core-api/xarray.html Of xarray it says: "It is more memory-efficient, parallelisable and cache friendly than a doubly-linked list. It takes advantage of RCU to perform lookups without locking." M. Wilcox in the overview of that linked document. The use of list_head objects to hold together the scsi mid-level object tree is a mess, IMO. That makes it hard to maintain and possibly inefficient. The fact that whoever added the scsi_target's collection of scsi_device-s could not see the obvious way to iterate over that collection is an example. list_head based collections (as currently used) take spinlocks on both modifying and non-modifying operations. An interesting aspect of xarrays is that they have integrated locking and take cheaper rcu locks for iterations which are mostly involved with non-modifying operations. That integrated locking is both an advantage (it is well instrumented in the case of xarray) and creates difficulties when retrofitting xarrays in the place of list_head based collections which have "roll your own" type locking. Testing? Yes plenty of it. And things break, just not yet in the scsi subsystem :-) With the attached script, first my laptop goes into thermal throttling, then a few minutes later systemd-udev seems to lose it, spraying blocks of these messages in the log:xtwo70 systemd-udevd[9453]: 4:0:8:15: Failed to process device, ignoring: File exists
and leaking memory at the same time (based on that "age") in the kernel: # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88815929e000 (size 8192): comm "modprobe", pid 14214, jiffies 4297239717 (age 743.446s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000003f0664f4>] 0xffffffffa0d7c202 [<00000000f06ff621>] do_one_initcall+0x58/0x300 [<00000000ee7e3b9e>] do_init_module+0x57/0x1f0 [<000000000427a735>] load_module+0x2758/0x2930 [<00000000b9da2d49>] __do_sys_finit_module+0xbf/0xe0 [<000000003f60bec7>] do_syscall_64+0x4b/0x1c0 [<000000007cbf2f8c>] entry_SYSCALL_64_after_hwframe+0x49/0xb3 .... < many more of the same > The script does complete and my laptop seems normal. It is hard to verify correctness and any performance gains when systemd and udev are eating most of the processor resources. Doug Gilbert
Attachment:
tst_sdebug_modpr_rmmod.sh
Description: application/shellscript