----- Original Message ----- > From: Dan Williams <dan.j.williams@xxxxxxxxx> > To: Luben Tuikov <ltuikov@xxxxxxxxx> > Cc: "linux-scsi@xxxxxxxxxxxxxxx" <linux-scsi@xxxxxxxxxxxxxxx>; "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>; "JBottomley@xxxxxxxxxxxxx" <JBottomley@xxxxxxxxxxxxx>; Jack Wang <jack_wang@xxxxxxxxx> > Sent: Wednesday, July 27, 2011 1:32 PM > Subject: Re: [PATCH] [SCSI] libsas: remove expander from dev list on error > > On Tue, Jul 26, 2011 at 11:10 PM, Luben Tuikov <ltuikov@xxxxxxxxx> wrote: >> If expander discovery fails (sas_discover_expander()), >> remove the expander from the port device list >> (sas_ex_discover_expander()), before freeing it. Else >> the list is corrupted and, e.g., when we attempt to send >> SMP commands to other devices, the kernel oopses. >> >> Signed-off-by: Luben Tuikov <ltuikov@xxxxxxxxx> >> Reviewed-by: Jack Wang <jack_wang@xxxxxxxxx> >> --- >> drivers/scsi/libsas/sas_expander.c | 3 +++ >> 1 files changed, 3 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/scsi/libsas/sas_expander.c > b/drivers/scsi/libsas/sas_expander.c >> index 874e29d..f84084b 100644 >> --- a/drivers/scsi/libsas/sas_expander.c >> +++ b/drivers/scsi/libsas/sas_expander.c >> @@ -849,6 +849,9 @@ static struct domain_device *sas_ex_discover_expander( >> >> res = sas_discover_expander(child); >> if (res) { >> + spin_lock_irq(&parent->port->dev_list_lock); >> + list_del(&child->dev_list_node); >> + spin_unlock_irq(&parent->port->dev_list_lock); >> kfree(child); >> return NULL; > > Acked-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > ...but it raises a couple questions for potential follow on patches: > > 1/ Do we need to add the device to port->dev_list prior to the > discovery? Seems cleaner to defer the list_add until after the > discovery has succeeded. In my original SAS Transport layer, devices were added to the port device list in one and only one place, after discovery, at sas_kobj_set() (which doesn't exist now because my code was edited off-line before being submitted into the kernel by Bottomley). Thus due to the architecture of the code, this bug doesn't exist in my original implementation, but was introduced in the version submitted by Bottomley, when devices are being added to the port list in more than one place. My development git history was removed as the code was taken off my git trees, edited off line, off git, and then submitted, thus erasing its original state or its development history (in git). So it's now impossible for a 3rd party to see what bugs were introduced due to those off-line changes before submission. And as you can see, there are bugs. The bugs are, as I can see, stemming from the current _explicitness_ of libsas, a la, just-in-place fix, as if someone has been studying the code by line-stepping it with a debugger, and then going "A-ha! Here is where we can stick this in." SATA support seems to have been added in a similar manner, "where can we stick this to get it to work". I.e. there is no grand architecture, no vision. The original code, was much more implicit: you had to pull back to see what was happening. It was designed as a transport layer. For example, for a new SAS controller, you'd only need to write a PCI device driver which exposed the ASIC's particular domain access implementation in a structure of function pointers, that the PCI LLDD fills in, and the SAS Transport Layer (SAS TL) fills in, and this is how both communicate. Then, the SAS TL would register with the SCSI layer. It was a beautiful solution of enterprise quality and design. SATA support was added in 2006 which is _not_ libsata, but also a layer of abstraction with a structure of function pointers, in effect allowing you to register a SATA device on _any_ transport protocol, not only SAS, as a matter of filling in the stubs of a few function pointers. (Thus also allowing you to easily emulate a SATA device if you so desire.) The original SAS TL, also represented the whole SAS domain in sysfs, exactly as it would look like in the physical world and was thus really easy to do "tree -d" and then look at your work bench or rack and see what is connected to what or what is missing or has been added. See this http://marc.info/?l=linux-scsi&m=112629509826900&w=2, for example. The event infrastructure, removed by Bottomley, allowed for an infinite number of events in a finite amount of memory. Removing it, introduced a few bugs, which were I believe fixed back then, but it is impossible to know how many more there are. > 2/ We have unlocked list manipulations in sas_ex_discover_end_dev(), > sas_unregister_common_dev(), and sas_ex_discover_end_dev() Yes, I can see that and that is very unfortunate. And as I mentioned, this doesn't exist in the original design of the code. The point is that after the changes offline people using Linux end up with substandard and mediocre implementation of SAS. Locking and coherency were achieved via the kobj infrastructure which was a beautiful thing at the time, but has seen its number of changes beginning in 06-07 (git can help there). Luben -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html