Alan Stern wrote:
> On Sat, 3 May 2008, James Bottomley wrote:
> [...]
>> At the beginning
>> of the hotplug debate it was thought there was value in a wait for
>> unplug event ... some PCI busses have a little button you push and then
>> a light lights up to tell you everything's OK and you can remove the
>> card.
>>
>> After a lot of back and forth, it was decided that the best thing for
>> the latter was for userland to quiesce and unmount the filesystem,
>> application or whatever and then tell the kernel it was gone, so in that
>> scenario, the two paths were identical. I don't think anything's really
>> changed in that regard.
>
> I still don't understand. Let's say the user does unmount the
> filesystem and tell the kernel it is gone. So the LLD calls
> scsi_unregister_host() and from that point on fails every call to
> queuecommand. Then how does sd transmit its final FLUSH CACHE command
> to the device? Are you saying that it doesn't need to, since
> unmounting the filesystem will cause a FLUSH CACHE to be sent anyway?
Before a device can be safely detached, there may be other things that
need to be done besides what umount implies. But let's have a look at
the grander picture.

I see the following levels at which userspace can initiate detachment:
1. Close block device files / character device files, e.g. umount
filesystems. Since userspace is multiprocess/multithreaded,
it has no way to prevent new open()s though.
IOW userspace is unable to say which particular close() is the
final one. Or am I missing something?
2. Unbind the command set driver (SCSI ULD) from the logical unit
representation.
How does 2 relate to 1? Obviously, open() is guaranteed to be
impossible after 2.
Note that nothing prevents step 2 from being performed before step 1.
IOW it is possible to unbind the ULD while the corresponding
device file is still open, e.g. a filesystem still mounted.
Furthermore, step 2 involves the execution of some request for
purposes like flush write cache, stop motor, unlock drive door.
These requests are dependent on device type and should be
configurable by userspace to some degree (e.g. whether to go
into a low power state if in single initiator mode). The
command set driver can ensure that these finalizing requests are
executed in the desired order (a rough sketch of such finalizing
requests follows after this list). The sg driver sticks out here
insofar as it has no knowledge of the device type, hence does not
emit finalizing requests.
3. Unbind the transport layer driver from the target port
representation.
How does 3 relate to 2? Step 3 will cause step 2 to be performed.
But depending on which SCSI low-level API calls are used, the
ULD may be unable to get the finalizing requests of step 2
through the SCSI core to the LLD, because a core-internal
state variable may prevent it. The API documentation is
unclear about it, IOW the behavior is basically undefined.
4. Unbind the interconnect layer driver from what corresponded to
the initiator port.
Some drivers don't implement 3 and 4 separately.
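
To make step 2 a bit more concrete, here is a rough sketch of what such
finalizing requests could look like for a disk-like device. This is not
actual sd code, the function name is made up, error handling is omitted,
and the exact scsi_execute_req() signature may differ between kernel
versions:

#include <linux/kernel.h>
#include <linux/dma-mapping.h>
#include <scsi/scsi.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_eh.h>

/*
 * Hypothetical finalizing requests of a command set driver for a
 * disk-like device: flush the write cache, then stop the motor.
 * Only meant to show the kind of I/O that still has to get through
 * the SCSI core to the LLD during step 2.
 */
static void example_uld_finalize(struct scsi_device *sdev)
{
        unsigned char flush_cdb[10] = { SYNCHRONIZE_CACHE, };
        unsigned char stop_cdb[6]   = { START_STOP, 0, 0, 0, 0, 0 };
        struct scsi_sense_hdr sshdr;

        /* first make sure dirty data in the drive's cache hits the media */
        scsi_execute_req(sdev, flush_cdb, DMA_NONE, NULL, 0, &sshdr,
                         60 * HZ, 3);

        /* then (as a matter of policy) spin the drive down */
        scsi_execute_req(sdev, stop_cdb, DMA_NONE, NULL, 0, &sshdr,
                         30 * HZ, 3);
}

The point is that these are ordinary requests which still need a working
path down to the LLD at the time the ULD is being unbound.
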
For the discussion here it is obviously crucial how we want 2 to relate
to 1 and how we want 3 to relate to 2.
The relationship between 4 and 3 is an extension of the issue and
interesting for hotpluggable PCI, CardBus, ExpressCard and the like.
But unlike 3/2 and 2/1, LLD authors have full control over this since
the SCSI core is not in the picture here (if we treat the "transport
attributes" programs as parts of the LLDs, not part of the SCSI core).
Side note: There are various reference counters involved in the layers
and partially across the layers. There is for example the module
reference count of the LLD, which is usually manipulated (among other
things) when the device files of ULDs are open()ed and close()d. A side
effect is that module unloading, as a special case of unbinding, is
prevented by upper layers as long as the upper layers have business with
the device. But this is only a side effect; the actual purpose of these
reference counters is really just to prevent dereferencing invalid
pointers.
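
As an illustration of that mechanism, a hypothetical ULD open()/release()
pair (names made up) which pins and releases the LLD module via the
existing scsi_device_get()/scsi_device_put() helpers:

#include <linux/errno.h>
#include <scsi/scsi_device.h>

static int example_uld_open(struct scsi_device *sdev)
{
        /* pins the scsi_device and the LLD module (try_module_get()) */
        if (scsi_device_get(sdev))
                return -ENXIO;          /* device is already going away */
        /* ... per-open setup ... */
        return 0;
}

static void example_uld_release(struct scsi_device *sdev)
{
        /* ... per-open teardown ... */
        scsi_device_put(sdev);          /* drops device and module reference */
}
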
> Or let's put it the other way around. Suppose the LLD doesn't start
> failing calls to queuecommand until after scsi_unregister_host()
> returns. Then what about the commands that were in flight when
> scsi_unregister_host() was called? The LLD thinks it owns them, and
> the midlayer thinks that _it_ owns them and can unilaterally cancel
> them. They can't both be right.
Is there an actual problem? As soon as a scsi_cmnd has reached
.queuecommand(), it is the sole privilege and responsibility of the LLD
to tell when the scmd is complete from the transport's point of view.
The SCSI core can at this point ask the LLD to prematurely complete an
scmd, e.g. by means of .eh_abort_handler().

In my opinion, the LLD should simply process all scmds which it gets via
.queuecommand(), independently of whether unbinding was initiated: i.e.
complete them successfully if possible, complete them with failure if
something went wrong at the transport protocol level, and complete them
as aborted when .eh_abort_handler() and friends requested it.
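
A sketch of what I mean, using the .queuecommand() prototype with the
done() callback; exampledrv_device_gone() and exampledrv_start_io() are
made-up driver-private helpers:

#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>

/*
 * Hypothetical LLD .queuecommand(): every scmd which is accepted here
 * gets completed by the LLD, even while the device is being torn down.
 */
static int exampledrv_queuecommand(struct scsi_cmnd *scmd,
                                   void (*done)(struct scsi_cmnd *))
{
        if (exampledrv_device_gone(scmd->device)) {
                /* transport is gone: complete with failure, never drop it */
                scmd->result = DID_NO_CONNECT << 16;
                done(scmd);
                return 0;
        }

        /* hand the command to the hardware; done() is called on completion */
        return exampledrv_start_io(scmd, done);
}

Whether the SCSI core happens to be in the middle of some
scsi_remove_XYZ() at that moment does not change who completes the
command.
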
The SCSI core's low-level API should guarantee somewhere that
.queuecommand() will not be called anymore after certain
scsi_remove_XYZ() calls have returned.

Furthermore, I would like the SCSI core to allow step 2 to be performed
as gracefully as possible (i.e. with successful execution of all
finalizing requests which the ULDs emit), either in case of all
scsi_remove_XYZ()s, or only in case of some possibly new
scsi_remove_ABC()s if the necessary change/clarification of the
semantics of the existing scsi_remove_XYZ()s is too problematic for some
existing LLDs.
--
Stefan Richter
-=====-==--- -=-= --=--
http://arcgraph.de/sr/