On 03/08/2017 03:05 AM, Nicholas A. Bellinger wrote: > Hi MNC, > > I know Andy has stepped back from TCMU, but I wanted to get his input > (CC'ed) as the original author. > > On Wed, 2017-03-01 at 23:14 -0600, Mike Christie wrote: >> This patch removes the tcmu the cmd timeout code. Like with the core >> target_core_tmr.c code, the kernel does not know the state of the command >> or implemenation details of the device, so it might not be safe to just >> return the command to the initiator for a possible retry. >> >> Signed-off-by: Mike Christie <mchristi@xxxxxxxxxx> > > Applied for the moment, but I'm curious wrt the implications for this..? > > Does this mean all outstanding se_cmd descriptors dispatched into > user-space must always complete outstanding I/Os, otherwise target-core > will block indefinitely waiting for completion a la other in-kernel > backend drivers..? Yes. > > Historically we've always considered a backend device that never > completes outstanding se_cmd descriptors to be a buggy driver. > But existing user-space code that uses TCMU (I assume) is allowed to > restart, and is currently not responsible for completing I/Os after the > restart..? Is that the case..? > > I wonder if this change may end up being problematic because of this > assumption by user-space..? I think we need a module param or per device flag. There is this kernel timeout code that handles the easy part where we can get se_cmd descriptors completed. There is no tcmu-runner restart code to figure out the state of those commands. For example if we were in the WRITE part of a COMPARE_AND_WRITE and the WRITE is floating around at the time of the tcmu-runner crash, then when it starts up again, we would just allow 2 COMPARE_AND_WRITEs to be executing at the same time. However, I see your point that other daemons could be implementing the needed cleanup and be safe. How about a modparam that specifies the timeout length. If set to 0, then we do not timeout internally. tcmu-runner would set this. The default is 30 like it is currently. My tcmu-runner plan was: 1. For hung commands, we are going to add callouts to the backend module for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK to have the backend perform the TMF. target_core_user would then call into tcmu-runner and that would call into the userspace handler. 2. For tcmu-runner restarts, I am working on the ability to block backends, so at restart time tcmur-runner can loop over the oustanding commands and call into handlers to figure out if they are running still. If they are, I am debating what to do still. We could have tcmu runner will call the handler callouts used for #1 if needed. -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html