Re: [PATCH 3/3] tcmu: remove cmd timeout code

Mike Christie <mchristi@xxxxxxxxxx> · Wed, 8 Mar 2017 12:02:58 -0600

On 03/08/2017 03:05 AM, Nicholas A. Bellinger wrote:
> Hi MNC,
> 
> I know Andy has stepped back from TCMU, but I wanted to get his input
> (CC'ed) as the original author.
> 
> On Wed, 2017-03-01 at 23:14 -0600, Mike Christie wrote:
>> This patch removes the tcmu the cmd timeout code. Like with the core
>> target_core_tmr.c code, the kernel does not know the state of the command
>> or implemenation details of the device, so it might not be safe to just
>> return the command to the initiator for a possible retry.
>>
>> Signed-off-by: Mike Christie <mchristi@xxxxxxxxxx>
> 
> Applied for the moment, but I'm curious wrt the implications for this..?
> 
> Does this mean all outstanding se_cmd descriptors dispatched into
> user-space must always complete outstanding I/Os, otherwise target-core
> will block indefinitely waiting for completion a la other in-kernel
> backend drivers..? 

Yes.

> 
> Historically we've always considered a backend device that never
> completes outstanding se_cmd descriptors to be a buggy driver.
> But existing user-space code that uses TCMU (I assume) is allowed to
> restart, and is currently not responsible for completing I/Os after the
> restart..?  Is that the case..?
> 
> I wonder if this change may end up being problematic because of this
> assumption by user-space..?

I think we need a module param or per device flag.

There is this kernel timeout code that handles the easy part where we
can get se_cmd descriptors completed.

There is no tcmu-runner restart code to figure out the state of those
commands. For example if we were in the WRITE part of a
COMPARE_AND_WRITE and the WRITE is floating around at the time of the
tcmu-runner crash, then when it starts up again, we would just allow 2
COMPARE_AND_WRITEs to be executing at the same time.

However, I see your point that other daemons could be implementing the
needed cleanup and be safe.

How about a modparam that specifies the timeout length. If set to 0,
then we do not timeout internally. tcmu-runner would set this. The
default is 30 like it is currently.

My tcmu-runner plan was:

1. For hung commands, we are going to add callouts to the backend module
for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
to have the backend perform the TMF. target_core_user would then call
into tcmu-runner and that would call into the userspace handler.

2. For tcmu-runner restarts, I am working on the ability to block
backends, so at restart time tcmur-runner can loop over the oustanding
commands and call into handlers to figure out if they are running still.
If they are, I am debating what to do still. We could have tcmu runner
will call the handler callouts used for #1 if needed.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html