Re: [PATCH RFC 0/2] TCMU: Add TCMU_OP_EXP opcode support

Mike Christie <mchristi@xxxxxxxxxx> · Thu, 5 Jan 2017 15:37:17 -0600

On 01/04/2017 02:53 PM, Andy Grover wrote:
> On 12/30/2016 12:18 AM, lixiubo@xxxxxxxxxxxxxxxxxxxx wrote:
>> From: Xiubo Li <lixiubo@xxxxxxxxxxxxxxxxxxxx>
>>
>> This patch series add TCMU_OP_EXP opcode support for skipping the
>> expired cmd in kernel space and user space.
>>
>> * The 1/2 patch is for target_core_user in kernel space.
>> * The 2/2 patch is for tcmu-runner in user space.
>>
>> Here add one 'void *data' parameter just piont to the entry hdr in
>> ring buffer:
>>     struct tcmu_cmd {
>>         ...
>>     +    void *data; ===> pionting to entry hdr
>>         ...
>>     }
>>
>> When some cmds were expired, target_core_user will set the opcode in
>> the entry hdr to TCMU_OP_EXP(or reuse TCMU_OP_PAD here?), and then
>> the tcmu-runner will just skip all the TCMU_OP_EXP cmds, so is the
>> target_core_user.
> 
> I think this may cause race conditions. Once we "give" part of the cmd
> ring to userspace, it's racy to change it because we don't know if
> userspace has already accessed it, or not.
> 
> We could put the new OP_EXP opcode on the cmd ring as a new entry. This
> would ensure the old cmd's processing was terminated and not finish
> out-of-order with a new cmd the initiator sent to retry the aborted one,
> avoiding corruption. Implementing this could get a little complicated

When tcmu-runner sees the new OP_EXP opcode, what does it do with it? I
was not sure how it prevents retries and new commands from executing
before the expired command. Will tcmu-runner not start new commands
before the OP_EXP is handled?

If you do a HA target, then the initiator could retry IO on other paths,
so I think we would want to abort the command in tcmu-runner and the
handler, before having LIO return status for the command or a TMF for
it. So if OP_EXP means for tcmu-runner and the handlers to kill commands
then it would work for me.

What is the point of the target_core_user timer though? The initiators
already do the same thing, so it seems like one more thing to configure.
For example, for tape commands we will want to increase it, so why
configure it twice. Is it just a hack around target_core_tmr.c not being
able to call into the backend modules to perform TMFs?

If there is a good use for the timer, should it be in userspace? It
seems like it would be easier for tcmu-runner to drive it. We could have
a tcmu-runner timer in userspace, call into the handlers and when they
kill a command we can fail it with a task aborted status or something
like that.

Is it just supposed to detect when tcmu-runner has died?

> because we'd need to handle if the cmd ring was full, and prioritize
> queued OP_EXP opcodes over OP_CMD when ring space became available.
> 
> Mike made me aware of this as an issue recently -- do you have any
> further thoughts for how best to solve this?
> 
> Regards -- Andy
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html