Re: [PATCH v2 19/36] target: Make ABORT and LUN RESET handling synchronous

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Wed, 08 Feb 2017 11:06:17 -0800

On Wed, 2017-02-08 at 18:16 +0000, Bart Van Assche wrote:
> On Tue, 2017-02-07 at 19:11 -0800, Nicholas A. Bellinger wrote:
> > Can you elaborate a bit on how exactly you've been testing these changes
> > for the first order (handle ABORT_TASKs and LUN_RESET with outstanding
> > backend I/O) and second order (handle session shutdown while order one
> > is active) issues mentioned earlier..?
> 
> Previous tests were run against a fileio backend.

That would explain why the basic first order functionality didn't work:
the code paths that were changed were never actually tested.

>  That's probably why my
> tests passed despite the circular wait in the TMF code (that has already
> been solved BTW).

Yeah, already looked at your change.  It's a hack.

Adding a second atomic_t emulating a kref on top of the existing
se_cmd->cmd_kref, and then doing a complete_all() for the special TMR
path in the normal fast-path is not going to be acceptable.

Like I said, I'm all for improvements in the TMR area, but you'll need to
be a lot more methodical about how you're proposing the changes, and how
your QA team (manual or automation) is verifying them.

>  Anyway, the tests I run to verify the TMF code are:
> 1. The libiscsi unit tests.
> 2. Running fio with a high queue depth and data verification enabled in
>    one shell and the following code in a second shell:
> 
> while true; do sg_reset -d /dev/sd...; sleep .1; echo -n .; done
> 

How many individual occurrences of the first order issue (ABORT_TASK +
LUN_RESET), second order issue (session reinstatement while first order
is active), and third order issue (configfs shutdown while second order
is active) have you tested..?

How many total volumes, nodes are you using..?
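
To make the occurrence count concrete, the reset loop quoted above could
be bounded and instrumented roughly as follows. This is only a sketch:
RESET_CMD, run_resets and its argument order are hypothetical names
introduced for illustration, and "sg_reset -d" is the only command taken
from the quoted test. It counts reset attempts, not whether each reset
actually raced with outstanding backend I/O.

```shell
#!/bin/sh
# Sketch only: bounded variant of the quoted sg_reset loop that tallies
# attempts. RESET_CMD and run_resets are hypothetical names.
# RESET_CMD defaults to the "sg_reset -d" invocation from the quoted test.
RESET_CMD="${RESET_CMD:-sg_reset -d}"

# run_resets DEVICE COUNT [INTERVAL]
# Issues COUNT device resets against DEVICE, sleeping INTERVAL seconds
# between attempts, then prints how many succeeded and how many failed.
run_resets() {
    dev=$1
    count=$2
    interval=${3:-0.1}
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # $RESET_CMD is left unquoted on purpose so that "sg_reset -d"
        # splits into a command and its option.
        if $RESET_CMD "$dev" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "resets issued: $count ok: $ok failed: $failed"
}
```

With DEV set to the SCSI device under test, run_resets "$DEV" 1000 would
issue 1000 resets while fio runs in the other shell and report the tally
at the end.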

> > Once the test is completed, perform an explicit fabric logout and target
> > shutdown.  This is to ensure there are no reference leaks, kernel hung
> > tasks in un-interruptible sleep, or other memory leaks that have been
> > introduced by the series.
> 
> I run all my tests against a kernel with kmemleak enabled. That's a very
> effective way for detecting memory leaks.

Can you give an idea of how many dedicated QA resources Sandisk is
putting on upstream target development..?

Do you have production customers using it..?
