Hi David,
thanks for the report.
Please check how we fixed that in NVMf in Sagi's commit:
nvmet-rdma: fix possible bogus dereference under heavy load
(commit:
8407879c4e0d77)
Maybe this can be done in isert and will solve this problem
in
a simpler
way.
is it necessary to change max_cmd_sn ?
Hello,
Sure, there are alternative methods which could fix this
immediate
issue. e.g. We could make the command structs for scsi
commands
get
allocated from a mempool. Is there a particular reason you
don't
want
to do anything to modify max_cmd_sn behavior?
according to the description the command was parsed successful
and
sent
to the initiator.
Yes.
Why do we need to change the window ? it's just a race of
putting
the
context back to the pool.
And this race is rare.
Sure, it's going to be rare. Systems using isert targets with
infiniband are going to be naturally rare. It's part of why I
left
the
max_cmd_sn behavior untouched for non-isert iscsit since they
seem to
be fine as is. But it's easily and regularly triggered by some
systems
which use isert, so worth fixing.
I didn't do something like this as it seems to me to go
against
the
intent of the design. It makes the iscsi window mostly
meaningless in
some conditions and complicates any allocation path since it
now
must
gracefully and sanely handle an iscsi_cmd/isert_cmd not
existing.
I
assume special commands like task-management, logouts, and
pings
would
need a separate allocation source to keep from being dropped
under
memory load.
it won't be dropped. It would be allocated dynamically and
freed
(instead of putting it back to the pool).
If it waits indefinitely for an allocation it ends up with a
variation
of the original problem under memory pressure. If it waits for
allocation on isert receive, then receive stalls under memory
pressure
and won't process the completions which would have released the
other
iscsi_cmd structs just needing final acknowledgement.
If your system is under such memory pressure can you can't allocate
few
bytes for isert response, the silent drop
of the command is your smallest problem. You need to keep the system
from crashing. And we do that in my suggestion.
David Jeffery
Folks this is a pending issue stopping a customer from making
progress.
They run Oracle and very high workloads on EDR 100 so David fixed
this
fosusing on the needs of the isert target changes etc.
Are you able to give us technical reasons why David's patch is not
suitable and why we he would have to start from scratch.
You shouldn't start from scratch. You did all the investigation and
the
debugging already.
Coding a solution is the small part after you understand the root
cause.
We literally spent weeks on this and built another special lab for
fully testing EDR 100.
This issue was pending in a BZ for some time and Mellnox had eyes
on it
then but this latest suggestion was never put forward in that BZ to
us.
Mellanox maintainers saw this issue few days before you sent it
upstream. I suggested sending it upstream and have a discussion here
since it has nothing to do with Mellanox adapters and Mellanox SW
stack
MLNX_OFED.
Our job as maintainers and reviewers in the community is to see the
big
picture and suggest solutions that not always same as posted in the
mailing list.
Sincerely
Laurence
Hi Max
Hey,
The issue was reported with the OFED stack at the customer, so its why
we opened the BZ to get the Mallnox partners engineers engaged.
We had them then see if it also existed with the inbox stack which it
did.
Sergey worked a little bit on the issue but did not have the same
suggestion you provivided and asked David for help.
I think you can move the corporate discussions offline.
We will be happy to take the fix you propose doing it your way. May I
that the engineewrs to work on this the most and understand the code
best propose a fix your way.
I tend to agree with Max, I looked into the patch and I can't say that
we know for a fact that incrementing the cmdsn after releasing the
iscsi cmd will not introduce anything else (although looks fine at
a high-level).
Is there any measure-able performance implication?
Max, doing dynamic allocation is also a valid fix.