On Mon, Mar 14, 2022 at 10:52 AM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote: > > > On 3/14/2022 3:57 PM, David Jeffery wrote: > > On Sun, Mar 13, 2022 at 5:59 AM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote: > >> Hi David, > >> > >> thanks for the report. > >> > >> Please check how we fixed that in NVMf in Sagi's commit: > >> > >> nvmet-rdma: fix possible bogus dereference under heavy load (commit: > >> 8407879c4e0d77) > >> > >> Maybe this can be done in isert and will solve this problem in a simpler > >> way. > >> > >> is it necessary to change max_cmd_sn ? > >> > >> > > Hello, > > > > Sure, there are alternative methods which could fix this immediate > > issue. e.g. We could make the command structs for scsi commands get > > allocated from a mempool. Is there a particular reason you don't want > > to do anything to modify max_cmd_sn behavior? > > according to the description the command was parsed successful and sent > to the initiator. > Yes. > Why do we need to change the window ? it's just a race of putting the > context back to the pool. > > And this race is rare. > Sure, it's going to be rare. Systems using isert targets with infiniband are going to be naturally rare. It's part of why I left the max_cmd_sn behavior untouched for non-isert iscsit since they seem to be fine as is. But it's easily and regularly triggered by some systems which use isert, so worth fixing. > > > > I didn't do something like this as it seems to me to go against the > > intent of the design. It makes the iscsi window mostly meaningless in > > some conditions and complicates any allocation path since it now must > > gracefully and sanely handle an iscsi_cmd/isert_cmd not existing. I > > assume special commands like task-management, logouts, and pings would > > need a separate allocation source to keep from being dropped under > > memory load. > > it won't be dropped. It would be allocated dynamically and freed > (instead of putting it back to the pool). > If it waits indefinitely for an allocation it ends up with a variation of the original problem under memory pressure. If it waits for allocation on isert receive, then receive stalls under memory pressure and won't process the completions which would have released the other iscsi_cmd structs just needing final acknowledgement. David Jeffery