Hi Bart,

> On Nov 7, 2019, at 12:30 PM, Bart Van Assche <bvanassche@xxxxxxx> wrote:
>
> On 11/7/19 9:58 AM, Bart Van Assche wrote:
>> Does your answer mean that this hang has not yet been root-caused fully
>> and hence that it is possible this patch is only a workaround but not a
>> fix of the root cause?
>
> Answering my own question: I think that a qpair refcount leak is a severe
> problem and not something that should be ignored. How about changing the
> while loop into something like the following:
>
>     if (atomic_read(&qpair->ref_count))
>         msleep(500);
>     WARN_ON_ONCE(atomic_read(&qpair->ref_count));
>
> Thanks,
>
> Bart.

Since we had seen this hang in a specific cluster environment and a refcount
leak was observed, I would like to add this patch as is. I will consider your
suggestion and verify whether adding WARN_ON_ONCE makes any difference. If we
discover that adding WARN_ON_ONCE indeed helps, then I will add a patch with a
Fixes tag during the rc window.

Let me know if you disagree.

Thanks,
Himanshu