Re: [PATCH v3 for-next 0/2] Fix crash due to sleepy mutex while holding lock in post_{send|recv|poll}

Hi Jason,

I wanted to check whether there are any further comments on this patch set.

Thanks!
Eason

On 2019/11/21 19:19, Yixian Liu wrote:
> Background:
> HiP08 RoCE hardware lacks the ability (a known hardware problem) to
> flush outstanding WQEs if the QP state goes into the error state for
> some reason. To work around this hardware problem, when a QP is
> detected to be in the error state during various legs such as post
> send, post receive, etc. [1], the flush needs to be performed from the
> driver.
> 
> These data-path legs might be called concurrently from various
> contexts, such as thread and interrupt context (as in the NVMe driver).
> Hence, they need to be protected with spin-locks against concurrent
> access. This code already exists within the driver.
> 
> Problem:
> The earlier patch [1], sent to solve the hardware limitation explained
> in the background section, had a bug in the software flushing leg. It
> acquired a mutex while modifying the QP state to the error state and
> while conveying it to the hardware using the mailbox. This caused the
> leg to sleep while holding a spin-lock, which led to a crash.
> 
> Suggested Solution:
> In this patch set, we propose deferring the flushing of a QP in the
> error state to a workqueue.
> 
> We do understand that this might have an impact on recovery times, as
> scheduling of the workqueue handler depends on the load on the system.
> Therefore, to roughly mitigate this effect, we have tried to use a
> Concurrency Managed workqueue to give the worker thread (and hence the
> handler) a chance to run on more than one core.
> 
> 
> [1] https://patchwork.kernel.org/patch/10534271/
> 
> 
> This patch-set consists of:
> [Patch 001] Introduce workqueue based WQE Flush Handler
> [Patch 002] Call WQE flush handler in post {send|receive|poll}
> 
> v3 changes:
> 1. Fall back to dynamically allocating flush_work.
> 
> v2 changes:
> 1. Remove the newly created workqueue, per Jason's comment.
> 2. Remove dynamic allocation for flush_work, per Jason's comment.
> 3. Change the current irq singlethread workqueue to a concurrency
>    managed workqueue so that the work is not blocked.
> 
> Yixian Liu (2):
>   RDMA/hns: Add the workqueue framework for flush cqe handler
>   RDMA/hns: Delayed flush cqe process with workqueue
> 
>  drivers/infiniband/hw/hns/hns_roce_device.h |  2 +
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 88 +++++++++++++----------------
>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 43 ++++++++++++++
>  3 files changed, 85 insertions(+), 48 deletions(-)
> 
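For readers less familiar with the failure mode, here is a minimal user-space model of the deferral pattern the cover letter describes, assuming POSIX threads. All names (model_qp, model_post_send, model_flush_handler, model_flush_work) are hypothetical and are not the hns driver's API. A pthread spin lock stands in for the data-path spin-lock, a pthread mutex stands in for the sleepable mailbox mutex, and a freshly created thread stands in for a workqueue worker. The data-path leg only decides, under the spin lock, that a flush is needed; the mutex is taken solely in the worker. In kernel code the hand-off would typically use a non-sleeping allocation (GFP_ATOMIC) and queue_work(), both of which are usable in atomic context.

```c
#define _POSIX_C_SOURCE 200112L /* exposes pthread spin locks in glibc */
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

enum model_qp_state { MODEL_QP_RTS, MODEL_QP_ERR };

/* Hypothetical QP model, not the driver's struct hns_roce_qp. */
struct model_qp {
	pthread_spinlock_t sq_lock; /* data-path lock: must not sleep while held */
	pthread_mutex_t mbox_mutex; /* sleepable lock guarding the "mailbox" */
	enum model_qp_state state;  /* written only under mbox_mutex */
	bool hw_error;              /* set when the data path sees an error */
	int flush_count;            /* number of deferred flushes that have run */
};

/* One heap-allocated item per deferred flush (cf. the v3 change log). */
struct model_flush_work {
	struct model_qp *qp;
};

static int model_qp_init(struct model_qp *qp)
{
	if (pthread_spin_init(&qp->sq_lock, PTHREAD_PROCESS_PRIVATE) != 0)
		return -1;
	if (pthread_mutex_init(&qp->mbox_mutex, NULL) != 0) {
		pthread_spin_destroy(&qp->sq_lock);
		return -1;
	}
	qp->state = MODEL_QP_RTS;
	qp->hw_error = false;
	qp->flush_count = 0;
	return 0;
}

/* Worker context: taking the mutex (and therefore possibly sleeping) is safe. */
static void *model_flush_handler(void *arg)
{
	struct model_flush_work *fw = arg;
	struct model_qp *qp = fw->qp;

	pthread_mutex_lock(&qp->mbox_mutex);
	qp->state = MODEL_QP_ERR; /* stands in for the mailbox modify-QP call */
	qp->flush_count++;
	pthread_mutex_unlock(&qp->mbox_mutex);

	free(fw);
	return NULL;
}

/*
 * Data-path leg. Returns 0 if the request was "posted", 1 if the QP is in
 * error and a flush was handed to *worker, or -1 on allocation/thread failure.
 */
static int model_post_send(struct model_qp *qp, pthread_t *worker)
{
	struct model_flush_work *fw;
	bool need_flush;

	pthread_spin_lock(&qp->sq_lock);
	need_flush = qp->hw_error; /* only decide here; never take mbox_mutex */
	pthread_spin_unlock(&qp->sq_lock);

	if (!need_flush)
		return 0;

	fw = malloc(sizeof(*fw));
	if (fw == NULL)
		return -1;
	fw->qp = qp;
	if (pthread_create(worker, NULL, model_flush_handler, fw) != 0) {
		free(fw);
		return -1;
	}
	return 1;
}
```

The design choice mirrors the patch set's intent: the spin-locked path never blocks, at the cost of the flush completing some time later, whenever the worker gets scheduled. This model does not coalesce repeated flush requests for the same QP; a real implementation would need to decide how to handle that.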
