On Thu, Feb 06, 2020 at 05:56:43PM +0800, Yixian Liu wrote:
> Earlier Background:
> HiP08 RoCE hardware lacks the ability (a known hardware problem) to flush
> outstanding WQEs if the QP state gets into errored mode for some reason.
> To overcome this hardware problem and as a workaround, when the QP is
> detected to be in errored state during various legs like post send,
> post receive etc [1], flush needs to be performed from the driver.
>
> These data-path legs might get called concurrently from various contexts,
> like thread and interrupt as well (like NVMe driver). Hence, these need
> to be protected with spin-locks for the concurrency. This code exists
> within the driver.
>
> Problem:
> The earlier patch [1] sent to solve the hardware limitation explained
> in the background section had a bug in the software flushing leg. It
> acquired a mutex while modifying the QP state to errored state and while
> conveying it to the hardware using the mailbox. This caused the leg to
> sleep while holding a spin-lock and caused a crash.
>
> Suggested Solution:
> In this patch, we have proposed to defer the flushing of the QP in
> Errored state using the workqueue.
>
> We do understand that this might have an impact on the recovery times
> as scheduling of the workqueue handler depends upon the occupancy of
> the system. Therefore to roughly mitigate this effect we have tried
> to use Concurrency Managed workqueue to give worker thread (and
> hence handler) a chance to run over more than one core.
>
>
> [1] https://patchwork.kernel.org/patch/10534271/
>
>
> This patch-set consists of:
> [Patch 001] Introduce workqueue based WQE Flush Handler
> [Patch 002] Call WQE flush handler in post {send|receive|poll}

Applied to for-next

Thanks,
Jason