On 10/5/23 10:56, Jason Gunthorpe wrote:
> On Thu, Oct 05, 2023 at 07:50:28AM -0700, Bart Van Assche wrote:
>> On 10/5/23 07:21, Jason Gunthorpe wrote:
>>> Which is why it shows there are locking problems in this code.
>>
>> Hi Jason,
>>
>> Since the locking problems have not yet been root-caused, do you
>> agree that it is safer to revert patch "RDMA/rxe: Add workqueue
>> support for rxe tasks" rather than trying to fix it?
>
> I don't think that makes the locking problems go away any more than
> using a single threaded work queue?
>
> Jason

This is slightly off topic but may still be relevant.

If there are locking bugs, they are between the two send-side tasks,
rxe_completer and rxe_requester, which share the send queue and other
state. Bart attempts to fix this by setting max_active to 1, which
limits the ability of these two work queue tasks to interfere with
each other.

For completely different reasons we have looked at merging these two
tasks into a single task. It turns out this improves performance,
especially at high scale, where it reduces the number of CPU cores
needed to complete the work. But even at low scale (1-2 QPs) it helps
because of improved caching. If the work is mostly sends and writes
there isn't much for the completer to do, while if it is mostly reads
there isn't much for the requester to do. So combining them doesn't
hurt performance, even though fewer cores are doing the work.

Merging also prevents the two tasks for a given QP from running at the
same time, which should eliminate the locking issues.

If no one hates the idea I can send in our patch that does this.

Bob