RE: [PATCH 1/1] Revert "RDMA/rxe: Add workqueue support for rxe tasks"

On Mon, Oct 9, 2023 1:02 AM Zhu Yanjun wrote:
> On 2023/10/5 22:50, Bart Van Assche wrote:
> > On 10/5/23 07:21, Jason Gunthorpe wrote:
> >> Which is why it shows there are locking problems in this code.
> >
> > Hi Jason,
> >
> > Since the locking problems have not yet been root-caused, do you
> > agree that it is safer to revert patch "RDMA/rxe: Add workqueue
> > support for rxe tasks" rather than trying to fix it?
> 
> Hi, Jason && Leon
> 
> I spent a lot of time on this problem. It seems that it is a very
> difficult problem.
> 
> So I agree with Bart. Can we revert patch "RDMA/rxe: Add workqueue
> support for rxe tasks" rather than trying to fix it? Then Bob can apply
> his new patch to a stable RXE?

Cf. https://lore.kernel.org/lkml/f15b06b934aa0ace8b28dc046022e5507458eb99.1694153251.git.matsuda-daisuke@xxxxxxxxxxx/
I have ODP patches that are fully dependent on "RDMA/rxe: Add workqueue
support for rxe tasks". So I personally prefer keeping the workqueue to
reverting the workqueue patch.

Each developer here has different motives and interests. I think the rxe driver should
actively take in new specs and new features so that it can be used by developers
without access to HCAs. I believe a workqueue is better suited for this purpose.
Additionally, the disadvantages of tasklets are documented here:
https://lwn.net/Articles/830964/
However, stability is very important, so I will not insist on my opinion.

I agree it is very difficult to find the root cause of the locking problem. For now,
we have little choice but to hide the issue so that it does not bother actual users
of the driver. I see three ways to do this.

Solution 1: Reverting "RDMA/rxe: Add workqueue support for rxe tasks"
I see this is supported by Zhu and Bart, and approved by Leon.

Solution 2: Serializing execution of work items
> -       rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> +       rxe_wq = alloc_workqueue("rxe_wq", WQ_HIGHPRI | WQ_UNBOUND, 1);
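
To illustrate what the last argument (max_active) changes, here is a toy
userspace model. It is only a sketch and is not kernel code: struct toy_work,
toy_run() and toy_peak_for_three_tasks() are hypothetical names, and the three
2-tick items merely stand in for requester, completer and responder work. It
shows that with a limit of 1 no two items are ever in flight together, which
is why solution 2 would be expected to hide a race between tasks.

```c
#include <stddef.h>

/* Toy model only: this is NOT kernel code and not the rxe implementation. */
struct toy_work {
	int remaining;	/* ticks of work left; 0 means finished */
	int started;	/* nonzero once the item has been dispatched */
};

/*
 * Run all items to completion while never letting more than max_active
 * items be in flight at once. Returns the peak number of items that were
 * in flight during the same tick.
 */
static int toy_run(struct toy_work *items, size_t n, int max_active)
{
	int peak = 0;
	size_t done = 0;

	if (max_active < 1)
		max_active = 1;	/* toy choice: always allow one item */

	/* Items that start with no work count as already finished. */
	for (size_t i = 0; i < n; i++)
		if (items[i].remaining <= 0)
			done++;

	while (done < n) {
		int active = 0;

		/* Count items already in flight. */
		for (size_t i = 0; i < n; i++)
			if (items[i].started && items[i].remaining > 0)
				active++;

		/* Dispatch pending items in queue order, up to the limit. */
		for (size_t i = 0; i < n && active < max_active; i++) {
			if (!items[i].started && items[i].remaining > 0) {
				items[i].started = 1;
				active++;
			}
		}

		if (active > peak)
			peak = active;

		/* Advance every in-flight item by one tick. */
		for (size_t i = 0; i < n; i++) {
			if (items[i].started && items[i].remaining > 0) {
				items[i].remaining--;
				if (items[i].remaining == 0)
					done++;
			}
		}
	}
	return peak;
}

/* Three hypothetical 2-tick items queued back to back. */
int toy_peak_for_three_tasks(int max_active)
{
	struct toy_work items[3] = { { 2, 0 }, { 2, 0 }, { 2, 0 } };

	return toy_run(items, 3, max_active);
}
```

In this model, toy_peak_for_three_tasks(1) never sees more than one item in
flight, while a larger limit lets all three overlap. The real workqueue
scheduling is of course more involved than this.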

Solution 3: Merging requester and completer (not yet submitted/tested)
https://lore.kernel.org/all/93c8ad67-f008-4352-8887-099723c2f4ec@xxxxxxxxx/
It is not clear to me whether we should call this a new feature or a fix.
If it eliminates the hang issue, it could be the ultimate solution.

It is understandable that some people do not want to wait for solution 3 to be submitted and verified.
Is there any problem if we adopt solution 2?
If so, then I agree to go with solution 1.
If not, I prefer solution 2.

Thanks,
Daisuke Matsuda




