On Tue, May 23, 2023 at 1:56 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote: > > > > On 5/23/23 13:52, Zhu Yanjun wrote: > > On Tue, May 23, 2023 at 1:44 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote: > >> > >> > >> On 5/23/23 13:18, Zhu Yanjun wrote: > >>> On Tue, May 23, 2023 at 1:08 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote: > >>>> On Tue, May 23, 2023 at 12:29 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote: > >>>>> On Tue, May 23, 2023 at 12:10 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote: > >>>>>> > >>>>>> On 5/23/23 12:02, Zhu Yanjun wrote: > >>>>>>> On Tue, May 23, 2023 at 11:47 AM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote: > >>>>>>>> On Tue, May 23, 2023 at 10:26 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote: > >>>>>>>>> On 5/23/23 10:13, syzbot wrote: > >>>>>>>>>> Hello, > >>>>>>>>>> > >>>>>>>>>> syzbot tried to test the proposed patch but the build/boot failed: > >>>>>>>>>> > >>>>>>>>>> failed to apply patch: > >>>>>>>>>> checking file drivers/infiniband/sw/rxe/rxe_qp.c > >>>>>>>>>> patch: **** unexpected end of file in patch > >>>>>>>> This is not the root cause. The fix is not good. > >>>>>>> This problem is about "INFO: trying to register non-static key. The > >>>>>>> code is fine but needs lockdep annotation, or maybe" > >>>>> This warning is from "lock is not initialized". This is a > >>>>> use-before-initialized problem. > >>>>> The correct fix is to initialize the lock that is complained before it is used. > >>>>> > >>>>> Zhu Yanjun > >>>> Based on the call trace, the followings are the order of this call trace. > >>>> > >>>> 291 /* called by the create qp verb */ > >>>> 292 int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, > >>>> struct rxe_pd *pd, > >>>> 297 { > >>>> ... > >>>> 317 rxe_qp_init_misc(rxe, qp, init); > >>>> ... > >>>> 322 > >>>> 323 err = rxe_qp_init_resp(rxe, qp, init, udata, uresp); > >>>> 324 if (err) > >>>> 325 goto err2; <--- error > >>>> > >>>> ... > >>>> > >>>> 334 err2: > >>>> 335 rxe_queue_cleanup(qp->sq.queue); <--- Goto here > >>>> 336 qp->sq.queue = NULL; > >>>> > >>>> In rxe_qp_init_resp, the error occurs before skb_queue_head_init. > >>>> So this call trace appeared. > >>> 250 static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp, > >>> 254 { > >>> ... > >>> 264 > >>> 265 type = QUEUE_TYPE_FROM_CLIENT; > >>> 266 qp->rq.queue = rxe_queue_init(rxe, &qp->rq.max_wr, > >>> 267 wqe_size, type); > >>> 268 if (!qp->rq.queue) > >>> 269 return -ENOMEM; <---Error here > >>> 270 > >>> > >>> ... > >>> > >>> 282 skb_queue_head_init(&qp->resp_pkts); <-this is not called. > >>> ... > >>> This will make spin_lock of resp_pkts is used before initialized. > >> IMHO, the above is same as > >> > >>> Which is caused by "skb_queue_head_init(&qp->resp_pkts)" is not called > >>> given rxe_qp_init_resp returns error, but the cleanup still trigger the > >>> chain. > >>> > >>> rxe_qp_do_cleanup -> rxe_completer -> drain_resp_pkts -> > >>> skb_dequeue(&qp->resp_pkts) > >> my previous analysis. If not, could you provide another better way to > >> fix it? > > Move the initialization to the beginning. This can fix this problem. > > See below: > > > > " > > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c > > b/drivers/infiniband/sw/rxe/rxe_qp.c > > index c5451a4488ca..22ef6188d7b1 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_qp.c > > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c > > @@ -176,6 +176,9 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, > > struct rxe_qp *qp, > > spin_lock_init(&qp->rq.producer_lock); > > spin_lock_init(&qp->rq.consumer_lock); > > > > + skb_queue_head_init(&qp->req_pkts); > > + skb_queue_head_init(&qp->resp_pkts); > > + > > atomic_set(&qp->ssn, 0); > > atomic_set(&qp->skb_out, 0); > > } > > @@ -234,8 +237,6 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, > > struct rxe_qp *qp, > > qp->req.opcode = -1; > > qp->comp.opcode = -1; > > > > - skb_queue_head_init(&qp->req_pkts); > > - > > rxe_init_task(&qp->req.task, qp, rxe_requester); > > rxe_init_task(&qp->comp.task, qp, rxe_completer); > > > > @@ -279,8 +280,6 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, > > struct rxe_qp *qp, > > } > > } > > > > - skb_queue_head_init(&qp->resp_pkts); > > - > > rxe_init_task(&qp->resp.task, qp, rxe_responder); > > > > qp->resp.opcode = OPCODE_NONE; > > " > > It is weird to me that init them in init_misc instead of init_req/resp, > given they > are dedicated/used for the different purpose. But just my 0.02$. There are some initialization problems in qp init. This needs refactoring the related functions to fix all the problems. Currently I am working on this. You know, this is not an easy task. It will take me a lot of time and effort. Now I use init_misc to initialize the related variables just as I did in the past. This is because init_misc is designed to initialize some variables. And it will not cause the similar use-before-initialization problems. Zhu Yanjun > > Guoqing