On Tue, May 23, 2023 at 1:44 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
>
>
>
> On 5/23/23 13:18, Zhu Yanjun wrote:
> > On Tue, May 23, 2023 at 1:08 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> >> On Tue, May 23, 2023 at 12:29 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> >>> On Tue, May 23, 2023 at 12:10 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>> On 5/23/23 12:02, Zhu Yanjun wrote:
> >>>>> On Tue, May 23, 2023 at 11:47 AM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> >>>>>> On Tue, May 23, 2023 at 10:26 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> On 5/23/23 10:13, syzbot wrote:
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> syzbot tried to test the proposed patch but the build/boot failed:
> >>>>>>>>
> >>>>>>>> failed to apply patch:
> >>>>>>>> checking file drivers/infiniband/sw/rxe/rxe_qp.c
> >>>>>>>> patch: **** unexpected end of file in patch
> >>>>>> This is not the root cause. The fix is not good.
> >>>>> This problem is about "INFO: trying to register non-static key. The
> >>>>> code is fine but needs lockdep annotation, or maybe"
> >>> This warning is from "lock is not initialized". This is a
> >>> use-before-initialized problem.
> >>> The correct fix is to initialize the lock that is complained before it is used.
> >>>
> >>> Zhu Yanjun
> >> Based on the call trace, the followings are the order of this call trace.
> >>
> >> 291 /* called by the create qp verb */
> >> 292 int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp,
> >> struct rxe_pd *pd,
> >> 297 {
> >>         ...
> >> 317         rxe_qp_init_misc(rxe, qp, init);
> >>         ...
> >> 322
> >> 323         err = rxe_qp_init_resp(rxe, qp, init, udata, uresp);
> >> 324         if (err)
> >> 325                 goto err2;   <--- error
> >>
> >>         ...
> >>
> >> 334 err2:
> >> 335         rxe_queue_cleanup(qp->sq.queue);   <--- Goto here
> >> 336         qp->sq.queue = NULL;
> >>
> >> In rxe_qp_init_resp, the error occurs before skb_queue_head_init.
> >> So this call trace appeared.
> > 250 static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
> > 254 {
> >         ...
> > 264
> > 265         type = QUEUE_TYPE_FROM_CLIENT;
> > 266         qp->rq.queue = rxe_queue_init(rxe, &qp->rq.max_wr,
> > 267                                       wqe_size, type);
> > 268         if (!qp->rq.queue)
> > 269                 return -ENOMEM;   <---Error here
> > 270
> >
> > ...
> >
> > 282         skb_queue_head_init(&qp->resp_pkts); <-this is not called.
> > ...
> > This will make spin_lock of resp_pkts is used before initialized.
>
> IMHO, the above is same as
>
> > Which is caused by "skb_queue_head_init(&qp->resp_pkts)" is not called
> > given rxe_qp_init_resp returns error, but the cleanup still trigger the
> > chain.
> >
> > rxe_qp_do_cleanup -> rxe_completer -> drain_resp_pkts ->
> > skb_dequeue(&qp->resp_pkts)
>
> my previous analysis. If not, could you provide another better way to
> fix it?

Move the initialization to the beginning. This can fix this problem.
See below:

"
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index c5451a4488ca..22ef6188d7b1 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -176,6 +176,9 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
        spin_lock_init(&qp->rq.producer_lock);
        spin_lock_init(&qp->rq.consumer_lock);
 
+       skb_queue_head_init(&qp->req_pkts);
+       skb_queue_head_init(&qp->resp_pkts);
+
        atomic_set(&qp->ssn, 0);
        atomic_set(&qp->skb_out, 0);
 }
@@ -234,8 +237,6 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
        qp->req.opcode          = -1;
        qp->comp.opcode         = -1;
 
-       skb_queue_head_init(&qp->req_pkts);
-
        rxe_init_task(&qp->req.task, qp, rxe_requester);
        rxe_init_task(&qp->comp.task, qp, rxe_completer);
 
@@ -279,8 +280,6 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
                }
        }
 
-       skb_queue_head_init(&qp->resp_pkts);
-
        rxe_init_task(&qp->resp.task, qp, rxe_responder);
 
        qp->resp.opcode         = OPCODE_NONE;
"
>
> Guoqing
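
For illustration only, the ordering problem discussed in this thread can be modelled in a small userspace sketch. Nothing below is rxe or kernel code: `struct fake_queue`, `struct fake_qp` and every `fake_*` function are hypothetical stand-ins, and the "lock initialized" flag is an assumption used to imitate what lockdep complains about. The `init_queues_early` parameter selects between the old ordering (queue initialized late in the resp init step) and the ordering proposed in the diff (queue initialized in the misc init step).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sk_buff_head: only tracks whether the lock was set up. */
struct fake_queue {
	bool lock_initialized;
	int len;
};

/* Hypothetical stand-in for struct rxe_qp. */
struct fake_qp {
	struct fake_queue req_pkts;
	struct fake_queue resp_pkts;
};

/* Models skb_queue_head_init(): sets up the lock and empties the list. */
static void fake_queue_head_init(struct fake_queue *q)
{
	assert(q != NULL);
	q->lock_initialized = true;
	q->len = 0;
}

/* Models skb_dequeue(): returns -1 where lockdep would warn about the lock. */
static int fake_dequeue(struct fake_queue *q)
{
	if (!q->lock_initialized)
		return -1;
	if (q->len > 0)
		q->len--;
	return 0;
}

/* Models rxe_qp_init_misc(); with init_queues_early the queues are set up here. */
static void fake_init_misc(struct fake_qp *qp, bool init_queues_early)
{
	if (init_queues_early) {
		fake_queue_head_init(&qp->req_pkts);
		fake_queue_head_init(&qp->resp_pkts);
	}
}

/* Models rxe_qp_init_resp(); alloc_fails imitates rxe_queue_init() returning NULL. */
static int fake_init_resp(struct fake_qp *qp, bool alloc_fails,
			  bool init_queues_early)
{
	if (alloc_fails)
		return -1;	/* stands in for -ENOMEM; returns before the late init */
	if (!init_queues_early)
		fake_queue_head_init(&qp->resp_pkts);
	return 0;
}

/*
 * Models QP creation followed by cleanup. Cleanup drains resp_pkts no matter
 * how far initialization got, so the return value is the dequeue result.
 */
static int fake_create_then_cleanup(bool alloc_fails, bool init_queues_early)
{
	struct fake_qp qp = {0};

	fake_init_misc(&qp, init_queues_early);
	(void)fake_init_resp(&qp, alloc_fails, init_queues_early);
	return fake_dequeue(&qp.resp_pkts);
}
```

In this model, `fake_create_then_cleanup(true, false)` returns -1 (the queue is drained before its lock was set up), while `fake_create_then_cleanup(true, true)` returns 0, which is the effect the diff is aiming for.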