Hi Leon,

I can't quite follow what you mean. Are you saying that the rxe_add_ref(qp) is not needed?

My kernel is old, and I found some bugs in rxe on 4.14.97, especially the RNR errors. I can't upgrade the whole kernel because there are many dependencies, so I backported the fixes from the newest kernel version to 4.14.97.

When I compared my rxe_resp.c with kernel 5.2.9, I found that the duplicate_request() snippet is different. Also, rxe_xmit_packet() calls rxe_send(), which prints the log "rdma_rxe: Unknown layer 3 protocol: 0".

	} else {
		struct resp_res *res;

		/* Find the operation in our list of responder resources. */
		res = find_resource(qp, pkt->psn);
		if (res) {
			struct sk_buff *skb_copy;

			skb_copy = skb_clone(res->atomic.skb, GFP_ATOMIC);
			if (skb_copy) {
				rxe_add_ref(qp); /* for the new SKB */
			} else {
				pr_warn("Couldn't clone atomic resp\n");
				rc = RESPST_CLEANUP;
				goto out;
			}

			/* Resend the result. */
			rc = rxe_xmit_packet(to_rdev(qp->ibqp.device), qp,
					     pkt, skb_copy);
			if (rc) {
				pr_err("Failed resending result. This flow is not handled - skb ignored\n");
				rxe_drop_ref(qp);
				rc = RESPST_CLEANUP;
				goto out;
			}
		}

		/* Resource not found. Class D error. Drop the request. */
		rc = RESPST_CLEANUP;
		goto out;
	}
out:
	return rc;
}

On Wed, Dec 25, 2019 at 2:33 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Wed, Dec 25, 2019 at 12:55:35PM +0800, Frank Huang wrote:
> > Hi, there is a panic in the rdma_rxe module when I restart network.service
> > or shut down the switch.
> >
> > It looks like a use-after-free error.
> >
> > Every time it happens, there is the log "rdma_rxe: Unknown layer 3 protocol: 0"
>
> The error print itself is harmless.
>
> >
> > Is it a known error?
> >
> > My kernel version is 4.14.97.
>
> Your kernel is old enough that it doesn't include refcount, so I can't
> say for sure that this is the cause, but the following code is not
> correct, and with refcount debugging enabled it would be seen immediately.
>
> int rxe_responder(void *arg)
> {
> 	struct rxe_qp *qp = (struct rxe_qp *)arg;
> 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
> 	enum resp_states state;
> 	struct rxe_pkt_info *pkt = NULL;
> 	int ret = 0;
>
> 	rxe_add_ref(qp); <------ USE-AFTER-FREE
>
> 	qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
>
> 	if (!qp->valid) {
> 		ret = -EINVAL;
> 		goto done;
> 	}
>
> Thanks
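A minimal user-space sketch of the rule behind Leon's comment, written as plain C rather than kernel code. struct obj, obj_init(), obj_get(), obj_put() and resend_with_ref() are hypothetical names, not rxe or kernel APIs, and the zero check only assumes behaviour similar to the kernel's refcount_t, which warns on an increment from zero. The idea: an extra reference may only be taken by code that already holds one (as the duplicate_request code does for the cloned skb), so incrementing a count that has already dropped to zero signals a use-after-free rather than a valid get.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical object standing in for struct rxe_qp; not the rxe API. */
struct obj {
	int refs;        /* number of live references */
	bool freed;      /* set once the last reference is dropped */
	int uaf_reports; /* increment-from-zero bugs caught so far */
};

/* Start with one reference owned by the creator. */
void obj_init(struct obj *o)
{
	o->refs = 1;
	o->freed = false;
	o->uaf_reports = 0;
}

/*
 * Take an extra reference. This is only legal while the caller already
 * holds one, so refs must be > 0. An increment from zero means the object
 * may already be gone: report it (loosely modeled on refcount_t's
 * use-after-free warning) and refuse the get.
 */
bool obj_get(struct obj *o)
{
	if (o->refs == 0) {
		o->uaf_reports++;
		fprintf(stderr, "obj: increment on 0; use-after-free\n");
		return false;
	}
	o->refs++;
	return true;
}

/* Drop a reference; the last put marks the object as freed. */
void obj_put(struct obj *o)
{
	assert(o->refs > 0); /* an underflow would be a separate bug */
	if (--o->refs == 0)
		o->freed = true; /* a real object would be released here */
}

/*
 * Same shape as the duplicate_request snippet: take a reference for the
 * clone before handing it off, and drop it again if the hand-off fails.
 * send_ok is a hypothetical stand-in for rxe_xmit_packet() succeeding.
 */
bool resend_with_ref(struct obj *o, bool send_ok)
{
	if (!obj_get(o))
		return false;
	if (!send_ok) {
		obj_put(o); /* the clone never took ownership */
		return false;
	}
	return true; /* the completion path calls obj_put() later */
}
```

In this sketch the object's memory stays valid after the last put, which is the only reason reading refs == 0 is safe here. In the kernel the freed qp may already have been reused, so this kind of check is a debugging aid, not a guarantee.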