> The patch below (also pushed to ceph-client.git master) should fix this.
> Can you give it a test?
>

The exception still occurred with this patch. From the log below, the sequence seems to be:

===================
kernel: libceph: osd0 192.168.101.134:6800 socket closed
kernel: libceph: fault ffff880002284830 state 69 to peer 192.168.101.134:6800
kernel: libceph: fault on LOSSYTX channel
kernel: libceph: osd_reset osd0
kernel: libceph: __kick_osd_requests osd0
kernel: libceph: __reset_osd ffff880002284800 osd0
kernel: libceph: con_close ffff880002284830 peer 192.168.101.134:6800
kernel: libceph: get_osd ffff880002284800 2 -> 3
kernel: libceph: queue_con ffff880002284830 - already BUSY
kernel: libceph: put_osd ffff880002284800 3 -> 2
kernel: libceph: con_open ffff880002284830 192.168.101.134:6800
kernel: libceph: get_osd ffff880002284800 2 -> 3
kernel: libceph: queue_con ffff880002284830 - already BUSY
kernel: libceph: put_osd ffff880002284800 3 -> 2
kernel: libceph: __unregister_linger_request ffff880002511e00
kernel: libceph: moving osd to ffff880002284800 lru
kernel: libceph: __move_osd_to_lru ffff880002284800
===================

The linger request must have already succeeded, so it was removed from
osd0's o_requests list and put on o_linger_requests during handle_reply().
Since it is no longer on o_requests, req->r_osd is set to NULL even with
the patch.

===================
kernel: libceph: register_request ffff880002511e00 tid 178
kernel: libceph: first request, scheduling timeout
kernel: libceph: requeued lingering ffff880002511e00 tid 178 osd0
kernel: libceph: send_queued
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
kernel: IP: [<ffffffffa01494ac>] __send_request+0x27/0xd5 [libceph]
===================

I wonder whether we should avoid requeueing linger requests that have
already succeeded in __kick_osd_requests, as below.

@@ -576,15 +576,6 @@ static void __kick_osd_requests(struct ceph_osd_client *osdc,
 		if (!req->r_linger)
 			req->r_flags |= CEPH_OSD_FLAG_RETRY;
 	}
-
-	list_for_each_entry_safe(req, nreq, &osd->o_linger_requests,
-				 r_linger_osd) {
-		__unregister_linger_request(osdc, req);
-		__register_request(osdc, req);
-		list_move(&req->r_req_lru_item, &osdc->req_unsent);
-		dout("requeued lingering %p tid %llu osd%d\n", req, req->r_tid,
-		     osd->o_osd);
-	}
 }

--
Henry