This is a note to let you know that I've just added the patch titled libceph: fix potential hang in ceph_osdc_notify() to the 4.19-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: libceph-fix-potential-hang-in-ceph_osdc_notify.patch and it can be found in the queue-4.19 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From e6e2843230799230fc5deb8279728a7218b0d63c Mon Sep 17 00:00:00 2001 From: Ilya Dryomov <idryomov@xxxxxxxxx> Date: Tue, 1 Aug 2023 19:14:24 +0200 Subject: libceph: fix potential hang in ceph_osdc_notify() From: Ilya Dryomov <idryomov@xxxxxxxxx> commit e6e2843230799230fc5deb8279728a7218b0d63c upstream. If the cluster becomes unavailable, ceph_osdc_notify() may hang even with osd_request_timeout option set because linger_notify_finish_wait() waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD request in flight -- it's completely asynchronous. Introduce an additional timeout, derived from the specified notify timeout. While at it, switch both waits to killable which is more correct. Cc: stable@xxxxxxxxxxxxxxx Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx> Reviewed-by: Dongsheng Yang <dongsheng.yang@xxxxxxxxxxxx> Reviewed-by: Xiubo Li <xiubli@xxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- net/ceph/osd_client.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -3137,17 +3137,24 @@ static int linger_reg_commit_wait(struct int ret; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->reg_commit_wait); + ret = wait_for_completion_killable(&lreq->reg_commit_wait); return ret ?: lreq->reg_commit_error; } -static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq) +static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq, + unsigned long timeout) { - int ret; + long left; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->notify_finish_wait); - return ret ?: lreq->notify_finish_error; + left = wait_for_completion_killable_timeout(&lreq->notify_finish_wait, + ceph_timeout_jiffies(timeout)); + if (left <= 0) + left = left ?: -ETIMEDOUT; + else + left = lreq->notify_finish_error; /* completed */ + + return left; } /* @@ -4760,7 +4767,8 @@ int ceph_osdc_notify(struct ceph_osd_cli ret = linger_reg_commit_wait(lreq); if (!ret) - ret = linger_notify_finish_wait(lreq); + ret = linger_notify_finish_wait(lreq, + msecs_to_jiffies(2 * timeout * MSEC_PER_SEC)); else dout("lreq %p failed to initiate notify %d\n", lreq, ret); Patches currently in stable-queue which might be from idryomov@xxxxxxxxx are queue-4.19/libceph-fix-potential-hang-in-ceph_osdc_notify.patch queue-4.19/ceph-don-t-let-check_caps-skip-sending-responses-for-revoke-msgs.patch