This is a note to let you know that I've just added the patch titled libceph: fix potential hang in ceph_osdc_notify() to the 5.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: libceph-fix-potential-hang-in-ceph_osdc_notify.patch and it can be found in the queue-5.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From e6e2843230799230fc5deb8279728a7218b0d63c Mon Sep 17 00:00:00 2001 From: Ilya Dryomov <idryomov@xxxxxxxxx> Date: Tue, 1 Aug 2023 19:14:24 +0200 Subject: libceph: fix potential hang in ceph_osdc_notify() From: Ilya Dryomov <idryomov@xxxxxxxxx> commit e6e2843230799230fc5deb8279728a7218b0d63c upstream. If the cluster becomes unavailable, ceph_osdc_notify() may hang even with osd_request_timeout option set because linger_notify_finish_wait() waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD request in flight -- it's completely asynchronous. Introduce an additional timeout, derived from the specified notify timeout. While at it, switch both waits to killable which is more correct. Cc: stable@xxxxxxxxxxxxxxx Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx> Reviewed-by: Dongsheng Yang <dongsheng.yang@xxxxxxxxxxxx> Reviewed-by: Xiubo Li <xiubli@xxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- net/ceph/osd_client.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -3330,17 +3330,24 @@ static int linger_reg_commit_wait(struct int ret; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->reg_commit_wait); + ret = wait_for_completion_killable(&lreq->reg_commit_wait); return ret ?: lreq->reg_commit_error; } -static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq) +static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq, + unsigned long timeout) { - int ret; + long left; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->notify_finish_wait); - return ret ?: lreq->notify_finish_error; + left = wait_for_completion_killable_timeout(&lreq->notify_finish_wait, + ceph_timeout_jiffies(timeout)); + if (left <= 0) + left = left ?: -ETIMEDOUT; + else + left = lreq->notify_finish_error; /* completed */ + + return left; } /* @@ -4890,7 +4897,8 @@ int ceph_osdc_notify(struct ceph_osd_cli linger_submit(lreq); ret = linger_reg_commit_wait(lreq); if (!ret) - ret = linger_notify_finish_wait(lreq); + ret = linger_notify_finish_wait(lreq, + msecs_to_jiffies(2 * timeout * MSEC_PER_SEC)); else dout("lreq %p failed to initiate notify %d\n", lreq, ret); Patches currently in stable-queue which might be from idryomov@xxxxxxxxx are queue-5.15/rbd-prevent-busy-loop-when-requesting-exclusive-lock.patch queue-5.15/ceph-defer-stopping-mdsc-delayed_work.patch queue-5.15/libceph-fix-potential-hang-in-ceph_osdc_notify.patch