Re: [PATCH v18 67/71] libceph: defer removing the req from osdc just after req->r_callback

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 12, 2023 at 1:15 PM <xiubli@xxxxxxxxxx> wrote:
>
> From: Xiubo Li <xiubli@xxxxxxxxxx>
>
> The sync_filesystem() will flush all the dirty buffer and submit the
> osd reqs to the osdc and then is blocked to wait for all the reqs to
> finish. But the when the reqs' replies come, the reqs will be removed
> from osdc just before the req->r_callback()s are called. Which means
> the sync_filesystem() will be woke up by leaving the req->r_callback()s
> are still running.
>
> This will be buggy when the waiter require the req->r_callback()s to
> release some resources before continuing. So we need to make sure the
> req->r_callback()s are called before removing the reqs from the osdc.
>
> WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0
> CPU: 4 PID: 168846 Comm: umount Tainted: G S  6.1.0-rc5-ceph-g72ead199864c #1
> Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
> RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0
> RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202
> RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00
> RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000
> RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000
> R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40
> R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000
> FS:  00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> generic_shutdown_super+0x47/0x120
> kill_anon_super+0x14/0x30
> ceph_kill_sb+0x36/0x90 [ceph]
> deactivate_locked_super+0x29/0x60
> cleanup_mnt+0xb8/0x140
> task_work_run+0x67/0xb0
> exit_to_user_mode_prepare+0x23d/0x240
> syscall_exit_to_user_mode+0x25/0x60
> do_syscall_64+0x40/0x80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fd83dc39e9b
>
> URL: https://tracker.ceph.com/issues/58126
> Tested-by: Luís Henriques <lhenriques@xxxxxxx>
> Tested-by: Venky Shankar <vshankar@xxxxxxxxxx>
> Reviewed-by: Luís Henriques <lhenriques@xxxxxxx>
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  net/ceph/osd_client.c | 43 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 35 insertions(+), 8 deletions(-)
>
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 78b622178a3d..db3d93d3e692 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -2507,7 +2507,7 @@ static void submit_request(struct ceph_osd_request *req, bool wrlocked)
>         __submit_request(req, wrlocked);
>  }
>
> -static void finish_request(struct ceph_osd_request *req)
> +static void __finish_request(struct ceph_osd_request *req)
>  {
>         struct ceph_osd_client *osdc = req->r_osdc;
>
> @@ -2516,12 +2516,6 @@ static void finish_request(struct ceph_osd_request *req)
>
>         req->r_end_latency = ktime_get();
>
> -       if (req->r_osd) {
> -               ceph_init_sparse_read(&req->r_osd->o_sparse_read);
> -               unlink_request(req->r_osd, req);
> -       }
> -       atomic_dec(&osdc->num_requests);
> -
>         /*
>          * If an OSD has failed or returned and a request has been sent
>          * twice, it's possible to get a reply and end up here while the
> @@ -2532,13 +2526,46 @@ static void finish_request(struct ceph_osd_request *req)
>         ceph_msg_revoke_incoming(req->r_reply);
>  }
>
> +static void __remove_request(struct ceph_osd_request *req)
> +{
> +       struct ceph_osd_client *osdc = req->r_osdc;
> +
> +       dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
> +
> +       if (req->r_osd) {
> +               ceph_init_sparse_read(&req->r_osd->o_sparse_read);
> +               unlink_request(req->r_osd, req);
> +       }
> +       atomic_dec(&osdc->num_requests);
> +}
> +
> +static void finish_request(struct ceph_osd_request *req)
> +{
> +       __finish_request(req);
> +       __remove_request(req);
> +}
> +
>  static void __complete_request(struct ceph_osd_request *req)
>  {
> +       struct ceph_osd_client *osdc = req->r_osdc;
> +       struct ceph_osd *osd = req->r_osd;
> +
>         dout("%s req %p tid %llu cb %ps result %d\n", __func__, req,
>              req->r_tid, req->r_callback, req->r_result);
>
>         if (req->r_callback)
>                 req->r_callback(req);
> +
> +       down_read(&osdc->lock);
> +       if (osd) {
> +               mutex_lock(&osd->lock);
> +               __remove_request(req);
> +               mutex_unlock(&osd->lock);
> +       } else {
> +               atomic_dec(&osdc->num_requests);
> +       }
> +       up_read(&osdc->lock);

Hi Xiubo,

I think it was highlighted before that this patch is wrong so I'm
surprised to see it in this series, broken out and with a different
subject.

On top of changing long-standing behavior which is purposefully
consistent with Objecter behavior in userspace, it also seems to
introduce a race condition with ceph_osdc_cancel_request() which can
lead, among other things, to a use-after-free on req.  Consider what
would happen if the request is cancelled while the callback is running:
since it would still be linked at that point, cancel_request() would be
allowed to go through and eventually put the messenger's reference.

Thanks,

                Ilya




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux