Re: [PATCH v3] ceph: fix potential use-after-free bug when trimming caps

Luís Henriques <lhenriques@xxxxxxx> · Mon, 01 May 2023 11:37:28 +0100

Ilya Dryomov <idryomov@xxxxxxxxx> writes:

> On Wed, Apr 20, 2023 at 3:22 PM Luís Henriques <lhenriques@xxxxxxx> wrote:
>>
>> Xiubo Li <xiubli@xxxxxxxxxx> writes:
>>
>> > On 4/18/23 22:20, Luís Henriques wrote:
>> >> xiubli@xxxxxxxxxx writes:
>> >>
>> >>> From: Xiubo Li <xiubli@xxxxxxxxxx>
>> >>>
>> >>> When trimming the caps and just after the 'session->s_cap_lock' is
>> >>> released in ceph_iterate_session_caps() the cap maybe removed by
>> >>> another thread, and when using the stale cap memory in the callbacks
>> >>> it will trigger use-after-free crash.
>> >>>
>> >>> We need to check the existence of the cap just after the 'ci->i_ceph_lock'
>> >>> being acquired. And do nothing if it's already removed.
>> >> Your patch seems to be OK, but I'll be honest: the locking is *so* complex
>> >> that I can say for sure it really solves any problem :-(
>> >>
>> >> ceph_put_cap() uses mdsc->caps_list_lock to protect the list, but I can't
>> >> be sure that holding ci->i_ceph_lock will protect a race in the case
>> >> you're trying to solve.
>> >
>> > The 'mdsc->caps_list_lock' will protect the members in mdsc:
>> >
>> >         /*
>> >          * Cap reservations
>> >          *
>> >          * Maintain a global pool of preallocated struct ceph_caps, referenced
>> >          * by struct ceph_caps_reservations.  This ensures that we preallocate
>> >          * memory needed to successfully process an MDS response. (If an MDS
>> >          * sends us cap information and we fail to process it, we will have
>> >          * problems due to the client and MDS being out of sync.)
>> >          *
>> >          * Reservations are 'owned' by a ceph_cap_reservation context.
>> >          */
>> >         spinlock_t      caps_list_lock;
>> >         struct          list_head caps_list; /* unused (reserved or
>> >                                                 unreserved) */
>> >         struct          list_head cap_wait_list;
>> >         int             caps_total_count;    /* total caps allocated */
>> >         int             caps_use_count;      /* in use */
>> >         int             caps_use_max;        /* max used caps */
>> >         int             caps_reserve_count;  /* unused, reserved */
>> >         int             caps_avail_count;    /* unused, unreserved */
>> >         int             caps_min_count;      /* keep at least this many
>> >
>> > Not protecting the cap list in session or inode.
>> >
>> >
>> > And the racy is between the session's cap list and inode's cap rbtree and both
>> > are holding the same 'cap' reference.
>> >
>> > So in 'ceph_iterate_session_caps()' after getting the 'cap' and releasing the
>> > 'session->s_cap_lock', just before passing the 'cap' to _cb() another thread
>> > could continue and release the 'cap'. Then the 'cap' should be stale now and
>> > after being passed to _cb() the 'cap' when dereferencing it will crash the
>> > kernel.
>> >
>> > And if the 'cap' is stale, it shouldn't exist in the inode's cap rbtree. Please
>> > note the lock order will be:
>> >
>> > 1, spin_lock(&ci->i_ceph_lock)
>> >
>> > 2, spin_lock(&session->s_cap_lock)
>> >
>> >
>> > Before:
>> >
>> > ThreadA: ThreadB:
>> >
>> > __ceph_remove_caps() -->
>> >
>> >     spin_lock(&ci->i_ceph_lock)
>> >
>> >     ceph_remove_cap() --> ceph_iterate_session_caps() -->
>> >
>> >         __ceph_remove_cap() --> spin_lock(&session->s_cap_lock);
>> >
>> > cap = list_entry(p, struct ceph_cap, session_caps);
>> >
>> > spin_unlock(&session->s_cap_lock);
>> >
>> >             spin_lock(&session->s_cap_lock);
>> >
>> >             // remove it from the session's cap list
>> >
>> >             list_del_init(&cap->session_caps);
>> >
>> >             spin_unlock(&session->s_cap_lock);
>> >
>> >             ceph_put_cap()
>> >
>> > trim_caps_cb('cap') -->   // the _cb() could be deferred after ThreadA finished
>> > 'ceph_put_cap()'
>> >
>> > spin_unlock(&ci->i_ceph_lock) dreference cap->xxx will trigger crash
>> >
>> >
>> >
>> > With this patch:
>> >
>> > ThreadA: ThreadB:
>> >
>> > __ceph_remove_caps() -->
>> >
>> >     spin_lock(&ci->i_ceph_lock)
>> >
>> >     ceph_remove_cap() --> ceph_iterate_session_caps() -->
>> >
>> >         __ceph_remove_cap() --> spin_lock(&session->s_cap_lock);
>> >
>> > cap = list_entry(p, struct ceph_cap, session_caps);
>> >
>> > ci_node = &cap->ci_node;
>> >
>> > spin_unlock(&session->s_cap_lock);
>> >
>> >             spin_lock(&session->s_cap_lock);
>> >
>> >             // remove it from the session's cap list
>> >
>> >             list_del_init(&cap->session_caps);
>> >
>> >             spin_unlock(&session->s_cap_lock);
>> >
>> >             ceph_put_cap()
>> >
>> > trim_caps_cb('ci_node') -->
>> >
>> > spin_unlock(&ci->i_ceph_lock)
>> >
>> > spin_lock(&ci->i_ceph_lock)
>> >
>> > cap = rb_entry(ci_node, struct ceph_cap, ci_node);    // This is buggy in this
>> > version, we should use the 'mds' instead and I will fix it.
>> >
>> > if (!cap)  { release the spin lock and return directly }
>> >
>> > spin_unlock(&ci->i_ceph_lock)
>>
>> Thanks a lot for taking the time to explain all of this.  Much
>> appreciated.  It all seems to make sense, and, again, I don't have any
>> real objection to your patch.  It's just that I still find the whole
>> locking to be too complex, and every change that is made to it looks like
>> walking on a mine field :-)
>>
>> > While we should switch to use the 'mds' of the cap instead of the 'ci_node',
>> > which is buggy. I will fix it in next version.
>>
>> Yeah, I've took a quick look at v4 and it looks like it fixes this.
>
> Hi Luís,
>
> Do you mind if I put this down as a Reviewed-by? ;)

Sure, feel free to add my

Reviewed-by: Luís Henriques <lhenriques@xxxxxxx>

(Sorry, forgot to send that explicitly.)

Cheers,
-- 
Luís