Re: locking/refcount problems in cachefiles.

David Howells <dhowells@xxxxxxxxxx> · Wed, 29 Jan 2014 12:17:15 +0000

NeilBrown <neilb@xxxxxxx> wrote:

>  Analysis of the crash dump suggests that fscache_object_destroy, and thus
>  __rb_erase_colour, is being called on an object that has already been
>  destroyed and is no longer in the rb tree.  The rbtree code gets upset and
>  crashes.

Not unreasonably...  But which rb_tree?  There are two:

 (1) struct cachefiles_cache::active_nodes.

     This is governed by struct cachefiles_cache::active_lock.

 (2) fscache_object_list.

     This is governed by fscache_object_list_lock.

     Unless you have CONFIG_FSCACHE_OBJECT_LIST=y this isn't present and
     fscache_objlist_remove() does nothing - in which case all
     fscache_object_destroy() does is release the cookie.
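To illustrate the config dependence in (2), here is a minimal userspace sketch
(not the kernel source).  All sketch_* names and the SKETCH_OBJECT_LIST macro
are hypothetical stand-ins, and "releasing the cookie" is modelled as a plain
counter decrement, which is an assumption made purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an fscache object; not the kernel's layout. */
struct sketch_object {
    int on_global_list;   /* 1 while linked into the global debug list */
    int cookie_refs;      /* references this object holds on its cookie */
};

#ifdef SKETCH_OBJECT_LIST
/* Option enabled: the object really is unlinked from the global list. */
static void sketch_objlist_remove(struct sketch_object *obj)
{
    /* Unlinking twice is the kind of misuse that upsets a real rbtree. */
    assert(obj->on_global_list);
    obj->on_global_list = 0;
}
#else
/* Option disabled: there is no global list, so removal is a no-op. */
static void sketch_objlist_remove(struct sketch_object *obj)
{
    (void)obj;
}
#endif

/* With the option off, all this does is release the cookie. */
static void sketch_object_destroy(struct sketch_object *obj)
{
    sketch_objlist_remove(obj);
    obj->cookie_refs--;
}
```

Built without SKETCH_OBJECT_LIST, a double call to sketch_object_destroy()
cannot touch any tree at all, which is why it matters which tree the crash was
in.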

Can you poke around in the registers and see whether any of them point to
tree (2) (which is a global variable)?

>   Thus you can get a race
> ...
>                                     cachefiles_mark_object_active increments
>                                        ->usage (to 1) and drops the lock

This is tree (1).

>      cachefiles_put_object calls
>         fscache_object_destroy which
>         unlinks from the rb tree.

And this is tree (2).

>   cachefiles_objects live in an rbtree which does not imply a reference to
>   the object.

Whilst that is true, they're not allowed to be in the rbtree unless they still
have at least one reference outstanding.
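The rule can be written down as an invariant: "in the tree" implies "usage is
at least 1".  A rough userspace sketch follows, with no locking shown; the
sketch_* names and the in_tree flag are hypothetical and stand in for the real
rbtree linkage:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cachefiles object. */
struct sketch_obj {
    int usage;      /* reference count */
    bool in_tree;   /* true while linked into the lookup tree */
};

/* The invariant: membership in the tree requires usage >= 1. */
static bool sketch_invariant_holds(const struct sketch_obj *obj)
{
    return !obj->in_tree || obj->usage >= 1;
}

/* The caller must already hold a reference when inserting. */
static void sketch_insert(struct sketch_obj *obj)
{
    assert(obj->usage >= 1);
    obj->in_tree = true;
}

/* Erase while a reference is still held, never after the last put. */
static void sketch_erase(struct sketch_obj *obj)
{
    assert(obj->in_tree);
    assert(obj->usage >= 1);
    obj->in_tree = false;
}

/* Drop one reference; returns true if it was the final one. */
static bool sketch_put(struct sketch_obj *obj)
{
    assert(obj->usage >= 1);
    obj->usage--;
    /* Trips if the last ref is dropped while the object is still in the tree. */
    assert(sketch_invariant_holds(obj));
    return obj->usage == 0;
}
```

In this model the only legal order is insert, erase, then the final put; a
final put on an object that is still in the tree trips the assertion.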

Apart from cachefiles_walk_to_object()'s "check_error" labelled part, objects
are only rb_erase()'d in cachefiles_drop_object().  This is called from the
fscache object state machine (fscache_drop_object) which holds a ref on the
cachefiles object until fscache_object_work_func() releases it just prior to
returning.

David

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs