Hi,
 I have a report of a crash in fs/cachefiles that I'm looking into.  The
crash happened on a 3.0 based kernel (SLES11-SP3) but the code looks
much the same in mainline.

Analysis of the crash dump suggests that fscache_object_destroy, and
thus __rb_erase_color, is being called on an object that has already
been destroyed and is no longer in the rb tree.  The rbtree code gets
upset and crashes.

Looking at the code I think it is easy to see what happened (well ...
easy about the 3rd time I looked.  Before then it was impossible).
However I might have missed something and I don't know the code very
well, hence this email to check my understanding and to invite comment
on a possible fix.

The problem I see revolves around the refcounting in struct
cachefiles_object.  This has the fairly common property that when the
count reaches zero, the object is destroyed.  In such cases it is
vitally important that the count is never increased from zero back to
non-zero.  There are two rules for achieving this:

 - The first is that normally we only take a reference when we already
   own a reference.
 - The second is that if we don't own a reference, then we must hold a
   lock that prevents the last reference being dropped while we take
   ours.  The latter is often implemented using "atomic_dec_and_lock()".

cachefiles_objects live in an rbtree which does not imply a reference
to the object, so just finding something in the rbtree does not mean
there is a reference.  It is therefore only safe to take a reference
while holding an appropriate lock.

cachefiles_mark_object_active() together with cachefiles_put_object()
get this wrong.  If cachefiles_mark_object_active() finds an object in
the rbtree it will increment the refcount while still holding
->active_lock, which is correct.  However cachefiles_put_object() does
not take the lock when the refcount might become zero.  Thus you can
get a race:

Thread 1                            Thread 2
--------                            --------
cachefiles_mark_object_active
finds a conflicting object.
                                    cachefiles_put_object decrements
                                    ->usage to zero so decides to free
                                    the object.

cachefiles_mark_object_active
increments ->usage (to 1) and
drops the lock.

                                    cachefiles_put_object calls
                                    fscache_object_destroy which
                                    unlinks the object from the rb
                                    tree.

cachefiles_mark_object_active
notices that CACHEFILES_OBJECT_ACTIVE
is not set, so calls ->put_object,
which calls into cachefiles_put_object
and fscache_object_destroy, and the
object is removed from the rbtree
again.

A simple fix would be to replace the atomic_dec_and_test() in
cachefiles_put_object() with atomic_dec_and_lock(), locking
cache->active_lock, and to unlock ->active_lock after
fscache_object_destroy() has been called.  This looks safe from a
locking-hierarchy perspective, but I could be missing something.  As
->active_lock is a rwlock rather than a spinlock, we need to make an
atomic_dec_and_write_lock(), but that is quite straightforward.

So, I'm proposing the following.  If you agree with my reasoning and my
fix I'll submit a properly described patch, and maybe even move
atomic_dec_and_write_lock() into a library somewhere.
(Note: patch compile tested but not run.)

Thanks,
NeilBrown

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 57e17fe6121a..4e9c230f1be6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -301,6 +301,22 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 	_leave("");
 }
 
+/* cribbed from lib/dec_and_lock.c */
+static int atomic_dec_and_write_lock(atomic_t *atomic, rwlock_t *lock)
+{
+	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+	if (atomic_add_unless(atomic, -1, 1))
+		return 0;
+
+	/* Otherwise do it the slow way */
+	write_lock(lock);
+	if (atomic_dec_and_test(atomic))
+		return 1;
+	write_unlock(lock);
+	return 0;
+}
+
+
 /*
  * dispose of a reference to an object
  */
@@ -308,6 +324,7 @@ static void cachefiles_put_object(struct fscache_object *_object)
 {
 	struct cachefiles_object *object;
 	struct fscache_cache *cache;
+	struct cachefiles_cache *cfcache;
 
 	ASSERT(_object);
 
@@ -323,7 +340,9 @@ static void cachefiles_put_object(struct fscache_object *_object)
 	ASSERTIFCMP(object->fscache.parent,
 		    object->fscache.parent->n_children, >, 0);
 
-	if (atomic_dec_and_test(&object->usage)) {
+	cache = object->fscache.cache;
+	cfcache = container_of(cache, struct cachefiles_cache, cache);
+	if (atomic_dec_and_write_lock(&object->usage, &cfcache->active_lock)) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
 		ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
@@ -340,8 +359,8 @@ static void cachefiles_put_object(struct fscache_object *_object)
 			object->lookup_data = NULL;
 		}
 
-		cache = object->fscache.cache;
 		fscache_object_destroy(&object->fscache);
+		write_unlock(&cfcache->active_lock);
 		kmem_cache_free(cachefiles_object_jar, object);
 		fscache_object_destroyed(cache);
 	}
--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs