BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776254 The patch is based on the ubuntu linux kernel 4.4.0-124-generic [Impact] Oops during heavy NFS + FSCache + Cachefiles CacheFiles: Error: Overlong wait for old active object to go away. BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 CacheFiles: Error: Object already active kernel BUG at fs/cachefiles/namei.c:163! [Cause] In a heavily loaded system with big files being read and truncated, an fscache object for a cookie is being dropped and a new object being looked. The new object being looked for has to wait for the old object to go away before the new object is moved to active state. [Fix] Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when retrying the object lookup. Remove the BUG() for the case where the old object is still being dropped and convert to WARN() [Testcase] Have run ~100 hours of NFS stress tests and have not seen this bug recur. [Regression Potential] - Limited to fscache/cachefiles. Signed-off-by: kmodukuri <mailto:kiran.modukuri@xxxxxxxxx> --- fs/cachefiles/namei.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index c4b8934..b08286d 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -194,7 +194,7 @@ wait_for_old_object: pr_err("\n"); pr_err("Error: Unexpected object collision\n"); cachefiles_printk_object(object, xobject); - BUG(); + WARN(true, "Unexpected object collision\n"); } atomic_inc(&xobject->usage); write_unlock(&cache->active_lock); @@ -247,6 +247,7 @@ wait_for_old_object: ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)); + clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags); cache->cache.ops->put_object(&xobject->fscache); goto try_again; -- 2.7.4 -- Linux-cachefs mailing list Linux-cachefs@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cachefs