On Tue, Jan 4, 2011 at 10:00 AM, Mark Moseley <moseleymark@xxxxxxxxx> wrote:
> Latest testing on a 2.6.36.2 kernel, 64-bit, Debian Lenny. Ran under
> heavy load for a few hours before this:
>
> [38607.552070] IP: [<ffffffff811a4187>]
> __fscache_read_or_alloc_pages+0x157/0x360
> [38607.567128] last sysfs file:
> /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
> [38607.604773] Pid: 30970, comm: httpd Not tainted 2.6.36.2 #1
> 0XDN97/PowerEdge R610
> [38607.612286] RIP: 0010:[<ffffffff811a4187>] [<ffffffff811a4187>]
> __fscache_read_or_alloc_pages+0x157/0x360
> [38607.734181] [<ffffffff81248d4b>] __nfs_readpages_from_fscache+0x6b/0x1d0
> [38607.740991] [<ffffffff8122a674>] nfs_readpages+0xd4/0x200

Just to update: I was testing on 2.6.37 and got the exact same
BUG+traceback. It was on a 30 GB partition on an SSD, 2.6.37 64-bit,
Debian Lenny. It ran about 24 hours (and was up to the cachefilesd.conf
limits after a few hours) before hitting this. As with 2.6.36.2, every
handful of seconds it would print out "CacheFiles: Error: Overlong wait
for old active object to go away".

I'm no good at reading assembly but it's around here in
fs/fscache/page.c in __fscache_read_or_alloc_pages:

        /* we wait for the operation to become active, and then process it
         * *here*, in this thread, and not in the thread pool */
        ret = fscache_wait_for_retrieval_activation(
                object, op,
                __fscache_stat(&fscache_n_retrieval_op_waits),
                __fscache_stat(&fscache_n_retrievals_object_dead));
        if (ret < 0)
                goto error;

        /* ask the cache to honour the operation */
        if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
                fscache_stat(&fscache_n_cop_allocate_pages);
                ret = object->cache->ops->allocate_pages(
                        op, pages, nr_pages, gfp);
                fscache_stat_d(&fscache_n_cop_allocate_pages);
        } else {
                fscache_stat(&fscache_n_cop_read_or_alloc_pages);
                ret = object->cache->ops->read_or_alloc_pages(
                        op, pages, nr_pages, gfp);
                fscache_stat_d(&fscache_n_cop_read_or_alloc_pages);
        }

Here's the surrounding lines around 0x157 in objdump:

0000000000001015 <__fscache_read_or_alloc_pages+0x145> mov    %r14,%rdi
0000000000001018 <__fscache_read_or_alloc_pages+0x148> callq  0000000000000bb0 <fscache_wait_for_retrieval_activation>
000000000000101d <__fscache_read_or_alloc_pages+0x14d> mov    %eax,%ebx
000000000000101f <__fscache_read_or_alloc_pages+0x14f> test   %eax,%eax
0000000000001021 <__fscache_read_or_alloc_pages+0x151> js     0000000000001073 <__fscache_read_or_alloc_pages+0x1a3>
0000000000001023 <__fscache_read_or_alloc_pages+0x153> mov    0x70(%r14),%rax
0000000000001027 <__fscache_read_or_alloc_pages+0x157> mov    0x40(%rax),%rax
000000000000102b <__fscache_read_or_alloc_pages+0x15b> test   $0x4,%al
000000000000102d <__fscache_read_or_alloc_pages+0x15d> je     0000000000001128 <__fscache_read_or_alloc_pages+0x258>
0000000000001033 <__fscache_read_or_alloc_pages+0x163> lock incl 0x0(%rip)        # 000000000000103a <__fscache_read_or_alloc_pages+0x16a>
000000000000103a <__fscache_read_or_alloc_pages+0x16a> jno    0000000000001045 <__fscache_read_or_alloc_pages+0x175>
000000000000103c <__fscache_read_or_alloc_pages+0x16c> lock decl 0x0(%rip)        # 0000000000001043 <__fscache_read_or_alloc_pages+0x173>

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
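
For anyone else trying to match the objdump output to the C source, here is a rough sketch of how the instructions at +0x153 through +0x15b appear to line up with the test_bit() check. The two loads look like object->cookie followed by cookie->flags, and "test $0x4,%al" is a check of bit 2. If that reading is right, the fault at +0x157 would mean object->cookie was NULL or otherwise invalid when flags was read. The struct names, the padding, the NO_DATA_YET_BIT macro, and the no_data_yet() helper below are all made up for illustration; only the offsets 0x70 and 0x40 and the 0x4 mask come from the disassembly, and the real kernel structures have many other members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. Only the two offsets
 * visible in the objdump output are modelled; everything else is padding. */
struct cookie_sketch {
    unsigned char pad[0x40];
    unsigned long flags;            /* mov 0x40(%rax),%rax   (+0x157) */
};

struct object_sketch {
    unsigned char pad[0x70];
    struct cookie_sketch *cookie;   /* mov 0x70(%r14),%rax   (+0x153) */
};

/* "test $0x4,%al" checks mask 1 << 2, i.e. bit number 2 (assumed name). */
#define NO_DATA_YET_BIT 2

/* Compile-time check that the sketch reproduces the observed offsets. */
_Static_assert(offsetof(struct object_sketch, cookie) == 0x70,
               "cookie pointer should sit at 0x70");
_Static_assert(offsetof(struct cookie_sketch, flags) == 0x40,
               "flags word should sit at 0x40");

/* Returns 1 if the bit is set, 0 if it is clear, and -1 if the cookie
 * pointer is NULL. The kernel code has no such NULL check, so a NULL
 * cookie there would fault on the second load instead. */
int no_data_yet(const struct object_sketch *object)
{
    if (object->cookie == NULL)
        return -1;
    return (int)((object->cookie->flags >> NO_DATA_YET_BIT) & 1UL);
}
```

This only illustrates the offset arithmetic; it says nothing about why the cookie pointer would have gone away, which is presumably the actual bug.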