On 6 July 2018 at 01:45, NeilBrown <neilb@xxxxxxxx> wrote:
> On Thu, Jul 05 2018, David Howells wrote:
>
>> From: kiran modukuri <kiran.modukuri@xxxxxxxxx>
>>
>> There is a potential race in fscache operation enqueuing for reading and
>> copying multiple pages from cachefiles to netfs.
>> Under some heavy load system, it will happen very often.
>>
>> If this race occurs, an oops similar to the following is seen:
>>
>>     kernel BUG at fs/fscache/operation.c:69!
>>     invalid opcode: 0000 [#1] SMP
>>     ...
>>     #0 [ffff883fff0838d8] machine_kexec at ffffffff81051beb
>>     #1 [ffff883fff083938] crash_kexec at ffffffff810f2542
>>     #2 [ffff883fff083a08] oops_end at ffffffff8163e1a8
>>     #3 [ffff883fff083a30] die at ffffffff8101859b
>>     #4 [ffff883fff083a60] do_trap at ffffffff8163d860
>>     #5 [ffff883fff083ab0] do_invalid_op at ffffffff81015204
>>     #6 [ffff883fff083b60] invalid_op at ffffffff8164701e
>>         [exception RIP: fscache_enqueue_operation+246]
>>         RIP: ffffffffa0b793c6 RSP: ffff883fff083c18 RFLAGS: 00010046
>>         RAX: 0000000000000019 RBX: ffff8832ed1a9ec0 RCX: 0000000000000006
>>         RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
>>         RBP: ffff883fff083c20 R8: 0000000000000086 R9: 000000000000178f
>>         R10: ffffffff816aeb00 R11: ffff883fff08392e R12: ffff8802f0525620
>>         R13: ffff88407ffc01d8 R14: 0000000000000000 R15: 0000000000000003
>>         ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
>>     #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
>>     #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
>>     #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
>>
>> Reported-by: Lei Xue <carmark.dlut@xxxxxxxxx>
>> Reported-by: Vegard Nossum <vegard.nossum@xxxxxxxxx>
>> Reported-by: Anthony DeRobertis <aderobertis@xxxxxxxxxxx>
>> Reported-by: NeilBrown <neilb@xxxxxxxx>
>> Reported-by: Daniel Axtens <dja@xxxxxxxxxx>
>> Reported-by: KiranKumar Modukuri <kiran.modukuri@xxxxxxxxx>
>> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
>> ---

[...]

> Thanks - I like this approach. Taking the extra reference makes it a
> lot more clear what is happening and why.

The changelog is a bit sparse, no? We have more info here:

https://lkml.org/lkml/2018/5/8/520
https://lkml.org/lkml/2018/7/3/1184

Why not crib some of that and explain the issue properly (or at minimum
link the previous threads)?

Thanks,

Vegard

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs