Elbandi,

Thanks to your stack trace I see the bug. I'll send you a fix as soon
as I get back to my office. Apparently, I spent too much time testing
it in UP VMs and UML.
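As a rough illustration of the class of bug that matches the trace
quoted below (illustrative only, not code from the patch): a path that
re-takes a spinlock it already holds spins forever on SMP, but appears
to work on UP kernels and UML, where spin_lock() reduces to little
more than disabling preemption.

/* Hypothetical example; not the actual fscache patch code. */
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

static void helper(void)
{
        spin_lock(&example_lock);  /* second acquisition: spins forever on SMP */
        /* ... */
        spin_unlock(&example_lock);
}

static void open_path(void)
{
        spin_lock(&example_lock);
        helper();                  /* deadlocks here on SMP kernels */
        spin_unlock(&example_lock);
}

Lockdep (CONFIG_PROVE_LOCKING) typically flags this kind of recursive
acquisition even on a UP build, which is one way to catch it without
SMP hardware.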
Thanks,
-- Milosz

On Wed, May 29, 2013 at 5:47 AM, Elso Andras <elso.andras@xxxxxxxxx> wrote:
> Hi,
>
> I tried your fscache patch on my test cluster. The client node is
> Ubuntu Lucid (10.04) with a 3.8 kernel (*) + your patch.
> A little after I mounted the cephfs, I got this:
>
> [ 316.303851] Pid: 1565, comm: lighttpd Not tainted 3.8.0-22-fscache #33 HP ProLiant DL160 G6
> [ 316.303853] RIP: 0010:[<ffffffff81045c42>]  [<ffffffff81045c42>] __ticket_spin_lock+0x22/0x30
> [ 316.303861] RSP: 0018:ffff8804180e79f8  EFLAGS: 00000297
> [ 316.303863] RAX: 0000000000000004 RBX: ffffffffa0224e53 RCX: 0000000000000004
> [ 316.303865] RDX: 0000000000000005 RSI: 00000000000000d0 RDI: ffff88041eb29a50
> [ 316.303866] RBP: ffff8804180e79f8 R08: ffffe8ffffa40150 R09: 0000000000000000
> [ 316.303868] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88041da75050
> [ 316.303869] R13: ffff880428ef0000 R14: ffffffff81702b86 R15: ffff8804180e7968
> [ 316.303871] FS:  00007fbcca138700(0000) GS:ffff88042f240000(0000) knlGS:0000000000000000
> [ 316.303873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 316.303875] CR2: 00007f5c96649f00 CR3: 00000004180c9000 CR4: 00000000000007e0
> [ 316.303877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 316.303878] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 316.303880] Process lighttpd (pid: 1565, threadinfo ffff8804180e6000, task ffff88041cc22e80)
> [ 316.303881] Stack:
> [ 316.303883]  ffff8804180e7a08 ffffffff817047ae ffff8804180e7a58 ffffffffa02c816a
> [ 316.303886]  ffff8804180e7a58 ffff88041eb29a50 0000000000000000 ffff88041eb29d50
> [ 316.303889]  ffff88041eb29a50 ffff88041b29ed00 ffff88041eb29a40 0000000000000d01
> [ 316.303892] Call Trace:
> [ 316.303898]  [<ffffffff817047ae>] _raw_spin_lock+0xe/0x20
> [ 316.303910]  [<ffffffffa02c816a>] ceph_init_file+0xca/0x1c0 [ceph]
> [ 316.303917]  [<ffffffffa02c83e1>] ceph_open+0x181/0x3c0 [ceph]
> [ 316.303925]  [<ffffffffa02c8260>] ? ceph_init_file+0x1c0/0x1c0 [ceph]
> [ 316.303930]  [<ffffffff8119a62e>] do_dentry_open+0x21e/0x2a0
> [ 316.303933]  [<ffffffff8119a6e5>] finish_open+0x35/0x50
> [ 316.303940]  [<ffffffffa02c9304>] ceph_atomic_open+0x214/0x2f0 [ceph]
> [ 316.303944]  [<ffffffff811b416f>] ? __d_alloc+0x5f/0x180
> [ 316.303948]  [<ffffffff811a7fa1>] atomic_open+0xf1/0x460
> [ 316.303951]  [<ffffffff811a86f4>] lookup_open+0x1a4/0x1d0
> [ 316.303954]  [<ffffffff811a8fad>] do_last+0x30d/0x820
> [ 316.303958]  [<ffffffff811ab413>] path_openat+0xb3/0x4d0
> [ 316.303962]  [<ffffffff815da87d>] ? sock_aio_read+0x2d/0x40
> [ 316.303965]  [<ffffffff8119c333>] ? do_sync_read+0xa3/0xe0
> [ 316.303968]  [<ffffffff811ac232>] do_filp_open+0x42/0xa0
> [ 316.303971]  [<ffffffff811b9eb5>] ? __alloc_fd+0xe5/0x170
> [ 316.303974]  [<ffffffff8119be8a>] do_sys_open+0xfa/0x250
> [ 316.303977]  [<ffffffff8119cacd>] ? vfs_read+0x10d/0x180
> [ 316.303980]  [<ffffffff8119c001>] sys_open+0x21/0x30
> [ 316.303983]  [<ffffffff8170d61d>] system_call_fastpath+0x1a/0x1f
>
> And the console prints these lines forever, the server is frozen:
> [ 376.305754] BUG: soft lockup - CPU#2 stuck for 22s! [lighttpd:1565]
> [ 404.294735] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:39]
> [ 404.306735] BUG: soft lockup - CPU#2 stuck for 22s! [lighttpd:1565]
> [ 432.295716] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:39]
>
> Do you have any idea?
>
> Elbandi
>
> * http://packages.ubuntu.com/raring/linux-image-3.8.0-19-generic
>
> 2013/5/23 Milosz Tanski <milosz@xxxxxxxxx>:
>> This is my first attempt at adding fscache support for the Ceph Linux
>> module.
>>
>> My motivation for doing this work was to speed up our distributed
>> database, which uses the Ceph filesystem as a backing store. By far
>> most of the workload our application does is read-only, and latency
>> is our biggest challenge. Being able to cache frequently used blocks
>> on the SSD drives our machines use dramatically speeds up our query
>> setup time when we're fetching multiple compressed indexes and then
>> navigating the block tree.
>>
>> The branch containing the two patches is here:
>> https://bitbucket.org/adfin/linux-fs.git in the forceph branch.
>>
>> If you want to review it in your browser, here is the Bitbucket URL:
>> https://bitbucket.org/adfin/linux-fs/commits/branch/forceph
>>
>> I've tested this both in mainline and in the branch that features the
>> upcoming fscache changes. The patches are broken into two pieces:
>>
>> 01 - Sets up the facility for fscache in its own independent files
>> 02 - Enables fscache in the Ceph filesystem and adds a new
>>      configuration option
>>
>> The patches will follow in the next few emails as well.
>>
>> Future-wise, there is some new work being done to add write-back
>> caching to fscache & NFS. When that's done I'd like to integrate it
>> into the Ceph fscache implementation. From that author's benchmarks,
>> it seems to have much the same benefit for writes to NFS as bcache
>> does.
>>
>> I'd like to get this into Ceph, and I'm looking for feedback.
>>
>> Thanks,
>> - Milosz
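Regarding the 01/02 split described above (a separate fscache facility
in its own files, wired into the filesystem behind a new configuration
option), the usual kernel pattern for an optional feature like this is
a header that provides real hooks when the option is enabled and no-op
inline stubs when it is not, so the generic open/read/write paths can
call the hooks unconditionally. A rough sketch of that pattern; the
CONFIG_CEPH_FSCACHE symbol and the function names are assumptions for
illustration, not taken from the patches:

/* Illustrative stub pattern only; the names below are hypothetical,
 * not the API added by the patches.
 */
#include <linux/fs.h>

#ifdef CONFIG_CEPH_FSCACHE

/* Real implementations live in a file that is only compiled when the
 * option is set (e.g. via an obj-$(CONFIG_CEPH_FSCACHE) Makefile rule).
 */
void ceph_example_cache_register_inode(struct inode *inode);
void ceph_example_cache_invalidate(struct inode *inode);

#else /* !CONFIG_CEPH_FSCACHE */

/* With the option disabled, the hooks compile away to nothing. */
static inline void ceph_example_cache_register_inode(struct inode *inode) { }
static inline void ceph_example_cache_invalidate(struct inode *inode) { }

#endif /* CONFIG_CEPH_FSCACHE */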