cephfs kernel client stability

We have so far been using ceph-fuse to mount CephFS, but the small-file performance of ceph-fuse is often a problem. We've been testing the kernel client instead, and have seen some pretty bad crashes/hangs.
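
A minimal sketch of a small-file micro-benchmark that could be pointed at a ceph-fuse mount and a kernel-client mount of the same filesystem in turn to compare them. It is an illustration under assumptions, not the test we ran: the function name, file count, and 4 KiB file size are arbitrary choices, and the directory argument is assumed to be a path on the CephFS mount under test. Read timings will likely be dominated by the client's page cache, so the create and delete phases are probably the more telling numbers.

```python
import os
import tempfile
import time


def small_file_benchmark(directory: str, count: int = 1000, size: int = 4096) -> dict:
    """Create, read back, and delete `count` files of `size` bytes in `directory`.

    Returns wall-clock seconds per phase. `directory` is assumed to live on the
    mount being tested (hypothetical usage: one ceph-fuse, one kernel client).
    """
    payload = os.urandom(size)
    paths = [os.path.join(directory, f"bench_{i:06d}.dat") for i in range(count)]
    timings = {}

    start = time.perf_counter()
    for p in paths:
        with open(p, "wb") as f:
            f.write(payload)
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    for p in paths:
        with open(p, "rb") as f:
            # Reads may be served from the client's page cache.
            if f.read() != payload:
                raise OSError(f"content mismatch in {p}")
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    for p in paths:
        os.remove(p)
    timings["delete"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Local temp dir used here only so the sketch runs anywhere; substitute a
    # directory on the CephFS mount you actually want to measure.
    with tempfile.TemporaryDirectory() as d:
        for phase, secs in small_file_benchmark(d, count=200).items():
            print(f"{phase:>6}: {secs:.3f}s")
```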

What is the policy on fixes to the kernel client? Is only the latest stable kernel updated (4.18.x nowadays), or are fixes also backported to LTS kernels (like 4.14.x or 4.9.x, for example)? I've seen various threads saying that certain newer features require very recent kernels, but I'm wondering whether newer kernels are also needed for better stability, or, more generally, where kernel client stability stands nowadays.
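
For anyone tracking which series a node is on, a small sketch (function names are hypothetical) that parses a release string such as 4.14.67 into its series and checks it against the 4.9, 4.14 and 4.18 lines mentioned above. Note the assumption: the version number alone does not tell you whether a particular CephFS fix was backported, especially on distribution kernels; that still has to be checked against the stable changelogs.

```python
import platform
import re


def kernel_series(release: str) -> tuple:
    """Parse '4.14.67' or '4.18.0-80.el8.x86_64' into (major, minor, patch).

    A missing patch level is treated as 0.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def on_series(release: str, series: str) -> bool:
    """True if `release` belongs to the 'major.minor' series, e.g. '4.14'."""
    major, minor, _ = kernel_series(release)
    want_major, want_minor = (int(x) for x in series.split("."))
    return (major, minor) == (want_major, want_minor)


if __name__ == "__main__":
    running = platform.release()
    print(running, kernel_series(running))
    for s in ("4.9", "4.14", "4.18"):  # series mentioned in this thread
        print(s, on_series(running, s))
```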

Here is an example of a kernel hang with 4.14.67. Under heavy load the machine isn't even pingable.

Sep 29 21:10:16 worker1004 kernel: INFO: rcu_sched self-detected stall on CPU
Sep 29 21:10:16 worker1004 kernel: #0111-...: (1 GPs behind) idle=bee/140000000000001/0 softirq=21319/21319 fqs=7499
Sep 29 21:10:16 worker1004 kernel: #011 (t=15000 jiffies g=13989 c=13988 q=8334)
Sep 29 21:10:16 worker1004 kernel: NMI backtrace for cpu 1
Sep 29 21:10:16 worker1004 kernel: CPU: 1 PID: 19436 Comm: kworker/1:42 Tainted: P        W  O    4.14.67 #1
Sep 29 21:10:16 worker1004 kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.6.0 10/27/2017
Sep 29 21:10:16 worker1004 kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]
Sep 29 21:10:16 worker1004 kernel: Call Trace:
Sep 29 21:10:16 worker1004 kernel: <IRQ>
Sep 29 21:10:16 worker1004 kernel: dump_stack+0x46/0x5f
Sep 29 21:10:16 worker1004 kernel: nmi_cpu_backtrace+0xba/0xc0
Sep 29 21:10:16 worker1004 kernel: ? irq_force_complete_move+0xd0/0xd0
Sep 29 21:10:16 worker1004 kernel: nmi_trigger_cpumask_backtrace+0x8a/0xc0
Sep 29 21:10:16 worker1004 kernel: rcu_dump_cpu_stacks+0x81/0xb1
Sep 29 21:10:16 worker1004 kernel: rcu_check_callbacks+0x642/0x790
Sep 29 21:10:16 worker1004 kernel: ? update_wall_time+0x26d/0x6e0
Sep 29 21:10:16 worker1004 kernel: update_process_times+0x23/0x50
Sep 29 21:10:16 worker1004 kernel: tick_sched_timer+0x2f/0x60
Sep 29 21:10:16 worker1004 kernel: __hrtimer_run_queues+0xa3/0xf0
Sep 29 21:10:16 worker1004 kernel: hrtimer_interrupt+0x94/0x170
Sep 29 21:10:16 worker1004 kernel: smp_apic_timer_interrupt+0x4c/0x90
Sep 29 21:10:16 worker1004 kernel: apic_timer_interrupt+0x84/0x90
Sep 29 21:10:16 worker1004 kernel: </IRQ>
Sep 29 21:10:16 worker1004 kernel: RIP: 0010:crush_hash32_3+0x1e5/0x270 [libceph]
Sep 29 21:10:16 worker1004 kernel: RSP: 0018:ffffc9000fdff5d8 EFLAGS: 00000a97 ORIG_RAX: ffffffffffffff10
Sep 29 21:10:16 worker1004 kernel: RAX: 0000000006962033 RBX: ffff883f6e7173c0 RCX: 00000000dcdcc373
Sep 29 21:10:16 worker1004 kernel: RDX: 00000000bd5425ca RSI: 000000008a8b0b56 RDI: 00000000b1983b87
Sep 29 21:10:16 worker1004 kernel: RBP: 0000000000000023 R08: 00000000bd5425ca R09: 00000000137904e9
Sep 29 21:10:16 worker1004 kernel: R10: 0000000000000000 R11: 0000000000000002 R12: 00000000b0f29f21
Sep 29 21:10:16 worker1004 kernel: R13: 000000000000000c R14: 00000000f0ae0000 R15: 0000000000000023
Sep 29 21:10:16 worker1004 kernel: crush_bucket_choose+0x2ad/0x340 [libceph]
Sep 29 21:10:16 worker1004 kernel: crush_choose_firstn+0x1b0/0x4c0 [libceph]
Sep 29 21:10:16 worker1004 kernel: crush_choose_firstn+0x48d/0x4c0 [libceph]
Sep 29 21:10:16 worker1004 kernel: crush_do_rule+0x28c/0x5a0 [libceph]
Sep 29 21:10:16 worker1004 kernel: ceph_pg_to_up_acting_osds+0x459/0x850 [libceph]
Sep 29 21:10:16 worker1004 kernel: calc_target+0x213/0x520 [libceph]
Sep 29 21:10:16 worker1004 kernel: ? ixgbe_xmit_frame_ring+0x362/0xe80 [ixgbe]
Sep 29 21:10:16 worker1004 kernel: ? put_prev_entity+0x27/0x620
Sep 29 21:10:16 worker1004 kernel: ? pick_next_task_fair+0x1c7/0x520
Sep 29 21:10:16 worker1004 kernel: scan_requests.constprop.55+0x16f/0x280 [libceph]
Sep 29 21:10:16 worker1004 kernel: handle_one_map+0x175/0x200 [libceph]
Sep 29 21:10:16 worker1004 kernel: ceph_osdc_handle_map+0x390/0x850 [libceph]
Sep 29 21:10:16 worker1004 kernel: ? ceph_x_encrypt+0x46/0x70 [libceph]
Sep 29 21:10:16 worker1004 kernel: dispatch+0x2ef/0xba0 [libceph]
Sep 29 21:10:16 worker1004 kernel: ? read_partial_message+0x215/0x880 [libceph]
Sep 29 21:10:16 worker1004 kernel: ? inet_recvmsg+0x45/0xb0
Sep 29 21:10:16 worker1004 kernel: try_read+0x6f8/0x11b0 [libceph]
Sep 29 21:10:16 worker1004 kernel: ? sched_clock_cpu+0xc/0xa0
Sep 29 21:10:16 worker1004 kernel: ? put_prev_entity+0x27/0x620
Sep 29 21:10:16 worker1004 kernel: ? pick_next_task_fair+0x415/0x520
Sep 29 21:10:16 worker1004 kernel: ceph_con_workfn+0x9d/0x5a0 [libceph]
Sep 29 21:10:16 worker1004 kernel: process_one_work+0x127/0x290
Sep 29 21:10:16 worker1004 kernel: worker_thread+0x3f/0x3b0
Sep 29 21:10:16 worker1004 kernel: kthread+0xf2/0x130
Sep 29 21:10:16 worker1004 kernel: ? process_one_work+0x290/0x290
Sep 29 21:10:16 worker1004 kernel: ? __kthread_parkme+0x90/0x90
Sep 29 21:10:16 worker1004 kernel: ret_from_fork+0x1f/0x30
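
When collecting more of these reports across nodes, a rough sketch like the one below could pull out the host, the workqueue, the RIP function and the reliable call frames. It assumes one syslog entry per line in the rsyslog format shown above, and its regexes are tuned to this trace rather than being a general oops parser. Frames marked "? " are skipped because the unwinder could not confirm them, and dropping everything before the </IRQ> marker (the interrupt context that printed the report) is a heuristic.

```python
import re
from typing import Dict, List

# rsyslog-style prefix as in the trace: "Sep 29 21:10:16 worker1004 kernel: ..."
PREFIX = re.compile(r"^\w{3} +\d+ \d\d:\d\d:\d\d (\S+) kernel: (.*)$")
# Call-trace frame such as "crush_do_rule+0x28c/0x5a0 [libceph]".
FRAME = re.compile(r"^(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?$")
# Instruction pointer line such as "RIP: 0010:crush_hash32_3+0x1e5/0x270 [libceph]".
RIP = re.compile(r"^RIP: [0-9a-f]+:([A-Za-z_][\w.]*)\+")


def summarize_stall(lines: List[str]) -> Dict[str, object]:
    """Extract host, stall flag, workqueue, RIP function and reliable frames."""
    summary: Dict[str, object] = {
        "host": None, "stall": False, "workqueue": None, "rip": None, "frames": [],
    }
    frames: List[str] = []
    for raw in lines:
        m = PREFIX.match(raw.strip())
        if m is None:
            continue
        summary["host"] = m.group(1)
        msg = m.group(2)
        if "self-detected stall" in msg:
            summary["stall"] = True
        elif msg.startswith("Workqueue:"):
            summary["workqueue"] = msg.split(":", 1)[1].strip()
        elif msg == "</IRQ>":
            # Frames seen so far belong to the interrupt that printed the
            # report, not to the code that was spinning; discard them.
            frames = []
        elif (r := RIP.match(msg)) is not None:
            summary["rip"] = r.group(1)
        elif (f := FRAME.match(msg)) is not None and f.group(1) is None:
            frames.append(f.group(2))  # "? " frames are skipped as unreliable
    summary["frames"] = frames
    return summary
```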

Andras

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com