Re: osd crash - disk hangs


 



You’ll need to upgrade your kernel. It’s a nasty divide-by-zero bug in the scheduler’s NUMA balancing code (your trace dies in task_numa_find_cpu) that occurs while it is trying to calculate load. You can’t kill processes that are in uninterruptible wait. As a workaround you can still use "top -b -n1" instead of ps, but ultimately the kernel update is what fixed it for us.
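
If you want something scriptable in place of ps, a rough Python sketch like the one below lists processes in uninterruptible wait (state D) by reading only /proc/<pid>/stat. It rests on an assumption this thread does not confirm: that ps hangs when it reads /proc/<pid>/cmdline of the stuck process, a file this sketch never opens. The helper names parse_stat and list_d_state are made up for this example.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def list_d_state(proc_root="/proc"):
    """Collect (pid, comm) for every process whose main thread is in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in list_d_state():
        print(pid, comm)
```

This only looks at each process's main thread; per-thread states live under /proc/<pid>/task/ and would need the same treatment. It is a diagnostic aid only and does nothing to unstick the processes.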

 

Here’s the Ubuntu version: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1568729

 

Warren Wang

Walmart 

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of VELARTIS Philipp Dürhammer <p.duerhammer@xxxxxxxxxxx>
Date: Thursday, December 1, 2016 at 7:19 AM
To: "'ceph-users@xxxxxxxxxxxxxx'" <ceph-users@xxxxxxxxxxxxxx>
Subject: [ceph-users] osd crash - disk hangs

 

Hello!

 

Tonight I had an OSD crash. See the dump below. The OSD is also still mounted. What's the cause? Is it a bug? What should I do next? I can't run lsof or ps ax because they hang.

 

Thank You!

 

Dec  1 00:31:30 ceph2 kernel: [17314369.493029] divide error: 0000 [#1] SMP

Dec  1 00:31:30 ceph2 kernel: [17314369.493062] Modules linked in: act_police cls_basic sch_ingress sch_htb vhost_net vhost macvtap macvlan 8021q garp mrp veth nfsv3 softdog ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_NFLOG nfnetlink_log xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_tcpudp xt_addrtype xt_multiport xt_conntrack xt_set xt_mark ip_set_hash_net ip_set nfnetlink iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding xfs libcrc32c ipmi_ssif mxm_wmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr input_leds sb_edac shpchp edac_core mei_me ioatdma mei lpc_ich i2c_i801 ipmi_si 8250_fintek wmi ipmi_msghandler mac_hid nf_conntrack_ftp nf_conntrack autofs4 ses enclosure hid_generic usbmouse usbkbd usbhid hid ixgbe(O) vxlan ip6_udp_tunnel megaraid_sas udp_tunnel isci ahci libahci libsas igb(O) scsi_transport_sas dca ptp pps_core fjes

Dec  1 00:31:30 ceph2 kernel: [17314369.493708] CPU: 1 PID: 17291 Comm: ceph-osd Tainted: G           O    4.4.8-1-pve #1

Dec  1 00:31:30 ceph2 kernel: [17314369.493754] Hardware name: Thomas-Krenn.AG X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013

Dec  1 00:31:30 ceph2 kernel: [17314369.493799] task: ffff881f6ff05280 ti: ffff880037c4c000 task.ti: ffff880037c4c000

Dec  1 00:31:30 ceph2 kernel: [17314369.493843] RIP: 0010:[<ffffffff810b58fd>]  [<ffffffff810b58fd>] task_numa_find_cpu+0x23d/0x710

Dec  1 00:31:30 ceph2 kernel: [17314369.493893] RSP: 0000:ffff880037c4fbd8  EFLAGS: 00010257

Dec  1 00:31:30 ceph2 kernel: [17314369.493919] RAX: 0000000000000000 RBX: ffff880037c4fc80 RCX: 0000000000000000

Dec  1 00:31:30 ceph2 kernel: [17314369.493962] RDX: 0000000000000000 RSI: ffff88103fa40000 RDI: ffff881033f50c00

Dec  1 00:31:30 ceph2 kernel: [17314369.494006] RBP: ffff880037c4fc48 R08: 0000000202046ea8 R09: 000000000000036b

Dec  1 00:31:30 ceph2 kernel: [17314369.494049] R10: 000000000000007c R11: 0000000000000540 R12: ffff88064fbd0000

Dec  1 00:31:30 ceph2 kernel: [17314369.494093] R13: 0000000000000250 R14: 0000000000000540 R15: 0000000000000009

Dec  1 00:31:30 ceph2 kernel: [17314369.494136] FS:  00007ff17dd6c700(0000) GS:ffff88103fa40000(0000) knlGS:0000000000000000

Dec  1 00:31:30 ceph2 kernel: [17314369.494182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Dec  1 00:31:30 ceph2 kernel: [17314369.494209] CR2: 00007ff17dd6aff8 CR3: 0000001025e4b000 CR4: 00000000001426e0

Dec  1 00:31:30 ceph2 kernel: [17314369.494252] Stack:

Dec  1 00:31:30 ceph2 kernel: [17314369.494273]  ffff880037c4fbe8 ffffffff81038219 000000000000003f 0000000000017180

Dec  1 00:31:30 ceph2 kernel: [17314369.494323]  ffff881f6ff05280 0000000000017180 0000000000000251 ffffffffffffffe7

Dec  1 00:31:30 ceph2 kernel: [17314369.494374]  0000000000000251 ffff881f6ff05280 ffff880037c4fc80 00000000000000cb

Dec  1 00:31:30 ceph2 kernel: [17314369.494424] Call Trace:

Dec  1 00:31:30 ceph2 kernel: [17314369.494449]  [<ffffffff81038219>] ? sched_clock+0x9/0x10

Dec  1 00:31:30 ceph2 kernel: [17314369.494476]  [<ffffffff810b62b6>] task_numa_migrate+0x4e6/0xa00

Dec  1 00:31:30 ceph2 kernel: [17314369.494506]  [<ffffffff813fea6c>] ? copy_to_iter+0x7c/0x260

Dec  1 00:31:30 ceph2 kernel: [17314369.494534]  [<ffffffff810b6849>] numa_migrate_preferred+0x79/0x80

Dec  1 00:31:30 ceph2 kernel: [17314369.494563]  [<ffffffff810bb348>] task_numa_fault+0x848/0xd10

Dec  1 00:31:30 ceph2 kernel: [17314369.494591]  [<ffffffff810ba969>] ? should_numa_migrate_memory+0x59/0x130

Dec  1 00:31:30 ceph2 kernel: [17314369.494623]  [<ffffffff811c0314>] handle_mm_fault+0xc64/0x1a20

Dec  1 00:31:30 ceph2 kernel: [17314369.494654]  [<ffffffff8170c3f4>] ? SYSC_recvfrom+0x144/0x160

Dec  1 00:31:30 ceph2 kernel: [17314369.494684]  [<ffffffff8106b4ed>] __do_page_fault+0x19d/0x410

Dec  1 00:31:30 ceph2 kernel: [17314369.494713]  [<ffffffff81003360>] ? exit_to_usermode_loop+0xb0/0xd0

Dec  1 00:31:30 ceph2 kernel: [17314369.494742]  [<ffffffff8106b782>] do_page_fault+0x22/0x30

Dec  1 00:31:30 ceph2 kernel: [17314369.494771]  [<ffffffff8184ab38>] page_fault+0x28/0x30

Dec  1 00:31:30 ceph2 kernel: [17314369.494797] Code: 4d b0 4c 89 ef e8 b4 d0 ff ff 48 8b 4d b0 49 8b 85 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4d 78 4c 8b 6b 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c0 48 29 c1 4c 03 43 48 4c 39 75 d0

Dec  1 00:31:30 ceph2 kernel: [17314369.495005] RIP  [<ffffffff810b58fd>] task_numa_find_cpu+0x23d/0x710

Dec  1 00:31:30 ceph2 kernel: [17314369.495035]  RSP <ffff880037c4fbd8>

Dec  1 00:31:30 ceph2 kernel: [17314369.495347] ---[ end trace 7106c9a72840cc7d ]---
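
When comparing traces like the one above across several hosts, the key fields can be pulled out of the kernel log with a short script. The following is a minimal sketch, assuming syslog lines in the format quoted here; the function name summarize_oops and the regular expressions are written for this example only and are not part of any Ceph or kernel tooling.

```python
import re

# Patterns written against the log lines quoted above; other kernel
# versions may format oops output differently.
ERROR_RE = re.compile(r"\] ([a-z ]+): [0-9a-f]{4} \[#\d+\]")
CPU_RE = re.compile(r"PID: (\d+) Comm: (\S+) .*?(\S+) #\d+\s*$")
RIP_RE = re.compile(r"RIP: .*\] ([\w.]+)\+0x")

SAMPLE = [
    "Dec  1 00:31:30 ceph2 kernel: [17314369.493029] divide error: 0000 [#1] SMP",
    "Dec  1 00:31:30 ceph2 kernel: [17314369.493708] CPU: 1 PID: 17291 Comm: ceph-osd Tainted: G           O    4.4.8-1-pve #1",
    "Dec  1 00:31:30 ceph2 kernel: [17314369.493843] RIP: 0010:[<ffffffff810b58fd>]  [<ffffffff810b58fd>] task_numa_find_cpu+0x23d/0x710",
]


def summarize_oops(lines):
    """Return error type, PID, command, kernel version and faulting symbol."""
    summary = {}
    for line in lines:
        line = line.rstrip()
        m = ERROR_RE.search(line)
        if m:
            summary.setdefault("error", m.group(1))
        m = CPU_RE.search(line)
        if m:
            summary.setdefault("pid", int(m.group(1)))
            summary.setdefault("comm", m.group(2))
            summary.setdefault("kernel", m.group(3))
        m = RIP_RE.search(line)
        if m:
            summary.setdefault("symbol", m.group(1))
    return summary


if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Only the first match for each field is kept, so feeding it a log that contains several oopses would report just the first one.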

 

