Crash in ceph_readdir.

Hello, 

I've been investigating the following crash with cephfs: 

[8734559.785146] general protection fault: 0000 [#1] SMP 
[8734559.791921]  ioatdma shpchp ipmi_devintf ipmi_si ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6 [last unloaded: stat_faker_4410clouder4]
[8734559.793307] CPU: 31 PID: 1917 Comm: rsync Tainted: G        W  O    4.4.10-clouder4 #1
[8734559.793686] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
[8734559.793920] task: ffff883f3defc4c0 ti: ffff8821ef2e0000 task.ti: ffff8821ef2e0000
[8734559.794306] RIP: 0010:[<ffffffff813134d0>]  [<ffffffff813134d0>] lockref_get_not_dead+0x10/0xa0
[8734559.794754] RSP: 0018:ffff8821ef2e3c28  EFLAGS: 00010296
[8734559.794981] RAX: ffff881621afe000 RBX: 7261666153203689 RCX: 0000000000000007
[8734559.795364] RDX: 0000000000000189 RSI: ffff8821ef2e3c38 RDI: 7261666153203689
[8734559.795742] RBP: ffff8821ef2e3c68 R08: 0000000000000002 R09: 000000000000077d
[8734559.796130] R10: ffffea005886bf80 R11: ffff882cb6fe4e00 R12: 0000000000005c48
[8734559.796511] R13: ffff8821ef2e3ef8 R14: 0000000000000000 R15: ffff88015aabcdd8
[8734559.796892] FS:  00007fbed7c5e700(0000) GS:ffff881fffc60000(0000) knlGS:0000000000000000
[8734559.797276] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[8734559.797507] CR2: ffffffffff600400 CR3: 000000085bd95000 CR4: 00000000001406e0
[8734559.797886] Stack:
[8734559.798110]  ffff8821ef2e3c58 0000000000000000 ffff8821ef2e3c68 ffffffff8107b082
[8734559.798747]  0000000000000000 ffffea005886bf80 0000000000005c48 7261666153203631
[8734559.799388]  ffff8821ef2e3d78 ffffffffa04ed6ae ffff8821ef2e3e58 ffff8821ef2e3d08
[8734559.800032] Call Trace:
[8734559.800260]  [<ffffffff8107b082>] ? __might_sleep+0x52/0x90
[8734559.800496]  [<ffffffffa04ed6ae>] __dcache_readdir+0x21e/0x480 [ceph]
[8734559.800727]  [<ffffffff811ad482>] ? path_put+0x22/0x30
[8734559.800957]  [<ffffffffa04f67b8>] ? __touch_cap+0x28/0x90 [ceph]
[8734559.801195]  [<ffffffffa04f6965>] ? ceph_cap_string+0xe5/0x100 [ceph]
[8734559.801432]  [<ffffffffa04f6bb1>] ? __ceph_caps_issued_mask+0x141/0x150 [ceph]
[8734559.801819]  [<ffffffffa04ee23a>] ceph_readdir+0x6ea/0x7d0 [ceph]
[8734559.802060]  [<ffffffff8115e56a>] ? __might_fault+0x3a/0x50
[8734559.802290]  [<ffffffff811a87fa>] ? cp_new_stat+0x15a/0x180
[8734559.802521]  [<ffffffff8107b082>] ? __might_sleep+0x52/0x90
[8734559.802751]  [<ffffffff811b5b7e>] iterate_dir+0xae/0x130
[8734559.802981]  [<ffffffff811b5d90>] SyS_getdents+0x90/0x110
[8734559.803216]  [<ffffffff811b5ea0>] ? SyS_old_readdir+0x90/0x90
[8734559.803445]  [<ffffffff81639617>] entry_SYSCALL_64_fastpath+0x12/0x6a
[8734559.803673] Code: e8 56 5e 32 00 ff 43 04 c6 03 00 65 ff 0d 5d 77 cf 7e eb d2 0f 1f 80 00 00 00 00 55 48 89 e5 53 48 8d 75 d0 48 83 ec 38 48 89 fb <48> 8b 17 48 8d 7d e0 89 55 e0 48 89 55 c0 8b 45 e0 89 45 d0 85 
[8734559.808422] RIP  [<ffffffff813134d0>] lockref_get_not_dead+0x10/0xa0
[8734559.808721]  RSP <ffff8821ef2e3c28>
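A general protection fault on a memory load on x86_64 commonly means the address was non-canonical. As a rough sketch, the register values from the dump above can be checked for that. It assumes 48-bit virtual addresses (4-level paging), and the helper names `is_canonical` and `printable_bytes` are made up for illustration:

```python
# Sketch: classify register values from the oops as canonical or not.
# Assumption: x86_64 with 48-bit virtual addresses (4-level paging).

def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones

def printable_bytes(value: int) -> str:
    """Show the 8 little-endian bytes of value, non-printable ones as '.'."""
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# Values copied from the register dump.
REGS = {
    "RDI": 0x7261666153203689,
    "RAX": 0xFFFF881621AFE000,
}

for name, value in REGS.items():
    print(f"{name}={value:#018x} "
          f"canonical={is_canonical(value)} "
          f"bytes={printable_bytes(value)!r}")
```

Under these assumptions RDI is non-canonical while RAX is an ordinary kernel address. Most of the bytes in RDI are printable ASCII, which may suggest the value is text rather than a pointer. That is only an observation, not a diagnosis.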

So the faulting instruction is mov (%rdi),%rdx; looking at the register
dump, RDI clearly has a bogus value. I started backtracking from there
to acquire more context, e.g. to get the state of the dir's ceph_inode_info
as well as the ceph_readdir_cache_control, and here is what I found:

1. The dentry representing the dir which is being passed to __dcache_readdir: 
http://sprunge.us/bAQH - the filename is rather strange: searching among the files
in the ceph mount point, I couldn't find this file. Also, here is the state of the 
ceph_inode_info: http://sprunge.us/AYRI 

crash> struct ceph_readdir_cache_control ffff8821ef2e3ce8
struct ceph_readdir_cache_control {
  page = 0xffffea005886bf80, 
  dentries = 0xffff881621afe000, 
  index = 2953
}


According to the state of the ceph_inode_info, this means that 
ceph_dir_is_complete_ordered would return true, and the second condition
should also be true, since ptr_pos is held in r12 (0x5c48 = 23624) and the dir size is 26496. 
So the dentry being passed should be entry 2953 % 512 = 393 in the cache_ctl.dentries array. 
Unfortunately, my crash dump excludes the page cache pages, so I cannot see
the contents of the dentries array. 
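The index arithmetic above can be cross-checked against the register dump with a few lines. This is only a sketch: it assumes 4 KiB pages and 8-byte pointers (hence 512 dentry pointers per cache page), and `locate_cache_slot` is a made-up helper name, not a kernel function.

```python
# Sketch: map a readdir cache index to its byte offset, page and slot.
# Assumptions: 4 KiB pages and 8-byte pointers (x86_64).
PAGE_SIZE = 4096
PTR_SIZE = 8

def locate_cache_slot(index: int,
                      page_size: int = PAGE_SIZE,
                      ptr_size: int = PTR_SIZE) -> dict:
    per_page = page_size // ptr_size        # 512 pointers per page
    return {
        "ptr_pos": index * ptr_size,        # byte offset into the cache
        "page_no": index // per_page,       # which cache page
        "slot": index % per_page,           # entry within that page
    }

DIR_SIZE = 26496                            # dir size quoted in the post
loc = locate_cache_slot(2953)               # cache_ctl.index from the dump

print(loc)
print("ptr_pos matches R12 (0x5c48):", loc["ptr_pos"] == 0x5C48)
print("slot matches RDX (0x189):", loc["slot"] == 0x189)
print("ptr_pos < dir size:", loc["ptr_pos"] < DIR_SIZE)
```

With these assumptions the byte offset comes out as 23624, which is the value in R12, and the slot as 393, which is the value in RDX. Both agree with the reasoning above.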

Could you provide any info on how to debug this further?

Regards, 
Nikolay 
 

