On Sun, 20 Feb 2011, Henry Chang wrote:
> Hi all,
>
> I have a ceph cluster of 3 mons, 2 mdses and 6 osds. The ceph clients
> live along with 3 of the 6 osds.
> I was running iozone on each client to generate the workload. After
> hours, I found that one osd (with the client on it) hung.
>
> The call stacks of three most suspicious threads of the cosd are:
>
> [<ffffffff813abeec>] sock_common_recvmsg+0x39/0x4a
> [<ffffffff813a9853>] __sock_recvmsg+0x71/0x7c
> [<ffffffff813a9e11>] sock_recvmsg+0xcf/0xe8
> [<ffffffff813ab189>] sys_recvfrom+0xd7/0x141
> [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [<ffffffffa01634e7>] ceph_osdc_wait_request+0x22/0xc7 [libceph]
> [<ffffffffa0164851>] ceph_osdc_writepages+0xf8/0x169 [libceph]
> [<ffffffffa018f34d>] writepage_nounlock+0x262/0x3df [ceph]
> [<ffffffffa0190292>] ceph_writepage+0x3b/0x58 [ceph]
> [<ffffffff810e24ed>] pageout+0x137/0x203
> [<ffffffff810e29d0>] shrink_page_list+0x265/0x483
> [<ffffffff810e2fb4>] shrink_inactive_list+0x3c6/0x6ee
> [<ffffffff810e3393>] shrink_list+0xb7/0xc3
> [<ffffffff810e3678>] shrink_zone+0x2d9/0x37a
> [<ffffffff810e393f>] do_try_to_free_pages+0x226/0x3a0
> [<ffffffff810e3bb3>] try_to_free_pages+0x6e/0x70
> [<ffffffff810ddb75>] __alloc_pages_nodemask+0x3e4/0x62d
> [<ffffffff81106e0f>] alloc_pages_current+0x95/0x9e
> [<ffffffff813f3890>] tcp_sendmsg+0x3e7/0x876
> [<ffffffff813a97d7>] __sock_sendmsg+0x61/0x6c
> [<ffffffff813a9f3c>] sock_sendmsg+0xcc/0xe5
> [<ffffffff813aa176>] sys_sendmsg+0x221/0x2a5
> [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff

This random process was trying to allocate memory, and memory was low, so
it started (and is waiting on) some writeback...

> [<ffffffff813aca66>] lock_sock_nested+0x94/0xca
> [<ffffffff8140eb7e>] inet_shutdown+0x3e/0xe9
> [<ffffffff813a990f>] sys_shutdown+0x45/0x61
> [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0
>
> I wonder if it is the so-called "deadlock" problem when the client and
> the osd are running on the same node?

I haven't actually ever seen this in practice, but this sure looks like
it.  You have at least half of the deadlock cycle, a process waiting on
writeback.  The second part of the loop would be the local cosd process
blocking (probably trying to allocate memory, maybe even in sock_recvmsg
above?) while trying to complete that write.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
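
As a purely illustrative aside (nothing below is Ceph or kernel code; the
names are invented labels for the waiters described in the reply above),
the suspected deadlock can be written down as a tiny wait-for graph in
Python:

# A minimal sketch of the wait-for cycle suspected above; all names are
# hypothetical labels, not real Ceph or kernel identifiers.
waits_on = {
    "client_task": "local_cosd",   # stuck in ceph_osdc_wait_request, waiting for the write ack
    "local_cosd": "free_memory",   # the local cosd blocks allocating memory (perhaps in recvmsg)
    "free_memory": "client_task",  # memory can't be freed until the client's writeback completes
}

def find_cycle(start, graph):
    """Follow wait-for edges from `start`; return the path if it loops back."""
    seen, node = [], start
    while node in graph and node not in seen:
        seen.append(node)
        node = graph[node]
    return seen + [node] if node in seen else None

print(" -> ".join(find_cycle("client_task", waits_on)))
# prints: client_task -> local_cosd -> free_memory -> client_task

Every edge has to hold for the hang to occur, so breaking any one of
them (most simply, not running the kernel client on the same node as a
cosd) breaks the cycle.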