On Sun, 20 Feb 2011, Henry Chang wrote:
> Hi all,
>
> I have a ceph cluster of 3 mons, 2 mdses and 6 osds. The ceph clients
> live along with 3 of the 6 osds.
> I was running iozone on each client to generate the workload. After
> hours, I found that one osd (with the client on it) hung.
>
> The call stacks of three most suspicious threads of the cosd are:
>
> [<ffffffff813abeec>] sock_common_recvmsg+0x39/0x4a
> [<ffffffff813a9853>] __sock_recvmsg+0x71/0x7c
> [<ffffffff813a9e11>] sock_recvmsg+0xcf/0xe8
> [<ffffffff813ab189>] sys_recvfrom+0xd7/0x141
> [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [<ffffffffa01634e7>] ceph_osdc_wait_request+0x22/0xc7 [libceph]
> [<ffffffffa0164851>] ceph_osdc_writepages+0xf8/0x169 [libceph]
> [<ffffffffa018f34d>] writepage_nounlock+0x262/0x3df [ceph]
> [<ffffffffa0190292>] ceph_writepage+0x3b/0x58 [ceph]
> [<ffffffff810e24ed>] pageout+0x137/0x203
> [<ffffffff810e29d0>] shrink_page_list+0x265/0x483
> [<ffffffff810e2fb4>] shrink_inactive_list+0x3c6/0x6ee
> [<ffffffff810e3393>] shrink_list+0xb7/0xc3
> [<ffffffff810e3678>] shrink_zone+0x2d9/0x37a
> [<ffffffff810e393f>] do_try_to_free_pages+0x226/0x3a0
> [<ffffffff810e3bb3>] try_to_free_pages+0x6e/0x70
> [<ffffffff810ddb75>] __alloc_pages_nodemask+0x3e4/0x62d
> [<ffffffff81106e0f>] alloc_pages_current+0x95/0x9e
> [<ffffffff813f3890>] tcp_sendmsg+0x3e7/0x876
> [<ffffffff813a97d7>] __sock_sendmsg+0x61/0x6c
> [<ffffffff813a9f3c>] sock_sendmsg+0xcc/0xe5
> [<ffffffff813aa176>] sys_sendmsg+0x221/0x2a5
> [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff

This random process was trying to allocate memory, and memory was low, so
it started (and is waiting on) some writeback...

> [<ffffffff813aca66>] lock_sock_nested+0x94/0xca
> [<ffffffff8140eb7e>] inet_shutdown+0x3e/0xe9
> [<ffffffff813a990f>] sys_shutdown+0x45/0x61
> [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0
>
> I wonder if it is the so-called "deadlock" problem when the client and
> the osd are running on the same node?

I haven't actually ever seen this in practice, but this sure looks like
it.  You have at least half of the deadlock cycle, a process waiting on
writeback.  The second part of the loop would be the local cosd process
blocking (probably trying to allocate memory, maybe even in sock_recvmsg
above?) while trying to complete that write.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
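
As a purely illustrative aside (nothing below is Ceph or kernel code; the
names are invented labels for the waiters described in the reply above),
the suspected deadlock can be written down as a tiny wait-for graph in
Python:

# A minimal sketch of the wait-for cycle suspected above; all names are
# hypothetical labels, not real Ceph or kernel identifiers.
waits_on = {
    "client_task": "local_cosd",   # stuck in ceph_osdc_wait_request, waiting for the write ack
    "local_cosd": "free_memory",   # the local cosd blocks allocating memory (perhaps in recvmsg)
    "free_memory": "client_task",  # memory can't be freed until the client's writeback completes
}

def find_cycle(start, graph):
    """Follow wait-for edges from `start`; return the path if it loops back."""
    seen, node = [], start
    while node in graph and node not in seen:
        seen.append(node)
        node = graph[node]
    return seen + [node] if node in seen else None

print(" -> ".join(find_cycle("client_task", waits_on)))
# prints: client_task -> local_cosd -> free_memory -> client_task

Every edge has to hold for the hang to occur, so breaking any one of
them (most simply, not running the kernel client on the same node as a
cosd) breaks the cycle.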