Re: Blocked ops, OSD consuming memory, hammer

Not going to attempt threading, and apologies for the two messages on
the same topic.  Christian is right, though: 3 nodes per tier, 8 SSDs
per node in the cache tier, and 12 spinning disks per node in the cold
tier.  10GE client network with a separate 10GE back-side network.
Each node in the cold tier has two Intel P3700 SSDs as journals.  This
setup has yielded excellent performance over the past year.

The memory exhaustion comes purely from one errant OSD process.  All
the remaining processes look fairly normal in terms of memory
consumption.
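
For what it's worth, something along the lines of

  ps aux --sort=-rss | grep ceph-osd | head

(ceph-osd processes sorted by resident memory) is enough to spot it;
one process dwarfs all the others.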

These nodes aren't particularly busy.  A random sampling shows a few
hundred kilobytes of data being written and very few reads.
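
(Rough sampling only, along the lines of

  iostat -x 5
  ceph osd pool stats

on the nodes; nothing rigorous.)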

Thus far, I've done quite a bit of OSD juggling: setting the noup flag
on the cluster, restarting the failed OSDs, letting them catch up to
the current map, and then clearing the noup flag so they can rejoin.
Eventually they fail again, and a fairly intense recovery follows.
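
Concretely, the juggling is roughly this sequence (the restart command
will vary with the init setup; osd.23 is just an example, being one of
the ones that has fallen over):

  ceph osd set noup
  /etc/init.d/ceph restart osd.23
  # wait for the OSD to catch up to the current osdmap, then:
  ceph osd unset noup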

Here's the output of ceph -s:

https://dl.dropboxusercontent.com/u/90634073/ceph/ceph_dash_ess.txt

The cluster has been in this state for a while.  There are three PGs
that seem to be problematic:

[root@t2-node01 ~]# ceph pg dump | grep recovering
dumped all in format plain
9.2f1 1353 1075 4578 1353 1075 9114357760 2611 2611 active+recovering+degraded+remapped 2016-05-24 21:49:26.766924 8577'2611 8642:84 [15,31] 15 [15,31,0] 15 5123'2483 2016-05-23 23:52:54.360710 5123'2483 2016-05-23 23:52:54.360710
12.258 878 875 2628 0 0 4414509568 1534 1534 active+recovering+undersized+degraded 2016-05-24 21:47:48.085476 4261'1534 8587:17712 [4,20] 4 [4,20] 4 4261'1534 2016-05-23 07:22:44.819208 4261'1534 2016-05-23 07:22:44.819208
11.58 376 0 1 2223 0 1593129984 4909 4909 active+recovering+degraded+remapped 2016-05-24 05:49:07.531198 8642'409248 8642:406269 [56,49,41] 56 [40,48,62] 40 4261'406995 2016-05-22 21:40:40.205540 4261'406450 2016-05-21 21:37:35.497307

pg 9.2f1 query:
https://dl.dropboxusercontent.com/u/90634073/ceph/pg_9.21f.txt

When I query 12.258, the command just hangs.

pg 11.58 query:
https://dl.dropboxusercontent.com/u/90634073/ceph/pg_11.58.txt
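
(For completeness, those queries are just of the form ceph pg <pgid>
query, i.e.:

  ceph pg 9.2f1 query
  ceph pg 12.258 query
  ceph pg 11.58 query

with 12.258 being the one that never returns.)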


Not sure where to go from here.


-H


On Tue, May 24, 2016 at 5:47 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> On Tue, 24 May 2016 15:32:02 -0700 Gregory Farnum wrote:
>
>> On Tue, May 24, 2016 at 2:16 PM, Heath Albritton <halbritt@xxxxxxxx>
>> wrote:
>> > Having some problems with my cluster.  Wondering if I could get some
>> > troubleshooting tips:
>> >
>> > Running hammer 0.94.5.  Small cluster with cache tiering.  3 spinning
>> > nodes and 3 SSD nodes.
>> >
>> > Lots of blocked ops.  OSDs are consuming the entirety of the system
>> > memory (128GB) and then falling over.  Lots of blocked ops, slow
>> > requests.  Seeing logs like this:
>> >
>> > 2016-05-24 19:30:09.288941 7f63c126b700  1 heartbeat_map is_healthy
>> > 'FileStore::op_tp thread 0x7f63cb3cd700' had timed out after 60
>> > 2016-05-24 19:30:09.503712 7f63c5273700  0 log_channel(cluster) log
>> > [WRN] : map e7779 wrongly marked me down
>> > 2016-05-24 19:30:11.190178 7f63cabcc700  0 --
>> > 10.164.245.22:6831/5013886 submit_message MOSDPGPushReply(9.10d 7762
>> > [PushReplyOp(3110010d/rbd_data.9647882ae8944a.00000000000026e7/head//9)])
>> > v2 remote, 10.164.245.23:6821/3028423, failed lossy con, dropping
>> > message 0xfc21e00
>> > 2016-05-24 19:30:22.832381 7f63bca62700 -1 osd.23 7780
>> > lsb_release_parse - failed to call lsb_release binary with error: (12)
>> > Cannot allocate memory
>> >
>> > Eventually the OSD fails.  Cluster is in an unhealthy state.
>> >
>> > I can set noup, restart the OSDs and get them on the current map, but
>> > once I put them back into the cluster, they eventually fail.
>>
>> What's the full output of "ceph -s"? It sort of sounds like you're
>> just overloading your spinning OSDs with too many ops. Cache tiering
>> is often less helpful than people think it is, and in some
>> circumstances it can actively hurt throughput; you might be running
>> into that.
>>
>
> Greg, that's an abbreviated re-post of his original message 6 hours
> earlier titled "blocked ops".
>
> And in that he gave more details, as in 12 HDD based OSDs per node in the
> backing pool.
>
> So while you're right about cache-tiers not being a universal cure and
> the need to configure/size/understand them correctly, I think the elephant
> in the room here is that 12 OSDs manage to consume 128GB of RAM.
> And that's just beyond odd.
>
> As for Heath, we do indeed need more data as in:
>
> a) How busy are your HDD nodes? (atop, iostat). Any particular HDDs/OSDs
> standing out, as in being slower/busier for a prolonged time?
>
> b) No SSD journals for the spinners right?
>
> c) The memory exhaustion is purely caused by the OSD processes (atop)? All
> of them equally or are there particular large ones?
>
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/


