Re: Flapping OSDs, Large meta directories in OSDs

No, CPU and memory look normal.  We haven't been fast (or lucky) enough with iostat to tell whether we're just slamming the disk itself; I keep trying to catch a flap, log into the node, find the disk, and get iostat running before the OSD comes back up.  We haven't flapped that many OSDs, and most of the flaps have occurred in the middle of the night during our peak load times.
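
To stop racing it by hand, I'm thinking of leaving a small watcher running
on a couple of nodes -- roughly the (untested) sketch below, which just
tails the OSD log and kicks off iostat against the data disk once the
heartbeat timeouts show up.  The log path, device, output file and trigger
string are assumptions for our boxes.

#!/usr/bin/env python
# Rough, untested sketch: tail an OSD's log and fire iostat at its data
# disk as soon as the heartbeat timeouts start, so the disk gets sampled
# while the OSD is still catatonic.  Log path, device and output file
# are assumptions -- adjust as needed.
import subprocess
import time

OSD_LOG = "/var/log/ceph/ceph-osd.1191.log"   # assumed log location
DEVICE = "/dev/sdk"                           # assumed data disk for that OSD
TRIGGER = "heartbeat_map is_healthy"          # string the flapping OSDs log

def watch():
    with open(OSD_LOG) as log:
        log.seek(0, 2)                        # start at the end of the log
        while True:
            line = log.readline()
            if not line:
                time.sleep(0.5)
                continue
            if TRIGGER in line and "had timed out" in line:
                # capture 60 one-second extended samples of the disk
                with open("/tmp/iostat-capture.log", "w") as out:
                    subprocess.call(["iostat", "-x", "1", "60", DEVICE],
                                    stdout=out)
                return

if __name__ == "__main__":
    watch()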



On Mon, Nov 30, 2015 at 1:41 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
On 11/30/2015 08:56 PM, Tom Christensen wrote:
> We recently upgraded to 0.94.3 from Firefly and now, for the last week,
> have had intermittent slow requests and flapping OSDs.  We have been
> unable to nail down the cause, but it's feeling like it may be related
> to our osdmaps not getting deleted properly.  Most of our OSDs are now
> storing over 100GB of data in the meta directory, almost all of it
> historical osd maps more than 7 days old.
>

That is odd. Do you have anything special in ceph.conf regarding the
OSDs and how many maps they store? I guess not, but I just wanted to check.
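
It might also be worth comparing how far back the maps on an affected OSD
actually go against the cluster's current epoch -- something along these
lines, run on the node hosting the OSD (osd.1191 only as an example, and
assuming your build has the 'status' admin socket command):

ceph daemon osd.1191 status   # reports oldest_map / newest_map held by the OSD
ceph osd stat                 # current osdmap epoch, for comparison

If oldest_map trails the current epoch by a huge margin, the OSD really is
holding on to all of that history.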

There are some settings you might want to play with on such a large OSD
cluster. Looking at src/common/config_opts.h:

OPTION(osd_map_dedup, OPT_BOOL, true)
OPTION(osd_map_max_advance, OPT_INT, 150) // make this < cache_size!
OPTION(osd_map_cache_size, OPT_INT, 200)
OPTION(osd_map_message_max, OPT_INT, 100)  // max maps per MOSDMap message
OPTION(osd_map_share_max_epochs, OPT_INT, 100)  // cap on # of inc maps we send to peers, clients
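
For example, to let the OSDs cache and advance through more maps you could
try something like this in the [osd] section of ceph.conf (the values are
only an illustration, not a recommendation -- test before rolling it out):

[osd]
# keep osd map max advance below osd map cache size
osd map cache size = 500
osd map max advance = 400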

You also might want to take a look at this PDF from CERN:
https://cds.cern.ch/record/2015206/files/CephScaleTestMarch2015.pdf

> We did do a small cluster change (we added 35 OSDs to a 1445-OSD
> cluster); the rebalance took about 36 hours and completed 10 days
> ago.  Since then the cluster has been HEALTH_OK and all PGs have
> been active+clean, except when we have an OSD flap.
>
> When the OSDs flap they do not crash and restart; they just go
> unresponsive for 1-3 minutes and then come back alive on their
> own.  They get marked down by peers, cause some peering, and then
> they just come back, rejoin the cluster, and continue on their merry way.
>

Do you see any high CPU or memory usage at that point?

> We see a bunch of this in the logs while the OSD is catatonic:
>
> Nov 30 11:23:38 osd-10 ceph-osd: 2015-11-30 11:22:32.143166 7f5b03679700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f5affe72700' had timed out after 15
> Nov 30 11:23:38 osd-10 ceph-osd: 2015-11-30 11:22:32.143176 7f5b03679700 10 osd.1191 1203850 internal heartbeat not healthy, dropping ping request
> Nov 30 11:23:38 osd-10 ceph-osd: 2015-11-30 11:22:32.143210 7f5b04e7c700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f5affe72700' had timed out after 15
> Nov 30 11:23:38 osd-10 ceph-osd: 2015-11-30 11:22:32.143218 7f5b04e7c700 10 osd.1191 1203850 internal heartbeat not healthy, dropping ping request
> Nov 30 11:23:38 osd-10 ceph-osd: 2015-11-30 11:22:32.143288 7f5b03679700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f5affe72700' had timed out after 15
> Nov 30 11:23:38 osd-10 ceph-osd: 2015-11-30 11:22:32.143293 7f5b03679700 10 osd.1191 1203850 internal heartbeat not healthy, dropping ping request
>
>
> I have a chunk of logs at debug 20/5; not sure if I should have done
> just 20...  It's pretty hard to catch: we basically have to see the slow
> requests and get debug logging set within about a 5-10 second window
> before the OSD stops responding to the admin socket...
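>
> (By "get debug logging set" I just mean the usual routes, e.g.
> "ceph tell osd.1191 injectargs '--debug_osd 20/5'" or
> "ceph daemon osd.1191 config set debug_osd 20/5" run on the node itself,
> with osd.1191 only as an example.)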
>
> As networking is almost always the cause of flapping OSDs, we have tested
> the network quite extensively.  It hasn't changed physically since
> before the Hammer upgrade, and it was performing well.  We have run a
> large number of ping tests and have not seen a single dropped packet
> between OSD nodes or between OSD nodes and mons.
>
> I don't see any packet errors or drops on the switches either.
>
> Ideas?
>
>
>


--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
