Re: Stability Issue with 52 OSD hosts

Hello,

On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:

> Hi,   I've been fighting to get good stability on my cluster for about
> 3 weeks now.  I am running into intermittent issues with OSDs flapping,
> marking other OSDs down, then going back to a stable state for hours or
> days.
> 
> The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB RAM, and 40G
> networking to 40G Brocade VDX switches.  The OSDs are 6TB HGST SAS drives
> with 400GB HGST SAS 12G SSDs for journals.   My configuration is 4
> journals per host with 12 disks per journal, for a total of 56 disks per
> system and 52 OSDs.
>
Any denser and you'd have a storage black hole.

You already pointed your finger in the (or at least one) right direction
and everybody will agree that this setup is woefully underpowered in the
CPU department.
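
For rough numbers (assuming these are the 8-core E5-2660s, which is a guess
on my part):

  52 OSDs / (2 x 8 cores) = 3.25 OSDs per physical core, i.e. ~0.3 cores per OSD

versus the usual rule of thumb of roughly one core (or ~1GHz) per HDD-backed
OSD, before scrub and recovery overhead.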
 
> I am using CentOS 7 with kernel 3.10 and the Red Hat tuned-adm
> throughput-performance profile enabled.
> 
Ceph version would be interesting as well...
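
If in doubt how to grab it, any of these should do (the cluster-wide summary
"ceph versions" only exists from Luminous onwards):

  ceph --version            # version of the locally installed binaries
  ceph tell osd.* version   # what the running OSDs actually report
  ceph versions             # Luminous+: per-daemon summary for the whole cluster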

> I have these sysctls set:
> 
> kernel.pid_max = 4194303
> fs.file-max = 6553600
> vm.swappiness = 0
> vm.vfs_cache_pressure = 50
> vm.min_free_kbytes = 3145728
> 
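
If those aren't persisted yet, a drop-in under /etc/sysctl.d/ keeps them
across reboots; a minimal sketch, the file name being arbitrary:

  cat > /etc/sysctl.d/90-ceph-tuning.conf <<'EOF'
  kernel.pid_max = 4194303
  fs.file-max = 6553600
  vm.swappiness = 0
  vm.vfs_cache_pressure = 50
  vm.min_free_kbytes = 3145728
  EOF
  sysctl --system   # reload all sysctl config files and show what was applied
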
> I feel like my issue is directly related to the high number of OSDs per
> host, but I'm not sure what issue I'm really running into.   I believe
> that I have ruled out network issues: I am able to get 38Gbit
> consistently via iperf testing, and jumbo-frame pings succeed with the
> don't-fragment flag set and an 8972-byte packet size.
> 
The fact that it all works for days at a time suggests the network is fine
as well, but you need to verify these things while the problem is actually
happening.
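
Something along these lines on the involved OSD nodes while a heartbeat storm
is actually in progress, using the back-network peer from your log below as an
example target (interface name is whatever your 40G ports are called):

  ping -M do -s 8972 -c 5 10.20.142.11               # jumbo path, don't-fragment set
  iperf3 -c 10.20.142.11 -t 30                       # bandwidth during the event
  ethtool -S <interface> | grep -iE 'drop|err|disc'  # NIC error/drop counters, both ends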

> From FIO testing I seem to be able to get 150-200k write IOPS from my
> RBD clients on 1Gbit networking... This is about what I expected given
> the write penalty and my underpowered CPUs for the number of OSDs.
> 
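
For what it's worth, if you want numbers taken straight against RBD from a node
on the 40G network (bypassing the 1Gbit client NICs), fio's rbd engine is handy;
pool and image names below are placeholders:

  fio --name=rbd-randwrite --ioengine=rbd --clientname=admin \
      --pool=rbd --rbdname=fio-test --rw=randwrite --bs=4k \
      --iodepth=32 --direct=1 --runtime=60 --time_based --group_reporting
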
> I get these messages which I believe are normal?
> 2018-08-22 10:33:12.754722 7f7d009f5700  0 -- 10.20.136.8:6894/718902
> >> 10.20.136.10:6876/490574 pipe(0x55aed77fd400 sd=192 :40502 s=2  
> pgs=1084 cs=53 l=0 c=0x55aed805bc80).fault with nothing to send, going
> to standby
> 
Ignore.

> Then randomly I'll get a storm of this every few days for 20 minutes or so:
> 2018-08-22 15:48:32.631186 7f44b7514700 -1 osd.127 37333
> heartbeat_check: no reply from 10.20.142.11:6861 osd.198 since back
> 2018-08-22 15:48:08.052762 front 2018-08-22 15:48:31.282890 (cutoff
> 2018-08-22 15:48:12.630773)
> 
"Randomly" is unlikely.
Again, catch it in the act: atop in huge terminal windows (showing all
CPUs and disks) on all nodes should be very telling; collecting and
graphing this data might work, too.
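
If nobody can be staring at terminals when it hits, atop's raw logging gives
the same view after the fact (interval and paths are just examples):

  atop -w /var/log/atop/ceph-$(hostname -s).raw 10                  # one sample every 10s
  atop -r /var/log/atop/ceph-$(hostname -s).raw -b 15:40 -e 16:10   # replay the window around a storm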

My suspects would be deep scrubs and/or high IOPS spikes at the times this
is happening, starving out the OSD processes (CPU-wise; RAM should be fine,
one supposes).
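
Easy enough to check and, if confirmed, to blunt; the injectargs value below is
only a starting point, not a tuned recommendation:

  # on a mon host: were deep scrubs running on the involved OSDs at the time?
  grep 'deep-scrub' /var/log/ceph/ceph.log
  ceph pg dump pgs_brief 2>/dev/null | grep -c scrubbing
  # temporarily rule deep scrubs in or out as the cause
  ceph osd set nodeep-scrub        # later: ceph osd unset nodeep-scrub
  # throttle scrubbing if it does turn out to be the culprit
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'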

Christian

> Please help!!!
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


