Yes, I've reviewed all the logs from the monitors and the hosts. I'm not getting useful errors (or any at all) in dmesg or the general messages log. I have two Ceph clusters; the other cluster is 300 SSDs and I never have issues like this. That's why I'm looking for help.
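
In case it's useful, this is roughly how I've been lining up the heartbeat failures against the OSD logs, the cluster log and syslog, per Alex's suggestion below (just a sketch: the paths assume the default /var/log/ceph layout on CentOS 7, and osd.127 / osd.198 are simply the pair from the log excerpt quoted further down):

  # Which OSDs went unreachable during the storm, and how often
  # ($12 is the "osd.N" after "no reply from <addr>" in the line format quoted below)
  grep 'heartbeat_check: no reply' /var/log/ceph/ceph-osd.127.log \
      | awk '{print $12}' | sort | uniq -c | sort -rn | head

  # What the accused OSD itself was doing at that moment (run on its host)
  grep '2018-08-22 15:4' /var/log/ceph/ceph-osd.198.log | less

  # Whether anything was (deep-)scrubbing cluster-wide right then
  # (on a mon; assumes scrub results still go to the cluster log at default settings)
  grep '2018-08-22 15:4' /var/log/ceph/ceph.log | grep -i scrub

  # Anything from the kernel or the SAS controller at the same time
  grep 'Aug 22 15:4' /var/log/messages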
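
I'm also going to try to catch the next storm in the act rather than after the fact, along the lines Christian suggests below. Something like this on every node (a rough sketch only: the atop interval and output path are arbitrary, and the ping just repeats my usual jumbo-frame check against a peer that got flagged):

  # Keep 10-second atop samples on disk so the storm window can be replayed later with `atop -r <file>`
  mkdir -p /var/log/atop
  nohup atop -w /var/log/atop/atop_$(hostname -s).raw 10 &

  # During a storm: re-run the no-fragment 8972-byte ping against the flagged peer's address
  ping -M do -s 8972 -c 20 10.20.142.11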
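
And if the deep scrubs do turn out to line up with the storms, my understanding is that Jewel lets me confine and throttle them at runtime, something along these lines (the hours and sleep value are only examples; the same options could go in ceph.conf instead):

  ceph tell osd.* injectargs '--osd_scrub_begin_hour 1 --osd_scrub_end_hour 7 --osd_scrub_sleep 0.1'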

On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop
> <tyler.bishop@xxxxxxxxxxxxxxxxx> wrote:
> >
> > During high load testing I'm only seeing user and sys CPU load around
> > 60%... my load doesn't seem to be anything crazy on the host, and
> > iowait stays between 6 and 10%. I have very good `ceph osd perf`
> > numbers too.
> >
> > I am using 10.2.11 Jewel.
> >
> > On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer <chibi@xxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:
> >>
> >> > Hi, I've been fighting to get good stability on my cluster for about
> >> > 3 weeks now. I am running into intermittent issues with OSDs flapping,
> >> > marking other OSDs down, then going back to a stable state for hours
> >> > and days.
> >> >
> >> > The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB RAM, and
> >> > 40G networking to 40G Brocade VDX switches. The OSDs are 6TB HGST SAS
> >> > drives with 400GB HGST SAS 12G SSDs as journals. My configuration is
> >> > 4 journals per host with 12 disks per journal, for a total of 56 disks
> >> > per system and 52 OSDs.
> >> >
> >> Any denser and you'd have a storage black hole.
> >>
> >> You already pointed your finger in the right direction (or at least one
> >> of them), and everybody will agree that this setup is woefully
> >> underpowered in the CPU department.
> >>
> >> > I am using CentOS 7 with kernel 3.10 and the Red Hat tuned-adm
> >> > throughput-performance profile enabled.
> >> >
> >> Ceph version would be interesting as well...
> >>
> >> > I have these sysctls set:
> >> >
> >> > kernel.pid_max = 4194303
> >> > fs.file-max = 6553600
> >> > vm.swappiness = 0
> >> > vm.vfs_cache_pressure = 50
> >> > vm.min_free_kbytes = 3145728
> >> >
> >> > I feel like my issue is directly related to the high number of OSDs
> >> > per host, but I'm not sure what issue I'm really running into. I
> >> > believe I have ruled out network issues: I am able to get 38Gbit
> >> > consistently in iperf testing, and jumbo-frame pings succeed with the
> >> > no-fragment flag set and an 8972-byte packet size.
> >> >
> >> The fact that it all works for days at a time suggests this as well, but
> >> you need to verify these things when they're happening.
> >>
> >> > From fio testing I seem to be able to get 150-200k write IOPS from my
> >> > RBD clients on 1Gbit networking... This is about what I expected,
> >> > given the write penalty and my underpowered CPUs for the number of
> >> > OSDs.
> >> >
> >> > I get these messages, which I believe are normal?
> >> > 2018-08-22 10:33:12.754722 7f7d009f5700  0 -- 10.20.136.8:6894/718902
> >> > >> 10.20.136.10:6876/490574 pipe(0x55aed77fd400 sd=192 :40502 s=2
> >> > pgs=1084 cs=53 l=0 c=0x55aed805bc80).fault with nothing to send, going
> >> > to standby
> >> >
> >> Ignore.
> >>
> >> > Then randomly I'll get a storm of these every few days, for 20 minutes
> >> > or so:
> >> > 2018-08-22 15:48:32.631186 7f44b7514700 -1 osd.127 37333
> >> > heartbeat_check: no reply from 10.20.142.11:6861 osd.198 since back
> >> > 2018-08-22 15:48:08.052762 front 2018-08-22 15:48:31.282890 (cutoff
> >> > 2018-08-22 15:48:12.630773)
> >> >
> >> Randomly is unlikely.
> >>
> >> Again, catch it in the act: atop in huge terminal windows (showing all
> >> CPUs and disks) on all nodes should be very telling; collecting and
> >> graphing this data might work, too.
> >>
> >> My suspects would be deep scrubs and/or high IOPS spikes when this is
> >> happening, starving out the OSD processes (CPU-wise; RAM should be
> >> fine, one supposes).
> >>
> >> Christian
> >>
> >> > Please help!!!
>
> Have you looked at the OSD logs on the OSD nodes, by chance? I found
> that correlating the messages in those logs with your master ceph log,
> and also correlating with any messages in syslog or kern.log, can
> elucidate the cause of the problem pretty well.
> --
> Alex Gorbachev
> Storcium
>
> >>
> >> --
> >> Christian Balzer        Network/Systems Engineer
> >> chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com