Hi David,

On 15.03.18 at 18:03, David Turner wrote:
> I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last
> week I added 2 nodes to the cluster. The backfilling has been
> ATROCIOUS. I have OSDs consistently [2] segfaulting during recovery.
> There's no pattern to which OSDs are segfaulting, which hosts have
> segfaulting OSDs, etc. It's all over the cluster. I have been trying
> variants of all of the following settings with different levels of
> success, but I cannot eliminate the blocked requests and segfaulting
> OSDs: osd_heartbeat_grace, osd_max_backfills,
> osd_op_thread_suicide_timeout, osd_recovery_max_active,
> osd_recovery_sleep_hdd, osd_recovery_sleep_hybrid,
> osd_recovery_thread_timeout, and osd_scrub_during_recovery. Except for
> setting nobackfill on the cluster, I can't stop OSDs from segfaulting
> during recovery.
>
> Does anyone have any ideas for this? I've been struggling with it for
> over a week now. For the first couple of days I rebalanced the cluster
> and had this exact same issue prior to adding new storage. Even setting
> osd_max_backfills to 1 and recovery_sleep to 1.0, with everything else
> on defaults, doesn't help.
>
> Backfilling caused things to slow down on Jewel, but I wasn't having
> OSDs segfault multiple times per hour like I am on Luminous. So many
> OSDs are going down that I had to set nodown to prevent the potential
> data instability from OSDs on multiple hosts going up and down all the
> time. That blocks IO for every OSD that dies, either until it comes
> back up or I manually mark it down. I hope someone has some ideas for
> me here.
>
> Our plan moving forward is to only use half of the capacity of the
> drives, by pretending they're 5TB instead of 10TB, to increase the
> spindle speed per TB. Also, migrating to bluestore will hopefully help.

Do you see segfaults in dmesg? This sounds somewhat like the problems I
experienced last week:

http://tracker.ceph.com/issues/23258?next_issue_id=23257

For some reason it seems to be gone at the moment, but unfortunately I
don't know why, which is really disappointing.
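To be concrete about the dmesg check: I simply grep the kernel log on each
OSD host for segfault entries. This is just a rough sketch; adjust the
paths and patterns to your distribution:

  # on each OSD host: look for ceph-osd segfaults in the kernel ring buffer
  dmesg -T | grep -i segfault

  # older entries may have rotated out of the ring buffer, so also check
  # the persistent kernel log (location varies by distribution, this is
  # the Debian/Ubuntu path):
  grep -i segfault /var/log/kern.log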
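And in case it is useful while you keep fighting the backfill: below is
roughly how I would apply the flags and throttles you list at runtime. It
is only a sketch; the numbers are illustrative and need tuning for your
cluster, and the osd.254 in the reweight line is just an example taken
from your log:

  # cluster-wide flags, reversible later with 'ceph osd unset <flag>'
  ceph osd set nodown       # keep flapping OSDs from being marked down
  ceph osd set nobackfill   # pause backfill entirely while investigating

  # throttle recovery/backfill on all OSDs without restarting them
  ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_sleep_hdd 1.0'

  # give the worker threads more headroom before the heartbeat/suicide
  # timeouts fire (example values; note that the FileStore thread pool has
  # its own filestore_op_thread_suicide_timeout, default 180s, which is
  # what shows up in your log)
  ceph tell 'osd.*' injectargs '--osd_heartbeat_grace 120 --osd_op_thread_suicide_timeout 600'

  # for the "treat the 10TB drives as 5TB" plan: halve the CRUSH weight of
  # each OSD, e.g. roughly half of the ~9.1 default weight of a 10TB disk
  ceph osd crush reweight osd.254 4.5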
Best Regards
Jan

> [1] 23 OSD nodes: 15x 10TB Seagate Ironwolf, filestore with journals on
> Intel DC P3700, 70% full cluster, Dual Socket E5-2620 v4 @ 2.10GHz,
> 128GB RAM.
>
> [2]
> -19> 2018-03-15 16:42:17.998074 7fe661601700 5 -- 10.130.115.25:6811/2942118 >> 10.130.115.48:0/372681 conn(0x55e3ea087000 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1920 cs=1 l=1). rx osd.254 seq 74507 0x55e3eb8e2e00 osd_ping(ping e93182 stamp 2018-03-15 16:42:17.990698) v4
> -18> 2018-03-15 16:42:17.998091 7fe661601700 1 -- 10.130.115.25:6811/2942118 <== osd.254 10.130.115.48:0/372681 74507 ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.990698) v4 ==== 2004+0+0 (492539280 0 0) 0x55e3eb8e2e00 con 0x55e3ea087000
> -17> 2018-03-15 16:42:17.998109 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
> -16> 2018-03-15 16:42:17.998111 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
> -15> 2018-03-15 16:42:17.998120 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out after 60
> -14> 2018-03-15 16:42:17.998123 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63af75700' had timed out after 60
> -13> 2018-03-15 16:42:17.998126 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63b776700' had timed out after 60
> -12> 2018-03-15 16:42:17.998129 7fe661601700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fe654854700' had timed out after 60
> -11> 2018-03-15 16:42:18.004203 7fe661601700 5 -- 10.130.115.25:6811/2942118 >> 10.130.115.33:0/3348055 conn(0x55e3eb5f0000 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1894 cs=1 l=1). rx osd.169 seq 74633 0x55e3eb8e2e00 osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4
> -10> 2018-03-15 16:42:18.004230 7fe661601700 1 -- 10.130.115.25:6811/2942118 <== osd.169 10.130.115.33:0/3348055 74633 ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4 ==== 2004+0+0 (2306332339 0 0) 0x55e3eb8e2e00 con 0x55e3eb5f0000
> -9> 2018-03-15 16:42:18.004241 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
> -8> 2018-03-15 16:42:18.004244 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
> -7> 2018-03-15 16:42:18.004246 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out after 60
> -6> 2018-03-15 16:42:18.004248 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63af75700' had timed out after 60
> -5> 2018-03-15 16:42:18.004249 7fe661601700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63b776700' had timed out after 60
> -4> 2018-03-15 16:42:18.004251 7fe661601700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fe654854700' had timed out after 60
> -3> 2018-03-15 16:42:18.004256 7fe661601700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fe654854700' had suicide timed out after 180
> -2> 2018-03-15 16:42:18.004462 7fe6605ff700 5 -- 10.130.113.25:6811/2942118 >> 10.130.113.33:0/3348055 conn(0x55e3eb599800 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1937 cs=1 l=1). rx osd.169 seq 74633 0x55e3eef6d200 osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4
> -1> 2018-03-15 16:42:18.004502 7fe6605ff700 1 -- 10.130.113.25:6811/2942118 <== osd.169 10.130.113.33:0/3348055 74633 ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4 ==== 2004+0+0 (2306332339 0 0) 0x55e3eef6d200 con 0x55e3eb599800
> 0> 2018-03-15 16:42:18.015185 7fe654854700 -1 *** Caught signal (Aborted) **
> in thread 7fe654854700 thread_name:tp_fstore_op

--
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: support@xxxxxxxxxxx | Web: http://www.artfiles.de
Managing Directors: Harald Oltmanns | Tim Evers
Registered in the Hamburg Commercial Register - HRB 81478

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com