Re: Backfilling on Luminous

We haven't used jemalloc for anything.  The only thing in our /etc/sysconfig/ceph configuration is increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES.

I didn't see anything in dmesg on one of the recent hosts that had an OSD segfault.  I looked at your ticket, and that looks like a problem with PGs left in a bad state by the upgrade.

We have been running stably on Luminous for a few weeks, and we only see this problem during backfilling.  Interestingly, it does not appear to happen while PGs are recovering, as opposed to backfilling.  I tested this by setting nobackfill, raising osd_max_backfills to 20, and waiting for all PGs that had been in recovery_wait to finish recovering.  They all completed (even the undersized ones) without any issues on the OSDs or the cluster.  As soon as I resumed backfilling, OSDs started dying again.
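
For anyone who wants to repeat that test, here is a rough Python sketch of the sequence. It is only an illustration, not the exact commands I ran: the helper names are made up, the injectargs form is one common way to change osd_max_backfills at runtime, and the PG counting assumes the pgmap/pgs_by_state layout of `ceph status --format json`.

```python
import json
import subprocess
import time


def isolation_commands(max_backfills=20):
    """Return (pause, resume) ceph CLI invocations for the test above."""
    pause = [
        ["ceph", "osd", "set", "nobackfill"],
        # Assumption: runtime change via injectargs; adjust to your own practice.
        ["ceph", "tell", "osd.*", "injectargs",
         "--osd_max_backfills {}".format(max_backfills)],
    ]
    resume = [["ceph", "osd", "unset", "nobackfill"]]
    return pause, resume


def recovering_pg_count(status):
    """Count PGs whose state includes recovery_wait or recovering."""
    total = 0
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        flags = entry["state_name"].split("+")
        if "recovery_wait" in flags or "recovering" in flags:
            total += entry["count"]
    return total


def run_test(poll_seconds=30):
    """Pause backfill, wait for recovery to drain, then resume backfill."""
    pause, resume = isolation_commands()
    for cmd in pause:
        subprocess.run(cmd, check=True)
    while True:
        raw = subprocess.check_output(["ceph", "status", "--format", "json"])
        if recovering_pg_count(json.loads(raw)) == 0:
            break
        time.sleep(poll_seconds)
    for cmd in resume:
        subprocess.run(cmd, check=True)
```

Calling run_test() on an admin node would walk through the same three phases; watch the OSD logs once backfill is unset again.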

On Thu, Mar 15, 2018 at 2:59 PM Jan Marquardt <jm@xxxxxxxxxxx> wrote:
Hi David,

On 15.03.18 at 18:03, David Turner wrote:
> I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last
> week I added 2 nodes to the cluster.  The backfilling has been
> ATROCIOUS.  I have OSDs consistently [2] segfaulting during recovery. 
> There's no pattern of which OSDs are segfaulting, which hosts have
> segfaulting OSDs, etc... It's all over the cluster.  I have been trying
> variations of all of the following settings with varying degrees of
> success, but I cannot eliminate the blocked requests and segfaulting
> OSDs:  osd_heartbeat_grace, osd_max_backfills, osd_op_thread_suicide_timeout, osd_recovery_max_active, osd_recovery_sleep_hdd, osd_recovery_sleep_hybrid, osd_recovery_thread_timeout,
> and osd_scrub_during_recovery.  Except for setting nobackfill on the
> cluster, I can't stop OSDs from segfaulting during recovery.
>
> Does anyone have any ideas for this?  I've been struggling with this for
> over a week now.  For the first couple days I rebalanced the cluster and
> had this exact same issue prior to adding new storage.  Even setting
> osd_max_backfills to 1 and recovery_sleep to 1.0, with everything else
> on defaults, doesn't help.
>
> Backfilling caused things to slow down on Jewel, but I wasn't having
> OSDs segfault multiple times/hour like I am on Luminous.  So many OSDs
> are going down that I had to set nodown to prevent potential data
> instability of OSDs on multiple hosts going up and down all the time. 
> That blocks IO for every OSD that dies either until it comes back up or
> I manually mark it down.  I hope someone has some ideas for me here. 
> Our plan moving forward is to only use half of the capacity of the
> drives by pretending they're 5TB instead of 10TB to increase the spindle
> speed per TB.  Also migrating to bluestore will hopefully help.

Do you see segfaults in dmesg?
This sounds somewhat like the problems I experienced last week.

http://tracker.ceph.com/issues/23258?next_issue_id=23257

For some reason it seems to be gone at the moment, but unfortunately I
don't know why, which is really disappointing.

Best Regards

Jan

>
>
> [1] 23 OSD nodes: 15x 10TB Seagate Ironwolf filestore with journals on
> Intel DC P3700, 70% full cluster, Dual Socket E5-2620 v4 @ 2.10GHz,
> 128GB RAM.
>
> [2]    -19> 2018-03-15 16:42:17.998074 7fe661601700  5 -- 10.130.115.25:6811/2942118 >> 10.130.115.48:0/372681 conn(0x55e3ea087000 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1920 cs=1 l=1). rx osd.254 seq 74507 0x55e3eb8e2e00 osd_ping(ping e93182 stamp 2018-03-15 16:42:17.990698) v4
>    -18> 2018-03-15 16:42:17.998091 7fe661601700  1 -- 10.130.115.25:6811/2942118 <== osd.254 10.130.115.48:0/372681 74507 ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.990698) v4 ==== 2004+0+0 (492539280 0 0) 0x55e3eb8e2e00 con 0x55e3ea087000
>    -17> 2018-03-15 16:42:17.998109 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
>    -16> 2018-03-15 16:42:17.998111 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
>    -15> 2018-03-15 16:42:17.998120 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out after 60
>    -14> 2018-03-15 16:42:17.998123 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63af75700' had timed out after 60
>    -13> 2018-03-15 16:42:17.998126 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63b776700' had timed out after 60
>    -12> 2018-03-15 16:42:17.998129 7fe661601700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fe654854700' had timed out after 60
>    -11> 2018-03-15 16:42:18.004203 7fe661601700  5 -- 10.130.115.25:6811/2942118 >> 10.130.115.33:0/3348055 conn(0x55e3eb5f0000 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1894 cs=1 l=1). rx osd.169 seq 74633 0x55e3eb8e2e00 osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4
>    -10> 2018-03-15 16:42:18.004230 7fe661601700  1 -- 10.130.115.25:6811/2942118 <== osd.169 10.130.115.33:0/3348055 74633 ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4 ==== 2004+0+0 (2306332339 0 0) 0x55e3eb8e2e00 con 0x55e3eb5f0000
>     -9> 2018-03-15 16:42:18.004241 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
>     -8> 2018-03-15 16:42:18.004244 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
>     -7> 2018-03-15 16:42:18.004246 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out after 60
>     -6> 2018-03-15 16:42:18.004248 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63af75700' had timed out after 60
>     -5> 2018-03-15 16:42:18.004249 7fe661601700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe63b776700' had timed out after 60
>     -4> 2018-03-15 16:42:18.004251 7fe661601700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fe654854700' had timed out after 60
>     -3> 2018-03-15 16:42:18.004256 7fe661601700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fe654854700' had suicide timed out after 180
>     -2> 2018-03-15 16:42:18.004462 7fe6605ff700  5 -- 10.130.113.25:6811/2942118 >> 10.130.113.33:0/3348055 conn(0x55e3eb599800 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1937 cs=1 l=1). rx osd.169 seq 74633 0x55e3eef6d200 osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4
>     -1> 2018-03-15 16:42:18.004502 7fe6605ff700  1 -- 10.130.113.25:6811/2942118 <== osd.169 10.130.113.33:0/3348055 74633 ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828) v4 ==== 2004+0+0 (2306332339 0 0) 0x55e3eef6d200 con 0x55e3eb599800
>      0> 2018-03-15 16:42:18.015185 7fe654854700 -1 *** Caught signal (Aborted) **
>  in thread 7fe654854700 thread_name:tp_fstore_op
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
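
A note on reading the log excerpt in [2]: the thread that hit the suicide timeout (0x7fe654854700, FileStore::op_tp) is the same thread that then caught the Aborted signal, so this particular crash looks like a heartbeat suicide abort rather than a memory fault. The Python sketch below is one way to pull that out of a pasted OSD log. The regex and the function name are my own and only cover the heartbeat_map lines seen here; it also tolerates the "> " quote prefixes and line wrapping that mail clients add.

```python
import re

# Matches lines such as:
#   heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe6...' had timed out after 60
HEARTBEAT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>[\w:]+) thread (?P<tid>0x[0-9a-f]+)' "
    r"had (?P<kind>suicide timed out|timed out) after (?P<secs>\d+)"
)


def summarize_timeouts(log_text):
    """Map thread id -> thread pool, soft timeout count, suicide timeout."""
    # Drop mail quote markers, then undo line wrapping by collapsing whitespace.
    lines = (re.sub(r"^>+ ?", "", line).strip() for line in log_text.splitlines())
    flat = re.sub(r"\s+", " ", " ".join(lines))

    summary = {}
    for m in HEARTBEAT_RE.finditer(flat):
        entry = summary.setdefault(
            m.group("tid"),
            {"pool": m.group("pool"), "soft_timeouts": 0, "suicide_after": None},
        )
        if m.group("kind").startswith("suicide"):
            entry["suicide_after"] = int(m.group("secs"))
        else:
            entry["soft_timeouts"] += 1
    return summary


def suicided_threads(summary):
    """Return the thread ids that reached their suicide timeout."""
    return sorted(t for t, e in summary.items() if e["suicide_after"] is not None)
```

Feeding it the contents of an OSD log and passing the result to suicided_threads() gives the thread ids to compare against the "Caught signal" line.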

--
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: support@xxxxxxxxxxx | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com