Yeah, we also noticed decreasing recovery speed when it comes to the last
PGs, but we never came up with a theory. I think your explanation makes
sense. Next time I'll try with much higher values, thanks for sharing
that.
Regards,
Eugen
Quoting Frank Schilder <frans@xxxxxx>:
I did a lot of data movement lately, and my observation is that
backfill is very fast (high bandwidth and many thousand keys/s) as
long as it is many-to-many between OSDs. The number of OSDs
participating slowly decreases over time until there is only one disk
left being written to. This becomes really slow, because the recovery
options are tuned to keep all-to-all traffic under control.
In such a case, you might want to temporarily increase these numbers
to something really high (not 10 or 20, but 1000 or 2000; increase
in steps) until the single-disk write is over, and then set them back
again. With SSDs this should be OK.
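For reference, a rough sketch of how these values could be bumped at
runtime on Nautilus and reverted afterwards (the numbers are only
examples in the spirit of the above; raise them in steps and adjust
to your hardware):

  # raise the backfill/recovery limits on all OSDs at runtime
  ceph tell 'osd.*' injectargs '--osd_max_backfills 1000 --osd_recovery_max_active 1000'

  # once the last PGs have finished, set them back to the defaults
  # (1 and 3, respectively, on Nautilus)
  ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 3'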
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 11 October 2019 10:24
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Nautilus: PGs stuck remapped+backfilling
Your metadata PGs *are* backfilling. That is what the "61 keys/s"
figure in the recovery I/O line of the ceph status output shows. If
this is too slow, increase osd_max_backfills and osd_recovery_max_active.
Or just have some coffee ...
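In case it is useful, a quick way to keep an eye on this (standard
Nautilus commands, nothing exotic):

  # list the PGs that are still backfilling
  ceph pg ls backfilling
  # watch the recovery I/O line (including the keys/s figure) update
  watch -n 5 ceph status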
I had already increased osd_max_backfills and osd_recovery_max_active
in order to speed things up, and most of the PGs were remapped pretty
quickly (a couple of minutes), but these last 3 PGs took almost two
hours to complete, which was unexpected.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx