Ah yes, if you see disk read IOPS going up and up on those draining
OSDs, then you might be hitting issues with the older PG deletion logic
interacting poorly with RocksDB tombstones.

Josh
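
If you want to confirm that it really is the draining OSDs producing the
extra reads, something along these lines should show it (the OSD IDs and
device names below are placeholders):

    # per-device utilization and read IOPS on the OSD host
    iostat -x 5 /dev/sdX

    # commit/apply latency per OSD as seen by the cluster
    ceph osd perf

    # if tombstone buildup from PG deletion is the suspect, a manual
    # RocksDB compaction can help; it is itself I/O-heavy, so do one
    # OSD at a time
    ceph tell osd.<ID> compact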
On Thu, Oct 17, 2024 at 8:13 AM Eugen Block <eblock@xxxxxx> wrote:
>
> Hi Frank,
>
> How high is the disk utilization? We see this only from time to time
> on HDD OSDs during regular cluster operation (no recovery). If it
> really happens a lot during recovery, I would consider decreasing
> osd_max_backfills and osd_recovery_max_active_hdd in case they are not
> set to their defaults. Or, if you need high recovery rates, you could
> temporarily set the nodown flag ('ceph osd set nodown') to prevent
> this. Fortunately, I haven't had to use that flag for a long time; I
> used it more often in older releases.
>
> Regards,
> Eugen
>
> Quoting Frank Schilder <frans@xxxxxx>:
>
> > Hi all,
> >
> > I would like to share some preliminary experience. Just setting OSDs
> > "out" manually (ceph osd out ID) does work as intended. The OSDs are
> > drained and their data is placed on other OSDs on the same host.
> > This also survives reboots of OSDs and peering, and this turns out
> > to be important.
> >
> > I make the very strange observation that the OSDs being drained are
> > getting marked down quite often. This actually gets worse over time:
> > the fewer PGs are left, the more frequent these "OSD marked down -
> > OSD still running wrongly marked down by mon - OSD boot" events
> > become, and I'm a bit at a loss as to what the cause might be. This
> > is exclusively limited to OSDs that are marked up+out; none of the
> > up+in OSDs shows this behavior. There seems to be no correlation
> > with anything else going on; it's all of the OSDs going down->up
> > (one at a time).
> >
> > Some of these restarts might have to do with disk errors, but I
> > doubt all do. There seems to be something else at play here. I don't
> > think this is expected, and maybe someone has additional information
> > here.
> >
> > We are almost done with the evacuation. I will report back how the
> > replacement+rebalancing is going.
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Frank Schilder <frans@xxxxxx>
> > Sent: Friday, October 11, 2024 12:18 PM
> > To: Robert Sander; ceph-users@xxxxxxx
> > Subject: Re: Procedure for temporary evacuation and replacement
> >
> > Hi Robert,
> >
> > thanks, that solves it then.
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx>
> > Sent: Friday, October 11, 2024 10:20 AM
> > To: ceph-users@xxxxxxx
> > Subject: Re: Procedure for temporary evacuation and replacement
> >
> > On 10/11/24 10:07, Frank Schilder wrote:
> >> Only problem is that setting an OSD OUT might not be sticky. If the
> >> OSD reboots for some reason, it might mark itself IN again.
> >
> > The Ceph cluster distinguishes between manually marked out ("ceph
> > osd out N") and automatically marked out, when an OSD is down for
> > more than 10 minutes.
> >
> > Manually marked out OSDs do not mark themselves in again.
> >
> > Regards
> > --
> > Robert Sander
> > Heinlein Consulting GmbH
> > Schwedter Str. 8/9b, 10119 Berlin
> >
> > https://www.heinlein-support.de
> >
> > Tel: 030 / 405051-43
> > Fax: 030 / 405051-19
> >
> > Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> > Geschäftsführer: Peer Heinlein - Sitz: Berlin
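
For reference, the procedure discussed in this thread boils down to
roughly the following sketch; the IDs and throttle values are
placeholders, and the defaults differ between releases:

    # mark the OSDs to be evacuated out; a manual 'out' is sticky
    # across OSD restarts, as Robert points out
    ceph osd out <ID>

    # optionally throttle backfill on HDD clusters while data moves
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active_hdd 1

    # or, if you need high recovery rates and the evacuated OSDs keep
    # getting wrongly marked down, temporarily prevent down-marking
    ceph osd set nodown
    # (run 'ceph osd unset nodown' again once the evacuation is done)

    # verify the evacuated OSDs stay up+out and their PG count drops
    ceph osd df tree

In 'ceph osd df tree' the evacuated OSDs should show up with REWEIGHT 0
and a shrinking PGS count.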