Re: Procedure for temporary evacuation and replacement

Interesting, and yes, that does sound like a bug of sorts. I would consider
increasing your osd_heartbeat_grace (at global scope), maybe by 2x (to 40
seconds if it is currently at the default of 20), to see you through the
drain. What version are you using?
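
A rough sketch of what I have in mind, assuming a release that has the
centralized config database (the "ceph config" commands). The run helper
and the RUN and GRACE variables are illustrative names of my own, not part
of Ceph. As written it only prints the commands, so nothing changes on the
cluster unless RUN=1 is set:

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints each command unless RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Assumption: 40 seconds, i.e. twice the default of 20.
GRACE="${GRACE:-40}"

# Show the value the OSDs currently use.
run ceph config get osd osd_heartbeat_grace

# Set it at global scope so that mons and OSDs see the same value.
run ceph config set global osd_heartbeat_grace "$GRACE"

# Report which Ceph versions the daemons are running.
run ceph versions

# Once the drain is finished, the override can be removed again:
# run ceph config rm global osd_heartbeat_grace
```

Removing the override afterwards returns the option to whatever it was
inherited from before, which is the default unless it is set elsewhere.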

Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
wes@xxxxxxxxxxxxxxxxx




On Thu, Oct 17, 2024 at 9:20 AM Frank Schilder <frans@xxxxxx> wrote:

> Hi all,
>
> I would like to share some preliminary experience. Just setting OSDs "out"
> manually (ceph osd out ID) does work as intended. The OSDs are drained and
> their data is placed on other OSDs on the same host. This also survives
> OSD reboots and peering, which turns out to be important.
>
> I have made the very strange observation that the OSDs being drained are
> getting marked down quite often. This actually gets worse over time: the
> fewer PGs are left, the more frequent these "OSD marked down - OSD still
> running wrongly marked down by mon - OSD boot" events become, and I'm a
> bit at a loss as to what the cause might be. This is exclusively limited
> to OSDs that are marked up+out; none of the up+in OSDs show that behavior.
> There seems to be no correlation with anything else. All of these OSDs
> are going down->up (one at a time).
>
> Some of these restarts might be due to disk errors, but I doubt all of
> them are. Something else seems to be at play here. I don't think this is
> expected; maybe someone has additional information.
>
> We are almost done with the evacuation. I will report back on how the
> replacement+rebalancing is going.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <frans@xxxxxx>
> Sent: Friday, October 11, 2024 12:18 PM
> To: Robert Sander; ceph-users@xxxxxxx
> Subject: Re: Procedure for temporary evacuation and replacement
>
> Hi Robert,
>
> Thanks, that solves it then.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx>
> Sent: Friday, October 11, 2024 10:20 AM
> To: ceph-users@xxxxxxx
> Subject: Re: Procedure for temporary evacuation and replacement
>
> On 10/11/24 10:07, Frank Schilder wrote:
> > Only problem is that setting an OSD OUT might not be sticky. If the OSD
> > reboots for some reason it might mark itself IN again.
>
> The Ceph cluster distinguishes between OSDs that are marked out manually
> ("ceph osd out N") and OSDs that are marked out automatically, which
> happens when an OSD has been down for more than 10 minutes (by default).
>
> Manually marked out OSDs do not mark themselves in again.
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx