On Tue, 26 Dec 2023 at 08:45, Phong Tran Thanh <tranphong079@xxxxxxxxx> wrote:
>
> Hi community,
>
> I am running Ceph with RBD block devices on 6 nodes, using an erasure
> coded 4+2 pool with min_size 4.
>
> When three OSDs are down and a PG is in the "down" state, some pools
> cannot write data. Suppose the three OSDs cannot be started and the PG
> stays stuck down: how can I delete or recreate the PG to replace the
> down one, or otherwise allow the pool to read/write data again?

Depending on how the data is laid out in this pool, you might lose more
or less all data from it. RBD images get split into pieces of 2 or 4 MB,
so that those pieces end up on different PGs, which in turn makes them
end up on different OSDs. This allows for load balancing over the whole
cluster, but it also means that if you lose a PG under a 40G RBD image
(made up of some 10k pieces), the chances are very high that the lost PG
contained one or more of those pieces. Lost PGs would therefore mean
that every RBD image of decent size has holes in it, and how this
affects all the instances that mount the images will be hard to tell.

If at all possible, use the offline OSD tools to try to get this PG out
of one of the bad OSDs.
https://hawkvelt.id.au/post/2022-4-5-ceph-pg-export-import/ shows how to
run the export + import commands (a rough sketch also follows at the
end of this mail). If you can get the PG out, it can be injected
(imported) into any other running OSD, and replicas will then be
recreated and moved to where they should be.

If you have disks to spare, make sure to take full copies of the broken
OSDs and work on the copies instead, to maximize the chances of
restoring your data (see the ddrescue example below).

If you are very sure that these three OSDs are never coming back, and
you have marked them as lost, then I guess

  ceph pg force_create_pg <pgid>

would be the next step, to have the cluster create empty PGs to replace
the lost ones. But I would consider this only after trying all the
possible options for repairing at least one of the OSDs that held the
missing PGs.
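To identify which PGs are down and which OSDs they map to, something
like this should do (2.1f is a made-up pgid used throughout the
examples below):

  ceph health detail            # lists the down/incomplete PGs
  ceph pg map 2.1f              # shows the up/acting OSD sets for a PG
  ceph pg dump_stuck inactive   # another view of the stuck PGs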
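For the full disk copies, GNU ddrescue is usually a safer choice than
plain dd on a failing drive, since it retries bad sectors and can be
resumed. Device names here are placeholders:

  # copy the failing OSD disk to a spare of at least the same size;
  # the map file records progress so the copy can be resumed
  ddrescue -f /dev/sdX /dev/sdY /root/sdX.map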
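The export + import from the linked post boils down to roughly the
following. This is a sketch only: the osd ids and file paths are
invented, and on an erasure coded pool the pgid carries a shard suffix
(e.g. 2.1fs0):

  # on the broken OSD (must be stopped), export the PG to a file
  systemctl stop ceph-osd@12
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 2.1f --op export --file /tmp/pg.2.1f.export

  # on a healthy OSD (also stopped while the tool runs), import it,
  # then start the OSD again and let recovery take over
  systemctl stop ceph-osd@34
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
      --op import --file /tmp/pg.2.1f.export
  systemctl start ceph-osd@34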
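And the last resort, which gives up on whatever data was in the PG
(again with made-up ids):

  # declare the dead OSDs lost, then recreate the PG as empty
  ceph osd lost 12 --yes-i-really-mean-it
  ceph osd lost 13 --yes-i-really-mean-it
  ceph osd lost 14 --yes-i-really-mean-it
  ceph pg force_create_pg 2.1f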
--
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx