Re: Degraded PG does not discover remapped data on originating OSD

This also got reported at https://tracker.ceph.com/issues/37439 —
thanks for the report!

On Wed, Nov 28, 2018 at 6:49 AM Jonas Jelten <jelten@xxxxxxxxx> wrote:
>
> Hi!
>
>
> There seems to be an issue where an OSD is not queried for missing object ec-parts that were remapped, even though that
> OSD is up. This happened in two different scenarios for us. In both, data is stored in EC pools (8+3).
>
>
> Scenario 0
>
> To remove a broken disk (e.g. osd.22), it is marked out with ceph osd out 22. Objects are then remapped normally. During
> the data movement, osd.22 is restarted (or crashes and starts again). Now the bug shows up: objects become degraded and
> stay degraded, because osd.22 is not queried, even though it is up and running. ceph pg query shows:
>
>     "might_have_unfound": [
>       {
>         "osd": "22(3)",
>         "status": "not queried"
>       }
>     ],
>
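> Concretely, the reproduction boils down to roughly the following (osd.22 is just the example from above, the pg id is a
> placeholder, and the restart command assumes a systemd deployment):
>
>     ceph osd out 22                  # start draining the broken disk
>     # while objects are still being remapped:
>     systemctl restart ceph-osd@22    # or: the OSD crashes and starts again
>     ceph pg <pgid> query             # the degraded PG now lists osd.22 as "not queried"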
>
> A workaround is to temporarily mark the broken-disk osd in again (sketched below). The osd is then queried and the missing
> object ec-parts are discovered. Afterwards, mark the osd out again. No objects are degraded any more and the disk will be
> emptied.
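>
> In commands, the workaround is something like this (again with osd.22 as the example id):
>
>     ceph osd in 22       # temporarily mark the out-ed OSD in so the PG queries it
>     ceph -s              # wait until no objects are degraded any more
>     ceph osd out 22      # mark it out again; backfill continues without degraded objects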
>
>
> Scenario 1
>
> Add new disks to the cluster. Data is remapped so that it is transferred from the old disks (e.g. osd.19) to the new
> disks (e.g. osd.42).
> When an OSD of the old disks is restarted (or it restarts because of a crash), objects become degraded. The
> missing object ec-part data is on osd.19, but again it is not queried. ceph pg query shows:
>
>     "might_have_unfound": [
>       {
>         "osd": "19(6)",
>         "status": "not queried"
>       }
>     ],
>
> Only remapped data seems to stay undiscovered: if osd.19 is taken down, much more data becomes degraded. Note that osd.19
> is missing from the acting set in the current state of this PG:
>
>     "up": [38, 36, 28, 17, 13, 39, 48, 10, 29, 5, 47],
>     "acting": [36, 15, 28, 17, 13, 32, 2147483647, 10, 29, 5, 20],
>     "backfill_targets": [
>         "36(1)",
>         "38(0)",
>         "39(5)",
>         "47(10)",
>         "48(6)"
>     ],
>     "acting_recovery_backfill": [
>         "5(9)",
>         "10(7)",
>         "13(4)",
>         "15(1)",
>         "17(3)",
>         "20(10)",
>         "28(2)",
>         "29(8)",
>         "32(5)",
>         "36(0)",
>         "36(1)",
>         "38(0)",
>         "39(5)",
>         "47(10)",
>         "48(6)"
>     ],
>
>
> For this scenario, I have not found a workaround yet. The cluster remains degraded until recovery has restored the
> data.
>
> So, overall I suspect there is a bug that prevents remapped pg data from being discovered. The PG already knows which OSD
> is the correct candidate, but does not query it.
>
>
> I can try fixing this myself, but I'd need some hints from the developers about the relevant code parts.
>
> The OSD is stored correctly in pg->might_have_unfound, and I think it should be queried in PG::discover_all_missing, but
> I'm lost there. I'd appreciate any help tracking this down.
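>
> For reference, the places I have been looking at can be located in a ceph.git checkout with something like:
>
>     # find the unfound-object discovery logic in the OSD code
>     git grep -n -e might_have_unfound -e discover_all_missing -- src/osd/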

Do you have logging indicating that this particular function is where
it goes wrong, or did you find it by inspection?
Since it sounds like this is pretty reproducible, I would try doing
that with "debug osd = 20" set, and read through the primary's log
very carefully while it makes these decisions.
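
One way to raise that at runtime (the osd id is just a placeholder for the
primary of an affected PG):

    ceph tell osd.<id> injectargs '--debug_osd 20/20'   # raise OSD logging
    # ...reproduce the restart, read the primary's log, then lower it again:
    ceph tell osd.<id> injectargs '--debug_osd 1/5'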
-Greg

>
>
> -- Jonas


