Re: One pg stuck in active+undersized+degraded after OSD down

If I ignore the dire warnings about losing data and do:
ceph osd purge 7

will I lose data? There are still 2 copies of everything, right?
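
I guess I could sanity-check it first with something like this (the <pool>
and <pgid> placeholders would be my pool and the stuck PG; I'm assuming
safe-to-destroy will warn me if removing the OSD would risk data):

# ceph osd pool get <pool> size
# ceph osd safe-to-destroy 7
# ceph pg <pgid> query | grep -A4 '"up"'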

I need to remove the node with the OSD from the k8s cluster, reinstall it
and have it re-join the cluster. This will bring in some new OSDs and maybe
Ceph will use them to sort out the stuck PG?
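
For the node removal itself I was planning on the usual drain/delete,
roughly like this (<node-name> being the node that hosts osd.7):

# kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# kubectl delete node <node-name>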

Is there a way to trigger Ceph to try to find another OSD for the stuck PG?
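
I was wondering if marking the OSD out and forcing the PG to re-peer would
do it, something like the following (again with <pgid> as the stuck PG), but
I'm not sure if that's the right approach:

# ceph osd out 7
# ceph pg repeer <pgid>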

On Thu, Nov 18, 2021 at 2:20 PM David Tinker <david.tinker@xxxxxxxxx> wrote:

> I just grepped all the OSD pod logs for error and warn and nothing comes
> up:
>
> # k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk  | grep -i warn
> etc
>
> I am assuming that would bring back something if any of them were unhappy.
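>
> In full I looped over all the OSD pods, roughly like this (assuming Rook's
> usual app=rook-ceph-osd label on the OSD pods):
>
> for p in $(kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o name); do
>   kubectl -n rook-ceph logs $p | grep -iE 'error|warn'
> done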
>
>
>
> On Thu, Nov 18, 2021 at 1:26 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
>> On 11/18/21 11:45, David Tinker wrote:
>>
>> <snip>
>>
>> >      "recovery_state": [
>> >          {
>> >              "name": "Started/Primary/Active",
>> >              "enter_time": "2021-11-18T07:13:25.618950+0000",
>> >              "might_have_unfound": [],
>> >              "recovery_progress": {
>> >                  "backfill_targets": [],
>> >                  "waiting_on_backfill": [],
>> >                  "last_backfill_started": "MIN",
>> >                  "backfill_info": {
>> >                      "begin": "MIN",
>> >                      "end": "MIN",
>> >                      "objects": []
>> >                  },
>> >                  "peer_backfill_info": [],
>> >                  "backfills_in_flight": [],
>> >                  "recovering": [],
>> >                  "pg_backend": {
>> >                      "pull_from_peer": [],
>> >                      "pushing": []
>> >                  }
>> >              }
>> >          },
>> >          {
>> >              "name": "Started",
>> >              "enter_time": "2021-11-18T07:13:25.618794+0000"
>> >          },
>> >          {
>> >              "scrubber.epoch_start": "2149",
>> >              "scrubber.active": false,
>> >              "scrubber.state": "INACTIVE",
>> >              "scrubber.start": "MIN",
>> >              "scrubber.end": "MIN",
>> >              "scrubber.max_end": "MIN",
>> >              "scrubber.subset_last_update": "0'0",
>> >              "scrubber.deep": false,
>> >              "scrubber.waiting_on_whom": []
>> >          }
>> >      ],
>> >      "agent_state": {}
>>
>> Nothing unusual in the recovery_state. I expected to see a reason there
>> why Ceph could not make progress.
>>
>> Is anything logged in osd.0 that might give a hint about what is going
>> on here?
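>>
>> Something like this should show it (assuming Rook names the osd.0
>> deployment rook-ceph-osd-0; you could also raise debug_osd first if the
>> log is too quiet):
>>
>> # kubectl -n rook-ceph logs deploy/rook-ceph-osd-0 | grep -iE 'error|warn'
>> # ceph config set osd.0 debug_osd 10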
>>
>> Gr. Stefan
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


