Re: Single unfound object in cluster with no previous version - is there any way to recover rather than deleting the object?

Hi Ivan,

just to get a better overview:
can you provide more details about the pool with ID 2?
And also the output of "ceph pg 2.c90 query".
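
For reference, a minimal sketch of how that information could be gathered
(the erasure-code profile name below is a placeholder):

    ceph osd pool ls detail | grep '^pool 2 '         # size, min_size, pg_num and EC profile of pool 2
    ceph osd erasure-code-profile get <profile-name>  # k/m layout, if the pool is erasure-coded
    ceph pg 2.c90 query > pg-2.c90-query.json         # full peering and recovery state of the PG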

Joachim


  joachim.kraftmayer@xxxxxxxxx

  www.clyso.com

  Hohenzollernstr. 27, 80801 Munich

Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677



On Fri, 29 Nov 2024 at 12:14, Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> wrote:

> Hello,
>
> We have an AlmaLinux 8.9 (version 4 kernel) Quincy (17.2.7) CephFS cluster
> with spinners for our bulk data and SSDs for the metadata, and we have
> a single unfound object in the bulk pool:
>
>     [root@ceph-n30 ~]# ceph -s
>        cluster:
>          id:     fa7cf62b-e261-49cd-b00e-383c36b79ef3
>          health: HEALTH_ERR
>                  1/849660811 objects unfound (0.000%)
>                  Possible data damage: 1 pg recovery_unfound
>                  Degraded data redundancy: 9/8468903874 objects degraded
>     (0.000%), 1 pg degraded
>
>        services:
>          mon: 3 daemons, quorum ceph-s2,ceph-s3,ceph-s1 (age 44h)
>          mgr: ceph-s2(active, since 45h), standbys: ceph-s3, ceph-s1
>          mds: 1/1 daemons up, 3 standby
>          osd: 439 osds: 439 up (since 43h), 439 in (since 43h); 176
>     remapped pgs
>
>        data:
>          volumes: 1/1 healthy
>          pools:   9 pools, 4321 pgs
>          objects: 849.66M objects, 2.3 PiB
>          usage:   3.0 PiB used, 1.7 PiB / 4.6 PiB avail
>          pgs:     9/8468903874 objects degraded (0.000%)
>                   36630744/8468903874 objects misplaced (0.433%)
>                   1/849660811 objects unfound (0.000%)
>                   4122 active+clean
>                   174  active+remapped+backfill_wait
>                   22   active+clean+scrubbing+deep
>                   2    active+remapped+backfilling
>                   1    active+recovery_unfound+degraded
>
>        io:
>          client:   669 MiB/s rd, 87 MiB/s wr, 302 op/s rd, 77 op/s wr
>          recovery: 175 MiB/s, 59 objects/s
>     [root@ceph-n30 ~]# ceph health detail | grep unfound
>     HEALTH_ERR 1/849661114 objects unfound (0.000%); Possible data
>     damage: 1 pg recovery_unfound; Degraded data redundancy:
>     9/8468906904 objects degraded (0.000%), 1 pg degraded
>     [WRN] OBJECT_UNFOUND: 1/849661114 objects unfound (0.000%)
>          pg 2.c90 has 1 unfound objects
>     [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
>          pg 2.c90 is active+recovery_unfound+degraded, acting
>     [259,210,390,209,43,66,322,297,25,374], 1 unfound
>          pg 2.c90 is active+recovery_unfound+degraded, acting
>     [259,210,390,209,43,66,322,297,25,374], 1 unfound
>
> We've tried deep-scrubbing and repairing the PG as well as rebooting the
> entire cluster but unfortunately this has not resolved our issue.
>
> The primary OSD (259) log reports that our 1009e1df26d.000000c9 object
> is missing, and when we run rados commands against the object, the
> command just hangs:
>
>     [root@ceph-n30 ~]# grep 2.c90 /var/log/ceph/ceph-osd.259.log
>     ...
>     2024-11-25T11:38:33.860+0000 7fd409870700  1 osd.259 pg_epoch:
>     512353 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512348/512349 n=211842
>     ec=1175/1168 lis/c=512348/472766 les/c/f=512349/472770/232522
>     sis=512353 pruub=11.010143280s)
>     [259,210,390,209,43,66,322,297,NONE,374]p259(0) r=0 lpr=512353
>     pi=[472766,512353)/11 crt=512310'8145216 mlcod 0'0 unknown pruub
>     205.739364624s@ m=1 mbc={}] state<Start>: transitioning to Primary
>     2024-11-25T11:38:54.926+0000 7fd409870700  1 osd.259 pg_epoch:
>     512356 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512353/512354 n=211842
>     ec=1175/1168 lis/c=512353/472766 les/c/f=512354/472770/232522
>     sis=512356 pruub=11.945847511s)
>     [259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512356
>     pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0 active pruub
>     227.741577148s@ m=1
>
> mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(0+0)=1},9={(1+0)=1}}]
>     start_peering_interval up
>     [259,210,390,209,43,66,322,297,2147483647,374] ->
>     [259,210,390,209,43,66,322,297,25,374], acting
>     [259,210,390,209,43,66,322,297,2147483647,374] ->
>     [259,210,390,209,43,66,322,297,25,374], acting_primary 259(0) ->
>     259, up_primary 259(0) -> 259, role 0 -> 0, features acting
>     4540138320759226367 upacting 4540138320759226367
>     2024-11-25T11:38:54.926+0000 7fd409870700  1 osd.259 pg_epoch:
>     512356 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512353/512354 n=211842
>     ec=1175/1168 lis/c=512353/472766 les/c/f=512354/472770/232522
>     sis=512356 pruub=11.945847511s)
>     [259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512356
>     pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0 unknown pruub
>     227.741577148s@ m=1 mbc={}] state<Start>: transitioning to Primary
>     2024-11-25T11:38:59.910+0000 7fd409870700  0 osd.259 pg_epoch:
>     512359 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
>     ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
>     sis=512356) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
>     lpr=512356 pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0
>     active+recovering+degraded rops=1 m=1
>
> mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(1+0)=1},9={(1+0)=1}}
>     trimq=[13f6e~134]] get_remaining_shards not enough shards left to
>     try for 2:0930c16c:::1009e1df26d.000000c9:head read result was
>     read_result_t(r=0,
>
> errors={25(8)=-2,43(4)=-2,66(5)=-2,209(3)=-2,210(1)=-2,297(7)=-2,322(6)=-2,390(2)=-2},
>     noattrs, returned=(0, 8388608, []))
>     2024-11-25T11:38:59.911+0000 7fd409870700  0 osd.259 pg_epoch:
>     512359 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
>     ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
>     sis=512356) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
>     lpr=512356 pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0
>     active+recovering+degraded rops=1 m=1 u=1
>
> mbc={0={(0+0)=1},1={(0+0)=1},2={(0+0)=1},3={(0+0)=1},4={(0+0)=1},5={(0+0)=1},6={(0+0)=1},7={(0+0)=1},8={(0+0)=1},9={(1+0)=1}}
>     trimq=[13f6e~134]] on_failed_pull
>     2:0930c16c:::1009e1df26d.000000c9:head from shard
>     25(8),43(4),66(5),209(3),210(1),297(7),322(6),390(2), reps on 374(9)
>     unfound? 1
>     2024-11-25T11:39:01.435+0000 7fd409870700 -1 log_channel(cluster)
>     log [ERR] : 2.c90 has 1 objects unfound and apparently lost
>     2024-11-25T11:39:02.193+0000 7fd409870700 -1 log_channel(cluster)
>     log [ERR] : 2.c90 has 1 objects unfound and apparently lost
>     2024-11-25T11:39:03.590+0000 7fd409870700  1 osd.259 pg_epoch:
>     512362 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
>     ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
>     sis=512362 pruub=8.352689743s)
>     [259,210,390,209,43,66,322,297,25,NONE]p259(0) r=0 lpr=512362
>     pi=[472766,512362)/11 crt=512310'8145216 mlcod 0'0 active pruub
>     232.812744141s@ m=1 u=1
>
> mbc={0={(0+0)=1},1={(0+0)=1},2={(0+0)=1},3={(0+0)=1},4={(0+0)=1},5={(0+0)=1},6={(0+0)=1},7={(0+0)=1},8={(0+0)=1},9={(1+0)=1}}]
>     start_peering_interval up [259,210,390,209,43,66,322,297,25,374] ->
>     [259,210,390,209,43,66,322,297,25,2147483647], acting
>     [259,210,390,209,43,66,322,297,25,374] ->
>     [259,210,390,209,43,66,322,297,25,2147483647], acting_primary 259(0)
>     -> 259, up_primary 259(0) -> 259, role 0 -> 0, features acting
>     4540138320759226367 upacting 4540138320759226367
>     2024-11-25T11:39:03.591+0000 7fd409870700  1 osd.259 pg_epoch:
>     512362 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
>     ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
>     sis=512362 pruub=8.352689743s)
>     [259,210,390,209,43,66,322,297,25,NONE]p259(0) r=0 lpr=512362
>     pi=[472766,512362)/11 crt=512310'8145216 mlcod 0'0 unknown pruub
>     232.812744141s@ m=1 mbc={}] state<Start>: transitioning to Primary
>     2024-11-25T11:39:24.954+0000 7fd409870700  1 osd.259 pg_epoch:
>     512365 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512362/512363 n=211842
>     ec=1175/1168 lis/c=512362/472766 les/c/f=512363/472770/232522
>     sis=512365 pruub=11.731550217s)
>     [259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512365
>     pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0 active pruub
>     257.554870605s@ m=1
>
> mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(1+0)=1},9={(0+0)=1}}]
>     start_peering_interval up
>     [259,210,390,209,43,66,322,297,25,2147483647] ->
>     [259,210,390,209,43,66,322,297,25,374], acting
>     [259,210,390,209,43,66,322,297,25,2147483647] ->
>     [259,210,390,209,43,66,322,297,25,374], acting_primary 259(0) ->
>     259, up_primary 259(0) -> 259, role 0 -> 0, features acting
>     4540138320759226367 upacting 4540138320759226367
>     2024-11-25T11:39:24.954+0000 7fd409870700  1 osd.259 pg_epoch:
>     512365 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512362/512363 n=211842
>     ec=1175/1168 lis/c=512362/472766 les/c/f=512363/472770/232522
>     sis=512365 pruub=11.731550217s)
>     [259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512365
>     pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0 unknown pruub
>     257.554870605s@ m=1 mbc={}] state<Start>: transitioning to Primary
>     2024-11-25T11:39:30.679+0000 7fd409870700  0 osd.259 pg_epoch:
>     512368 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512365/512366 n=211842
>     ec=1175/1168 lis/c=512365/472766 les/c/f=512366/472770/232522
>     sis=512365) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
>     lpr=512365 pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0
>     active+recovering+degraded rops=1 m=1
>
> mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(1+0)=1},9={(1+0)=1}}
>     trimq=[13f6e~134]] get_remaining_shards not enough shards left to
>     try for 2:0930c16c:::1009e1df26d.000000c9:head read result was
>     read_result_t(r=0,
>
> errors={25(8)=-2,43(4)=-2,66(5)=-2,209(3)=-2,210(1)=-2,297(7)=-2,322(6)=-2,390(2)=-2},
>     noattrs, returned=(0, 8388608, []))
>     2024-11-25T11:39:30.679+0000 7fd409870700  0 osd.259 pg_epoch:
>     512368 pg[2.c90s0( v 512310'8145216 lc 0'0
>     (511405'8142151,512310'8145216] local-lis/les=512365/512366 n=211842
>     ec=1175/1168 lis/c=512365/472766 les/c/f=512366/472770/232522
>     sis=512365) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
>     lpr=512365 pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0
>     active+recovering+degraded rops=1 m=1 u=1
>
> mbc={0={(0+0)=1},1={(0+0)=1},2={(0+0)=1},3={(0+0)=1},4={(0+0)=1},5={(0+0)=1},6={(0+0)=1},7={(0+0)=1},8={(0+0)=1},9={(1+0)=1}}
>     trimq=[13f6e~134]] on_failed_pull
>     2:0930c16c:::1009e1df26d.000000c9:head from shard
>     25(8),43(4),66(5),209(3),210(1),297(7),322(6),390(2), reps on 374(9)
>     unfound? 1
>     2024-11-25T11:40:11.652+0000 7fd409870700 -1 log_channel(cluster)
>     log [ERR] : 2.c90 has 1 objects unfound and apparently lost
>     [root@ceph-n30 ~]# rados -p ec82pool stat 1009e1df26d.000000c9
>
>     ... # hanging indefinitely
>
> Similarly, if we run any ceph-objectstore-tool commands, we get crashes
> and segfaults (the list_unfound output for the PG is shown first for
> reference):
>
>     [root@ceph-n25 ~]# ceph pg 2.c90 list_unfound
>     {
>          "num_missing": 1,
>          "num_unfound": 1,
>          "objects": [
>              {
>                  "oid": {
>                      "oid": "1009e1df26d.000000c9",
>                      "key": "",
>                      "snapid": -2,
>                      "hash": 914558096,
>                      "max": 0,
>                      "pool": 2,
>                      "namespace": ""
>                  },
>                  "need": "510502'8140206",
>                  "have": "0'0",
>                  "flags": "none",
>                  "clean_regions": "clean_offsets: [], clean_omap: 0,
>     new_object: 1",
>                  "locations": [
>                      "374(9)"
>                  ]
>              }
>          ],
>          "state": "NotRecovering",
>          "available_might_have_unfound": true,
>          "might_have_unfound": [],
>          "more": false
>     }
>     [root@ceph-n11 ~]# ceph-objectstore-tool --data-path
>     /var/lib/ceph/osd/ceph-374 --debug --pgid 2.c90 1009e1df26d.000000c9
>     dump
>     ...
>          -6> 2024-11-25T15:54:57.387+0000 7f8b6a0a7340  1
>     bluestore(/var/lib/ceph/osd/ceph-374) _upgrade_super from 4, latest 4
>
>          -5> 2024-11-25T15:54:57.387+0000 7f8b6a0a7340  1
>     bluestore(/var/lib/ceph/osd/ceph-374) _upgrade_super done
>
>          -4> 2024-11-25T15:54:57.439+0000 7f8b5d925700  5 prioritycache
>     tune_memory target: 4294967296 mapped: 132694016 unmapped: 618364928
>     heap: 751058944 old mem: 134217728 new mem: 2761652735
>
>          -3> 2024-11-25T15:54:57.439+0000 7f8b5d925700  5 rocksdb:
>     commit_cache_size High Pri Pool Ratio set to 0.0540541
>
>          -2> 2024-11-25T15:54:57.439+0000 7f8b5d925700  5 prioritycache
>     tune_memory target: 4294967296 mapped: 132939776 unmapped: 618119168
>     heap: 751058944 old mem: 2761652735 new mem: 2842823159
>
>          -1> 2024-11-25T15:54:57.439+0000 7f8b5d925700  5
>     bluestore.MempoolThread(0x55cd3a807b40) _resize_shards cache_size:
>     2842823159 kv_alloc: 1241513984 kv_used: 2567024 kv_onode_alloc:
>     42949672 kv_onode_used: -22 meta_alloc: 1174405120 meta_used: 13360
>     data_alloc: 218103808 data_used: 0
>
>           0> 2024-11-25T15:54:57.445+0000 7f8b6a0a7340 -1 *** Caught
>     signal (Segmentation fault) **
>       in thread 7f8b6a0a7340 thread_name:ceph-objectstor
>
>       ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
>     quincy (stable)
>       1: /lib64/libpthread.so.0(+0x12cf0) [0x7f8b67941cf0]
>       2:
>
> (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
>     ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t,
>     std::allocator<ghobject_t> >*, ghobject_t*)+0x4c) [0x55cd37e34f5c]
>       3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t,
>     action_on_object_t&, bool)+0x13b4) [0x55cd377fdf64]
>       4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t,
>     action_on_object_t&, bool)+0x64) [0x55cd377fe274]
>       5: main()
>       6: __libc_start_main()
>       7: _start()
>       NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>     needed to interpret this.
>
> We don't have a previous version of this object, and trying to run
> fix-lost with the objectstore tool also segfaults:
>
>     [root@ceph-n11 ~]# ceph-objectstore-tool --data-path
>     /var/lib/ceph/osd/ceph-374  --pgid 2.c90 --op fix-lost --dry-run
>     *** Caught signal (Segmentation fault) **
>       in thread 7f45d9890340 thread_name:ceph-objectstor
>       ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
>     quincy (stable)
>       1: /lib64/libpthread.so.0(+0x12cf0) [0x7f45d712acf0]
>       2:
>
> (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
>     ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t,
>     std::allocator<ghobject_t> >*, ghobject_t*)+0x4c) [0x5556f3c36f5c]
>       3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t,
>     action_on_object_t&, bool)+0x13b4) [0x5556f35fff64]
>       4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t,
>     action_on_object_t&, bool)+0x64) [0x5556f3600274]
>       5: main()
>       6: __libc_start_main()
>       7: _start()
>     Segmentation fault (core dumped)
>
> The recommended solution seems to be to use "mark_unfound_lost revert",
> but for our object there is no previous version, so I think this command
> will discard and delete the object. The lost object is on our live
> filesystem and there seems to be no easy way to find the backup
> version, as I can't access the path and/or filename associated with the
> object to recover it from the backup. Is there any way for us to recover
> this object without discarding it? Or should we just accept our losses
> and delete it?
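>
> (For reference, the command forms in question are along these lines;
> "revert" rolls the object back to a prior version, which we don't have,
> while "delete" forgets the object entirely:)
>
>     ceph pg 2.c90 mark_unfound_lost revert   # roll back to the last known version, if one exists
>     ceph pg 2.c90 mark_unfound_lost delete   # give up on the object and delete it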
>
> Kindest regards,
>
> Ivan Clayson
>
> --
> Ivan Clayson
> -----------------
> Scientific Computing Officer
> Room 2N269
> Structural Studies
> MRC Laboratory of Molecular Biology
> Francis Crick Ave, Cambridge
> CB2 0QH
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


