Hello,
We have a Quincy (17.2.7) CephFS cluster on AlmaLinux 8.9 (4.x kernel),
with spinners for our bulk data and SSDs for the metadata, and we have
a single unfound object in the bulk pool:
[root@ceph-n30 ~]# ceph -s
cluster:
id: fa7cf62b-e261-49cd-b00e-383c36b79ef3
health: HEALTH_ERR
1/849660811 objects unfound (0.000%)
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 9/8468903874 objects degraded
(0.000%), 1 pg degraded
services:
mon: 3 daemons, quorum ceph-s2,ceph-s3,ceph-s1 (age 44h)
mgr: ceph-s2(active, since 45h), standbys: ceph-s3, ceph-s1
mds: 1/1 daemons up, 3 standby
osd: 439 osds: 439 up (since 43h), 439 in (since 43h); 176
remapped pgs
data:
volumes: 1/1 healthy
pools: 9 pools, 4321 pgs
objects: 849.66M objects, 2.3 PiB
usage: 3.0 PiB used, 1.7 PiB / 4.6 PiB avail
pgs: 9/8468903874 objects degraded (0.000%)
36630744/8468903874 objects misplaced (0.433%)
1/849660811 objects unfound (0.000%)
4122 active+clean
174 active+remapped+backfill_wait
22 active+clean+scrubbing+deep
2 active+remapped+backfilling
1 active+recovery_unfound+degraded
io:
client: 669 MiB/s rd, 87 MiB/s wr, 302 op/s rd, 77 op/s wr
recovery: 175 MiB/s, 59 objects/s
[root@ceph-n30 ~]# ceph health detail | grep unfound
HEALTH_ERR 1/849661114 objects unfound (0.000%); Possible data
damage: 1 pg recovery_unfound; Degraded data redundancy:
9/8468906904 objects degraded (0.000%), 1 pg degraded
[WRN] OBJECT_UNFOUND: 1/849661114 objects unfound (0.000%)
pg 2.c90 has 1 unfound objects
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
pg 2.c90 is active+recovery_unfound+degraded, acting
[259,210,390,209,43,66,322,297,25,374], 1 unfound
pg 2.c90 is active+recovery_unfound+degraded, acting
[259,210,390,209,43,66,322,297,25,374], 1 unfound
We've tried deep-scrubbing and repairing the PG, as well as rebooting the
entire cluster, but unfortunately this has not resolved our issue.
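For context, the scrub and repair attempts were roughly of this form
(exact invocations from memory):

    # ask the primary to deep-scrub and then repair PG 2.c90
    ceph pg deep-scrub 2.c90
    ceph pg repair 2.c90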
The primary OSD's (osd.259) log reports that our 1009e1df26d.000000c9
object is missing, and any rados command on the object simply hangs:
[root@ceph-n30 ~]# grep 2.c90 /var/log/ceph/ceph-osd.259.log
...
2024-11-25T11:38:33.860+0000 7fd409870700 1 osd.259 pg_epoch:
512353 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512348/512349 n=211842
ec=1175/1168 lis/c=512348/472766 les/c/f=512349/472770/232522
sis=512353 pruub=11.010143280s)
[259,210,390,209,43,66,322,297,NONE,374]p259(0) r=0 lpr=512353
pi=[472766,512353)/11 crt=512310'8145216 mlcod 0'0 unknown pruub
205.739364624s@ m=1 mbc={}] state<Start>: transitioning to Primary
2024-11-25T11:38:54.926+0000 7fd409870700 1 osd.259 pg_epoch:
512356 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512353/512354 n=211842
ec=1175/1168 lis/c=512353/472766 les/c/f=512354/472770/232522
sis=512356 pruub=11.945847511s)
[259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512356
pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0 active pruub
227.741577148s@ m=1
mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(0+0)=1},9={(1+0)=1}}]
start_peering_interval up
[259,210,390,209,43,66,322,297,2147483647,374] ->
[259,210,390,209,43,66,322,297,25,374], acting
[259,210,390,209,43,66,322,297,2147483647,374] ->
[259,210,390,209,43,66,322,297,25,374], acting_primary 259(0) ->
259, up_primary 259(0) -> 259, role 0 -> 0, features acting
4540138320759226367 upacting 4540138320759226367
2024-11-25T11:38:54.926+0000 7fd409870700 1 osd.259 pg_epoch:
512356 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512353/512354 n=211842
ec=1175/1168 lis/c=512353/472766 les/c/f=512354/472770/232522
sis=512356 pruub=11.945847511s)
[259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512356
pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0 unknown pruub
227.741577148s@ m=1 mbc={}] state<Start>: transitioning to Primary
2024-11-25T11:38:59.910+0000 7fd409870700 0 osd.259 pg_epoch:
512359 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
sis=512356) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
lpr=512356 pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0
active+recovering+degraded rops=1 m=1
mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(1+0)=1},9={(1+0)=1}}
trimq=[13f6e~134]] get_remaining_shards not enough shards left to
try for 2:0930c16c:::1009e1df26d.000000c9:head read result was
read_result_t(r=0,
errors={25(8)=-2,43(4)=-2,66(5)=-2,209(3)=-2,210(1)=-2,297(7)=-2,322(6)=-2,390(2)=-2},
noattrs, returned=(0, 8388608, []))
2024-11-25T11:38:59.911+0000 7fd409870700 0 osd.259 pg_epoch:
512359 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
sis=512356) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
lpr=512356 pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0
active+recovering+degraded rops=1 m=1 u=1
mbc={0={(0+0)=1},1={(0+0)=1},2={(0+0)=1},3={(0+0)=1},4={(0+0)=1},5={(0+0)=1},6={(0+0)=1},7={(0+0)=1},8={(0+0)=1},9={(1+0)=1}}
trimq=[13f6e~134]] on_failed_pull
2:0930c16c:::1009e1df26d.000000c9:head from shard
25(8),43(4),66(5),209(3),210(1),297(7),322(6),390(2), reps on 374(9)
unfound? 1
2024-11-25T11:39:01.435+0000 7fd409870700 -1 log_channel(cluster)
log [ERR] : 2.c90 has 1 objects unfound and apparently lost
2024-11-25T11:39:02.193+0000 7fd409870700 -1 log_channel(cluster)
log [ERR] : 2.c90 has 1 objects unfound and apparently lost
2024-11-25T11:39:03.590+0000 7fd409870700 1 osd.259 pg_epoch:
512362 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
sis=512362 pruub=8.352689743s)
[259,210,390,209,43,66,322,297,25,NONE]p259(0) r=0 lpr=512362
pi=[472766,512362)/11 crt=512310'8145216 mlcod 0'0 active pruub
232.812744141s@ m=1 u=1
mbc={0={(0+0)=1},1={(0+0)=1},2={(0+0)=1},3={(0+0)=1},4={(0+0)=1},5={(0+0)=1},6={(0+0)=1},7={(0+0)=1},8={(0+0)=1},9={(1+0)=1}}]
start_peering_interval up [259,210,390,209,43,66,322,297,25,374] ->
[259,210,390,209,43,66,322,297,25,2147483647], acting
[259,210,390,209,43,66,322,297,25,374] ->
[259,210,390,209,43,66,322,297,25,2147483647], acting_primary 259(0)
-> 259, up_primary 259(0) -> 259, role 0 -> 0, features acting
4540138320759226367 upacting 4540138320759226367
2024-11-25T11:39:03.591+0000 7fd409870700 1 osd.259 pg_epoch:
512362 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
sis=512362 pruub=8.352689743s)
[259,210,390,209,43,66,322,297,25,NONE]p259(0) r=0 lpr=512362
pi=[472766,512362)/11 crt=512310'8145216 mlcod 0'0 unknown pruub
232.812744141s@ m=1 mbc={}] state<Start>: transitioning to Primary
2024-11-25T11:39:24.954+0000 7fd409870700 1 osd.259 pg_epoch:
512365 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512362/512363 n=211842
ec=1175/1168 lis/c=512362/472766 les/c/f=512363/472770/232522
sis=512365 pruub=11.731550217s)
[259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512365
pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0 active pruub
257.554870605s@ m=1
mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(1+0)=1},9={(0+0)=1}}]
start_peering_interval up
[259,210,390,209,43,66,322,297,25,2147483647] ->
[259,210,390,209,43,66,322,297,25,374], acting
[259,210,390,209,43,66,322,297,25,2147483647] ->
[259,210,390,209,43,66,322,297,25,374], acting_primary 259(0) ->
259, up_primary 259(0) -> 259, role 0 -> 0, features acting
4540138320759226367 upacting 4540138320759226367
2024-11-25T11:39:24.954+0000 7fd409870700 1 osd.259 pg_epoch:
512365 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512362/512363 n=211842
ec=1175/1168 lis/c=512362/472766 les/c/f=512363/472770/232522
sis=512365 pruub=11.731550217s)
[259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512365
pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0 unknown pruub
257.554870605s@ m=1 mbc={}] state<Start>: transitioning to Primary
2024-11-25T11:39:30.679+0000 7fd409870700 0 osd.259 pg_epoch:
512368 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512365/512366 n=211842
ec=1175/1168 lis/c=512365/472766 les/c/f=512366/472770/232522
sis=512365) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
lpr=512365 pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0
active+recovering+degraded rops=1 m=1
mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(1+0)=1},9={(1+0)=1}}
trimq=[13f6e~134]] get_remaining_shards not enough shards left to
try for 2:0930c16c:::1009e1df26d.000000c9:head read result was
read_result_t(r=0,
errors={25(8)=-2,43(4)=-2,66(5)=-2,209(3)=-2,210(1)=-2,297(7)=-2,322(6)=-2,390(2)=-2},
noattrs, returned=(0, 8388608, []))
2024-11-25T11:39:30.679+0000 7fd409870700 0 osd.259 pg_epoch:
512368 pg[2.c90s0( v 512310'8145216 lc 0'0
(511405'8142151,512310'8145216] local-lis/les=512365/512366 n=211842
ec=1175/1168 lis/c=512365/472766 les/c/f=512366/472770/232522
sis=512365) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
lpr=512365 pi=[472766,512365)/10 crt=512310'8145216 mlcod 0'0
active+recovering+degraded rops=1 m=1 u=1
mbc={0={(0+0)=1},1={(0+0)=1},2={(0+0)=1},3={(0+0)=1},4={(0+0)=1},5={(0+0)=1},6={(0+0)=1},7={(0+0)=1},8={(0+0)=1},9={(1+0)=1}}
trimq=[13f6e~134]] on_failed_pull
2:0930c16c:::1009e1df26d.000000c9:head from shard
25(8),43(4),66(5),209(3),210(1),297(7),322(6),390(2), reps on 374(9)
unfound? 1
2024-11-25T11:40:11.652+0000 7fd409870700 -1 log_channel(cluster)
log [ERR] : 2.c90 has 1 objects unfound and apparently lost
[root@ceph-n30 ~]# rados -p ec82pool stat 1009e1df26d.000000c9
... # hanging indefinitely
Similarly, listing the unfound object shows a copy should exist on shard
374(9), but running ceph-objectstore-tool commands against that OSD
crashes with segmentation faults:
[root@ceph-n25 ~]# ceph pg 2.c90 list_unfound
{
"num_missing": 1,
"num_unfound": 1,
"objects": [
{
"oid": {
"oid": "1009e1df26d.000000c9",
"key": "",
"snapid": -2,
"hash": 914558096,
"max": 0,
"pool": 2,
"namespace": ""
},
"need": "510502'8140206",
"have": "0'0",
"flags": "none",
"clean_regions": "clean_offsets: [], clean_omap: 0,
new_object: 1",
"locations": [
"374(9)"
]
}
],
"state": "NotRecovering",
"available_might_have_unfound": true,
"might_have_unfound": [],
"more": false
}
[root@ceph-n11 ~]# ceph-objectstore-tool --data-path
/var/lib/ceph/osd/ceph-374 --debug --pgid 2.c90 1009e1df26d.000000c9
dump
...
-6> 2024-11-25T15:54:57.387+0000 7f8b6a0a7340 1
bluestore(/var/lib/ceph/osd/ceph-374) _upgrade_super from 4, latest 4
-5> 2024-11-25T15:54:57.387+0000 7f8b6a0a7340 1
bluestore(/var/lib/ceph/osd/ceph-374) _upgrade_super done
-4> 2024-11-25T15:54:57.439+0000 7f8b5d925700 5 prioritycache
tune_memory target: 4294967296 mapped: 132694016 unmapped: 618364928
heap: 751058944 old mem: 134217728 new mem: 2761652735
-3> 2024-11-25T15:54:57.439+0000 7f8b5d925700 5 rocksdb:
commit_cache_size High Pri Pool Ratio set to 0.0540541
-2> 2024-11-25T15:54:57.439+0000 7f8b5d925700 5 prioritycache
tune_memory target: 4294967296 mapped: 132939776 unmapped: 618119168
heap: 751058944 old mem: 2761652735 new mem: 2842823159
-1> 2024-11-25T15:54:57.439+0000 7f8b5d925700 5
bluestore.MempoolThread(0x55cd3a807b40) _resize_shards cache_size:
2842823159 kv_alloc: 1241513984 kv_used: 2567024 kv_onode_alloc:
42949672 kv_onode_used: -22 meta_alloc: 1174405120 meta_used: 13360
data_alloc: 218103808 data_u
sed: 0
0> 2024-11-25T15:54:57.445+0000 7f8b6a0a7340 -1 *** Caught
signal (Segmentation fault) **
in thread 7f8b6a0a7340 thread_name:ceph-objectstor
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
quincy (stable)
1: /lib64/libpthread.so.0(+0x12cf0) [0x7f8b67941cf0]
2:
(BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t,
std::allocator<ghobject_t> >*, ghobject_t*)+0x4c) [0x55cd37e34f5c]
3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t,
action_on_object_t&, bool)+0x13b4) [0x55cd377fdf64]
4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t,
action_on_object_t&, bool)+0x64) [0x55cd377fe274]
5: main()
6: __libc_start_main()
7: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
We don't have a previous version of this object, and trying fix-lost
with ceph-objectstore-tool also seg faults:
[root@ceph-n11 ~]# ceph-objectstore-tool --data-path
/var/lib/ceph/osd/ceph-374 --pgid 2.c90 --op fix-lost --dry-run
*** Caught signal (Segmentation fault) **
in thread 7f45d9890340 thread_name:ceph-objectstor
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
quincy (stable)
1: /lib64/libpthread.so.0(+0x12cf0) [0x7f45d712acf0]
2:
(BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t,
std::allocator<ghobject_t> >*, ghobject_t*)+0x4c) [0x5556f3c36f5c]
3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t,
action_on_object_t&, bool)+0x13b4) [0x5556f35fff64]
4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t,
action_on_object_t&, bool)+0x64) [0x5556f3600274]
5: main()
6: __libc_start_main()
7: _start()
Segmentation fault (core dumped)
The recommended solution seems to be "mark_unfound_lost revert", but
since there is no previous version of our object, I think this command
will simply discard and delete it. The lost object is on our live
filesystem, and there seems to be no easy way to find the backup copy,
as I can't determine the path or filename associated with the object in
order to recover it from the backup. Is there any way for us to recover
this object without discarding it, or should we just accept our losses
and delete it?
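For reference, the command in question would be one of the following; as
I understand the documentation, revert rolls back to a previous version
if one exists (and otherwise forgets the object entirely), while delete
forgets it outright:

    # roll back to a previous version, or forget the object if none exists
    ceph pg 2.c90 mark_unfound_lost revert

    # forget (discard) the object entirely
    ceph pg 2.c90 mark_unfound_lost delete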
Kindest regards,
Ivan Clayson
--
Ivan Clayson
-----------------
Scientific Computing Officer
Room 2N269
Structural Studies
MRC Laboratory of Molecular Biology
Francis Crick Ave, Cambridge
CB2 0QH
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx