Hello Cephers,
it is a mystery: my cluster is out of the error state, and I don't really know how. I initiated deep scrubbing on the affected PGs yesterday; maybe that fixed it.
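For reference, this is roughly how per-PG deep scrubs are kicked off (PG IDs taken from the health detail quoted below):

# request a deep scrub on each affected placement group
ceph pg deep-scrub 3.9
ceph pg deep-scrub 3.b
ceph pg deep-scrub 3.15
ceph pg deep-scrub 3.16
ceph pg deep-scrub 3.1b
ceph pg deep-scrub 3.1e
ceph pg deep-scrub 3.1f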
Cheers,
Vadim
On 6/24/21 1:15 PM, Vadim Bulst wrote:
Dear List,
since my update yesterday from 14.2.18 to 14.2.20 I have an unhealthy cluster. If I remember right, it appeared after rebooting the second server. There are 7 missing objects in PGs of a cache pool (pool 3). I have since changed the pool's cache mode from writeback to proxy, but I'm not able to flush all objects.
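For reference, the mode change and the flush attempt follow the usual cache-tiering commands, roughly like this (the cache pool name is a placeholder here):

# switch the cache tier from writeback to proxy mode
ceph osd tier cache-mode <cache-pool> proxy
# then try to flush and evict everything still held in the cache pool
rados -p <cache-pool> cache-flush-evict-all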
root@scvirt06:/home/urzadmin/ceph_issue# ceph -s
  cluster:
    id:     5349724e-fa96-4fd6-8e44-8da2a39253f7
    health: HEALTH_ERR
            7/15893342 objects unfound (0.000%)
            Possible data damage: 7 pgs recovery_unfound
            Degraded data redundancy: 21/47680026 objects degraded (0.000%), 7 pgs degraded, 7 pgs undersized
            client is using insecure global_id reclaim
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum scvirt03,scvirt06,scvirt01 (age 19h)
    mgr: scvirt04(active, since 21m), standbys: scvirt03, scvirt02
    mds: scfs:1 {0=scvirt04=up:active} 1 up:standby-replay 1 up:standby
    osd: 54 osds: 54 up (since 17m), 54 in (since 10w); 7 remapped pgs

  task status:
    scrub status:
        mds.scvirt03: idle

  data:
    pools:   5 pools, 704 pgs
    objects: 15.89M objects, 49 TiB
    usage:   139 TiB used, 145 TiB / 285 TiB avail
    pgs:     21/47680026 objects degraded (0.000%)
             7/15893342 objects unfound (0.000%)
             694 active+clean
             7   active+recovery_unfound+undersized+degraded+remapped
             3   active+clean+scrubbing+deep

  io:
    client: 3.7 MiB/s rd, 6.6 MiB/s wr, 40 op/s rd, 31 op/s wr
my cluster:
scvirt01 - mon,osds
scvirt02 - mgr,osds
scvirt03 - mon,mgr,mds,osds
scvirt04 - mgr,mds,osds
scvirt05 - osds
scvirt06 - mon,mds,osds
log of osd.49:
root@scvirt03:/home/urzadmin# tail -f /var/log/ceph/ceph-osd.49.log
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.64 GB write, 0.01 MB/s write, 0.54 GB read, 0.01 MB/s read, 6.5 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [default] **
2021-06-24 08:53:08.865 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:08.865 7f88a505f700 -1 log_channel(cluster) log [ERR] : 3.1e has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88a9067700 -1 log_channel(cluster) log [ERR] : 3.1e has 1 objects unfound and apparently lost
2021-06-24 08:54:45.042 7f88b487e700 4 rocksdb: [db/db_impl.cc:777] ------- DUMPING STATS -------
2021-06-24 08:54:45.042 7f88b487e700 4 rocksdb: [db/db_impl.cc:778]
** DB Stats **
Uptime(secs): 85202.3 total, 600.0 interval
Cumulative writes: 1148K writes, 8640K keys, 1148K commit groups, 1.0 writes per commit group, ingest: 1.24 GB, 0.01 MB/s
Cumulative WAL: 1148K writes, 546K syncs, 2.10 writes per sync, written: 1.24 GB, 0.01 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 369 writes, 1758 keys, 369 commit groups, 1.0 writes per commit group, ingest: 0.41 MB, 0.00 MB/s
Interval WAL: 369 writes, 155 syncs, 2.37 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 3/0 104.40 MB 0.8 0.0 0.0 0.0 0.2 0.2 0.0 1.0 0.0 67.8 2.89 2.70 6 0.482 0 0
L1 2/0 131.98 MB 0.5 0.2 0.1 0.1 0.2 0.1 0.0 1.8 149.9 120.9 1.53 1.41 1 1.527 2293K 140K
L2 16/0 871.57 MB 0.3 0.3 0.1 0.3 0.3 -0.0 0.0 5.2 158.1 132.3 2.05 1.93 1 2.052 3997K 1089K
Sum 21/0 1.08 GB 0.0 0.5 0.2 0.4 0.6 0.2 0.0 3.3 85.5 100.8 6.47 6.03 8 0.809 6290K 1229K
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0
If I run
ceph pg repair 3.1e
it doesn't change anything. I also do not understand why these PGs are undersized; all OSDs are up.
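As far as I understand the documentation on unfound objects, the next diagnostic steps would be something like this (the mark_unfound_lost commands are a last resort, so I have left them commented out):

# show which OSDs were probed for the missing objects and why recovery is blocked
ceph pg 3.1e query
# list the names of the unfound objects in this PG
ceph pg 3.1e list_unfound
# last resort only: revert the objects to an older version, or forget them entirely
# ceph pg 3.1e mark_unfound_lost revert
# ceph pg 3.1e mark_unfound_lost delete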
ceph.conf:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.10.144.0/24
filestore_xattr_use_omap = true
fsid = 5349724e-fa96-4fd6-8e44-8da2a39253f7
mon_allow_pool_delete = true
mon_cluster_log_file_level = info
mon_host = 172.26.8.151,172.26.8.153,172.26.8.156
osd_journal_size = 5120
osd_pool_default_min_size = 1
public_network = 172.26.8.128/26
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.scvirt03]
host = scvirt03
mds_standby_for_rank = 0
mds_standby_replay = true
[mds.scvirt04]
host = scvirt04
mds standby for name = pve
[mds.scvirt06]
host = scvirt06
mds_standby_for_rank = 0
mds_standby_replay = true
[mon.scvirt01]
public_addr = 172.26.8.151
[mon.scvirt03]
public_addr = 172.26.8.153
[mon.scvirt06]
public_addr = 172.26.8.156
ceph health detail:
HEALTH_ERR 7/15893333 objects unfound (0.000%); Possible data damage: 7 pgs recovery_unfound; Degraded data redundancy: 21/47679999 objects degraded (0.000%), 7 pgs degraded, 7 pgs undersized; client is using insecure global_id reclaim; mons are allowing insecure global_id reclaim
OBJECT_UNFOUND 7/15893333 objects unfound (0.000%)
    pg 3.1e has 1 unfound objects
    pg 3.1f has 1 unfound objects
    pg 3.1b has 1 unfound objects
    pg 3.15 has 1 unfound objects
    pg 3.16 has 1 unfound objects
    pg 3.b has 1 unfound objects
    pg 3.9 has 1 unfound objects
PG_DAMAGED Possible data damage: 7 pgs recovery_unfound
    pg 3.9 is active+recovery_unfound+undersized+degraded+remapped, acting [49,52], 1 unfound
    pg 3.b is active+recovery_unfound+undersized+degraded+remapped, acting [43,52], 1 unfound
    pg 3.15 is active+recovery_unfound+undersized+degraded+remapped, acting [44,52], 1 unfound
    pg 3.16 is active+recovery_unfound+undersized+degraded+remapped, acting [43,51], 1 unfound
    pg 3.1b is active+recovery_unfound+undersized+degraded+remapped, acting [43,52], 1 unfound
    pg 3.1e is active+recovery_unfound+undersized+degraded+remapped, acting [49,51], 1 unfound
    pg 3.1f is active+recovery_unfound+undersized+degraded+remapped, acting [48,51], 1 unfound
PG_DEGRADED Degraded data redundancy: 21/47679999 objects degraded (0.000%), 7 pgs degraded, 7 pgs undersized
    pg 3.9 is stuck undersized for 64516.343966, current state active+recovery_unfound+undersized+degraded+remapped, last acting [49,52]
    pg 3.b is stuck undersized for 64516.351507, current state active+recovery_unfound+undersized+degraded+remapped, last acting [43,52]
    pg 3.15 is stuck undersized for 64521.368841, current state active+recovery_unfound+undersized+degraded+remapped, last acting [44,52]
    pg 3.16 is stuck undersized for 64516.351599, current state active+recovery_unfound+undersized+degraded+remapped, last acting [43,51]
    pg 3.1b is stuck undersized for 64517.427120, current state active+recovery_unfound+undersized+degraded+remapped, last acting [43,52]
    pg 3.1e is stuck undersized for 64521.369635, current state active+recovery_unfound+undersized+degraded+remapped, last acting [49,51]
    pg 3.1f is stuck undersized for 64517.426392, current state active+recovery_unfound+undersized+degraded+remapped, last acting [48,51]
AUTH_INSECURE_GLOBAL_ID_RECLAIM client is using insecure global_id reclaim
    client.admin at 172.26.8.154:0/3925203408 is using insecure global_id reclaim
    mds.scvirt04 at [v2:172.26.8.154:6836/3778505565,v1:172.26.8.154:6837/3778505565] is using insecure global_id reclaim
AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED mons are allowing insecure global_id reclaim
    mon.scvirt03 has auth_allow_insecure_global_id_reclaim set to true
    mon.scvirt06 has auth_allow_insecure_global_id_reclaim set to true
    mon.scvirt01 has auth_allow_insecure_global_id_reclaim set to true
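If I read the 14.2.20 release notes correctly, the global_id warnings are expected after this upgrade; once every client and daemon has been updated, the insecure behaviour can be disallowed with:

# only run this after all clients are upgraded, or old clients will be locked out
ceph config set mon auth_allow_insecure_global_id_reclaim false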
ceph osd tree:
ID  CLASS WEIGHT    TYPE NAME         STATUS REWEIGHT PRI-AFF
 -1       284.51312 root default
 -2        48.75215     host scvirt01
  0   hdd   9.09560         osd.0         up  1.00000 1.00000
  3   hdd   9.09560         osd.3         up  1.00000 1.00000
  6   hdd   9.09560         osd.6         up  1.00000 1.00000
  9   hdd   9.09560         osd.9         up  1.00000 1.00000
 12   hdd   9.09560         osd.12        up  1.00000 1.00000
 42  nvme   0.97029         osd.42        up  1.00000 1.00000
 43  nvme   0.97029         osd.43        up  1.00000 1.00000
 44  nvme   0.97029         osd.44        up  1.00000 1.00000
 37   ssd   0.36330         osd.37        up  1.00000 1.00000
 -3        48.75215     host scvirt02
  1   hdd   9.09560         osd.1         up  1.00000 1.00000
  4   hdd   9.09560         osd.4         up  1.00000 1.00000
  7   hdd   9.09560         osd.7         up  1.00000 1.00000
 10   hdd   9.09560         osd.10        up  1.00000 1.00000
 13   hdd   9.09560         osd.13        up  1.00000 1.00000
 45  nvme   0.97029         osd.45        up  1.00000 1.00000
 46  nvme   0.97029         osd.46        up  1.00000 1.00000
 47  nvme   0.97029         osd.47        up  1.00000 1.00000
 38   ssd   0.36330         osd.38        up  1.00000 1.00000
 -4        48.75224     host scvirt03
  2   hdd   9.09569         osd.2         up  1.00000 1.00000
  5   hdd   9.09560         osd.5         up  1.00000 1.00000
  8   hdd   9.09560         osd.8         up  1.00000 1.00000
 11   hdd   9.09560         osd.11        up  1.00000 1.00000
 14   hdd   9.09560         osd.14        up  1.00000 1.00000
 48  nvme   0.97029         osd.48        up  1.00000 1.00000
 49  nvme   0.97029         osd.49        up  1.00000 1.00000
 50  nvme   0.97029         osd.50        up  1.00000 1.00000
 39   ssd   0.36330         osd.39        up  1.00000 1.00000
 -9        56.75706     host scvirt04
 15   hdd   9.09560         osd.15        up  1.00000 1.00000
 17   hdd   9.09560         osd.17        up  1.00000 1.00000
 20   hdd   9.09560         osd.20        up  1.00000 1.00000
 22   hdd   9.09560         osd.22        up  1.00000 1.00000
 23   hdd   9.09560         osd.23        up  1.00000 1.00000
 25   hdd   3.63860         osd.25        up  1.00000 1.00000
 26   hdd   3.63860         osd.26        up  1.00000 1.00000
 27   hdd   3.63860         osd.27        up  1.00000 1.00000
 40   ssd   0.36330         osd.40        up  1.00000 1.00000
-11        56.75706     host scvirt05
 16   hdd   9.09560         osd.16        up  1.00000 1.00000
 18   hdd   9.09560         osd.18        up  1.00000 1.00000
 19   hdd   9.09560         osd.19        up  1.00000 1.00000
 21   hdd   9.09560         osd.21        up  1.00000 1.00000
 24   hdd   9.09560         osd.24        up  1.00000 1.00000
 28   hdd   3.63860         osd.28        up  1.00000 1.00000
 29   hdd   3.63860         osd.29        up  1.00000 1.00000
 30   hdd   3.63860         osd.30        up  1.00000 1.00000
 41   ssd   0.36330         osd.41        up  1.00000 1.00000
-13        24.74245     host scvirt06
 31   hdd   3.63860         osd.31        up  1.00000 1.00000
 32   hdd   3.63860         osd.32        up  1.00000 1.00000
 33   hdd   3.63860         osd.33        up  1.00000 1.00000
 34   hdd   3.63860         osd.34        up  1.00000 1.00000
 35   hdd   3.63860         osd.35        up  1.00000 1.00000
 36   hdd   3.63860         osd.36        up  1.00000 1.00000
 51  nvme   0.97029         osd.51        up  1.00000 1.00000
 52  nvme   0.97029         osd.52        up  1.00000 1.00000
 53  nvme   0.97029         osd.53        up  1.00000 1.00000
Regards,
Vadim
--
Vadim Bulst
Universität Leipzig / URZ
04109 Leipzig, Augustusplatz 10
phone: +49-341-97-33380
mail: vadim.bulst@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx