Re: [EXTERNAL] How can I fix "object unfound" error?

Can you share the output of "ceph pg 6.36a query"?
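
For context on why that output helps: in the releases I am aware of, the "recovery_state" section of a PG query can carry a "might_have_unfound" list naming the OSDs the primary still wants to probe for the missing object. The sketch below is a rough illustration in Python, not a definitive tool. The helper names are hypothetical, the sample dictionary is invented for illustration (it is not output from this cluster), and the field names are an assumption that should be checked against the real output.

```python
import json
import subprocess


def query_pg(pgid):
    """Run `ceph pg <pgid> query` and parse its JSON output.

    Needs a working ceph CLI and admin keyring on the host.
    """
    result = subprocess.run(
        ["ceph", "pg", pgid, "query"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def might_have_unfound(query):
    """Collect (osd, status) pairs from every recovery_state entry."""
    pairs = []
    for state in query.get("recovery_state", []):
        for peer in state.get("might_have_unfound", []):
            pairs.append((str(peer.get("osd")), peer.get("status", "")))
    return pairs


# Invented sample, shaped like the part of the query output of interest.
sample_query = {
    "state": "active+backfill_unfound+undersized+remapped",
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "might_have_unfound": [
                {"osd": "5", "status": "osd is down"},
            ],
        },
        {"name": "Started"},
    ],
}

for osd, status in might_have_unfound(sample_query):
    print(f"osd.{osd}: {status}")
```

On a live cluster, `might_have_unfound(query_pg("6.36a"))` would replace the sample. If a down OSD shows up in that list, it would explain how a disk on another server relates to this PG.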

Steve


On 3/2/20, 2:53 AM, "Simone Lazzaris" <simone.lazzaris@xxxxxxx> wrote:

    Hi there;
    I've got a ceph cluster with 4 nodes, each with 9 4TB drives.
    Last night a disk failed, and unfortunately this led to a kernel panic on the hosting server 
    (supermicro: never again).
    One reboot later, the cluster rebalances.
    
    This morning, I'm in this situation:
    
    root@s3:~# ceph status
      cluster:
        id:     9ec27b0f-acfd-40a3-b35d-db301ac5ce8c
        health: HEALTH_ERR
                1/13122293 objects unfound (0.000%)
                Possible data damage: 1 pg backfill_unfound
                Degraded data redundancy: 1 pg undersized
                27 slow ops, oldest one blocked for 68 sec, osd.5 has slow ops
     
      services:
        mon: 3 daemons, quorum s1,s2,s3 (age 11h)
        mgr: s1(active, since 6w), standbys: s2, s3
        osd: 36 osds: 35 up (since 11h), 35 in (since 11h); 21 remapped pgs
        rgw: 3 daemons active (s1, s2, s3)
     
      data:
        pools:   10 pools, 1200 pgs
        objects: 13.12M objects, 41 TiB
        usage:   63 TiB used, 65 TiB / 127 TiB avail
        pgs:     186357/39366879 objects misplaced (0.473%)
                 1/13122293 objects unfound (0.000%)
                 1179 active+clean
                 11   active+remapped+backfilling
                 9    active+remapped+backfill_wait
                 1    active+backfill_unfound+undersized+remapped
     
      io:
        client:   42 KiB/s rd, 5.2 MiB/s wr, 43 op/s rd, 11 op/s wr
        recovery: 163 MiB/s, 48 objects/s
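
A quick cross-check of the numbers in the status output above, as a sketch only. The figures are copied from the post. Reading the copies-per-object ratio as the replication factor is an assumption, as is treating each drive as 4 * 10**12 bytes with no overhead.

```python
# Figures copied from the `ceph status` output in the post.
objects = 13_122_293
object_copies = 39_366_879
misplaced = 186_357
osds_in = 35

# Assumption: one "4TB" drive is 4 * 10**12 bytes, overhead ignored.
drive_bytes = 4 * 10**12

# The misplaced denominator counts object copies, not objects.
copies_per_object = object_copies / objects

# Percentage that `ceph status` prints next to the misplaced count.
misplaced_pct = 100 * misplaced / object_copies

# Raw capacity of the OSDs that are "in", expressed in TiB.
raw_tib = osds_in * drive_bytes / 2**40

print(f"copies per object: {copies_per_object:.0f}")
print(f"misplaced: {misplaced_pct:.3f}%")
print(f"raw capacity: {raw_tib:.1f} TiB")
```

The ratio is consistent with three copies per object, and 35 drives "in" account for the reported 127 TiB, so the failed OSD's capacity is already excluded from the total.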
    
    
    One PG is in "backfill_unfound" state. The PG is 6.36a, which is on server 1; the failed disk 
    is osd.5, on server 3 (which was rebooted after the panic), so I don't understand the 
    relation.
    
    This is the unfound object:
    root@s3:~# ceph pg 6.36a list_unfound
    {
        "num_missing": 1,
        "num_unfound": 1,
        "objects": [
            {
                "oid": {
                    "oid": "8a257939-05c9-4ba8-9fd3-fb8504226607.4332.4__shadow_.H5AtB0LjzRSbUWy-hnVSLf4fs884okG_1",
                    "key": "",
                    "snapid": -2,
                    "hash": 961006442,
                    "max": 0,
                    "pool": 6,
                    "namespace": ""
                },
                "need": "263'18213",
                "have": "0'0",
                "flags": "none",
                "locations": []
            }
        ],
        "more": false
    }
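
The listing above can be condensed with a few lines of Python. This is a minimal sketch under stated assumptions: the JSON is a trimmed copy of the output above, the object name assumes the wrapped line rejoins at the hyphen, `summarize_unfound` is a hypothetical helper, and reading "need"/"have" as epoch'version pairs is my interpretation of the format.

```python
import json


def summarize_unfound(report):
    """Reduce a list_unfound report to one small dict per object."""
    rows = []
    for obj in report.get("objects", []):
        need_epoch, need_version = obj["need"].split("'")
        rows.append({
            "pool": obj["oid"]["pool"],
            "name": obj["oid"]["oid"],
            "need": (int(need_epoch), int(need_version)),
            "have": obj["have"],
            "has_location": bool(obj["locations"]),
        })
    return rows


# Trimmed copy of the `ceph pg 6.36a list_unfound` output from the post.
raw = """
{
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "8a257939-05c9-4ba8-9fd3-fb8504226607.4332.4__shadow_.H5AtB0LjzRSbUWy-hnVSLf4fs884okG_1",
                "pool": 6
            },
            "need": "263'18213",
            "have": "0'0",
            "locations": []
        }
    ],
    "more": false
}
"""

rows = summarize_unfound(json.loads(raw))
for row in rows:
    print(row["pool"], row["need"], row["have"], row["has_location"])
```

For this object the summary shows version 263'18213 is needed, nothing is held locally ("0'0"), and no OSD is currently known to have a copy (empty "locations").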
    
    
    How can I handle this error? The docs are not very comforting: as far as I can see, the only 
    thing to do is to mark the missing object as lost and try to cope with that. I'd prefer not to.
    
    Any ideas?
    
    
    *Simone Lazzaris*
    *Qcom S.p.A.*
    simone.lazzaris@xxxxxxx[1] | www.qcom.it[2]
    * LinkedIn[3]* | *Facebook*[4]
    
    
    
    --------
    [1] mailto:simone.lazzaris@xxxxxxx
    [2] https://www.qcom.it
    [3] https://www.linkedin.com/company/qcom-spa
    [4] http://www.facebook.com/qcomspa
    _______________________________________________
    ceph-users mailing list -- ceph-users@xxxxxxx
    To unsubscribe send an email to ceph-users-leave@xxxxxxx
    




