Hello
I have a little Ceph cluster with 3 nodes,
each with 3x 1TB HDDs and 1x 240GB SSD. I created
the cluster after the Luminous release, so all OSDs
are Bluestore. In my CRUSH map I have two rules,
one targeting the SSDs and one targeting the HDDs.
I have 4 pools: one uses the SSD rule, the other
three use the HDD rule. Three pools are size=3
min_size=2; one is size=2 min_size=1 (that one
holds content it's OK to lose).
For the last 3 months I have been having a
strange, random problem. I scheduled my OSD scrubs
during the night (osd scrub begin hour = 20,
osd scrub end hour = 7), when the office is closed,
so there is low impact on the users. Some mornings,
when I check the cluster health, I find:
HEALTH_ERR X scrub errors; Possible data damage: Y pgs inconsistent
OSD_SCRUB_ERRORS X scrub errors
PG_DAMAGED Possible data damage: Y pg inconsistent
X and Y are sometimes 1, sometimes 2.
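For completeness, the scrub window can be confirmed
per OSD through its admin socket on the hosting
node; it should report the 20 / 7 / 0.3 values from
my ceph.conf below (osd.4 is just an example):
> ceph daemon osd.4 config show | grep -E 'osd_scrub_(begin_hour|end_hour|load_threshold)'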
I run ceph health detail, check which PGs are
damaged, and issue a ceph pg repair for each of
them; I get
instructing pg PG on osd.N to repair
The PG is different every time, the OSD that has to
perform the repair is different, and even the node
hosting that OSD is different, so I made a list of
all the PGs and OSDs involved (further below). This
morning's case is the most recent one:
> ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 13.65 is active+clean+inconsistent, acting [4,2,6]
pg 14.31 is active+clean+inconsistent, acting [8,3,1]
> ceph pg repair 13.65
instructing pg 13.65 on osd.4 to repair
(node-2)> tail /var/log/ceph/ceph-osd.4.log
2018-02-28 08:38:47.593447 7f112cf76700 0 log_channel(cluster) log [DBG] : 13.65 repair starts
2018-02-28 08:39:37.573342 7f112cf76700 0 log_channel(cluster) log [DBG] : 13.65 repair ok, 0 fixed
> ceph pg repair 14.31
instructing pg 14.31 on osd.8 to repair
(node-3)> tail /var/log/ceph/ceph-osd.8.log
2018-02-28 08:52:37.297490 7f4dd0816700 0 log_channel(cluster) log [DBG] : 14.31 repair starts
2018-02-28 08:53:00.704020 7f4dd0816700 0 log_channel(cluster) log [DBG] : 14.31 repair ok, 0 fixed
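Side note: the repair logs above end with "0 fixed".
Next time, before issuing the repair, I can dump
what the scrub actually flagged; as far as I
understand these details are cleared once the PG is
repaired or deep-scrubbed again, so it has to be
run first. The pool and PG below are just this
morning's examples:
> rados list-inconsistent-pg cephwin
> rados list-inconsistent-obj 13.65 --format=json-pretty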
I made a list of when I got
OSD_SCRUB_ERRORS, which PG was affected and which
OSD was instructed to repair it. Dates are
dd/mm/yyyy:
21/12/2017 -- pg 14.29 is active+clean+inconsistent, acting [6,2,4]
18/01/2018 -- pg 14.5a is active+clean+inconsistent, acting [6,4,1]
22/01/2018 -- pg 9.3a is active+clean+inconsistent, acting [2,7]
29/01/2018 -- pg 13.3e is active+clean+inconsistent, acting [4,6,1]
instructing pg 13.3e on osd.4 to repair
07/02/2018 -- pg 13.7e is active+clean+inconsistent, acting [8,2,5]
instructing pg 13.7e on osd.8 to repair
09/02/2018 -- pg 13.30 is active+clean+inconsistent, acting [7,3,2]
instructing pg 13.30 on osd.7 to repair
15/02/2018 -- pg 9.35 is active+clean+inconsistent, acting [1,8]
instructing pg 9.35 on osd.1 to repair
pg 13.3e is active+clean+inconsistent, acting [4,6,1]
instructing pg 13.3e on osd.4 to repair
17/02/2018 -- pg 9.2d is active+clean+inconsistent, acting [7,5]
instructing pg 9.2d on osd.7 to repair
22/02/2018 -- pg 9.24 is active+clean+inconsistent, acting [5,8]
instructing pg 9.24 on osd.5 to repair
28/02/2018 -- pg 13.65 is active+clean+inconsistent, acting [4,2,6]
instructing pg 13.65 on osd.4 to repair
pg 14.31 is active+clean+inconsistent, acting [8,3,1]
instructing pg 14.31 on osd.8 to repair
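When it happens again I can also grep the primary
OSD's log on the affected node for the original
scrub error; osd.4 and the pattern below are just
an example, the real message wording may differ:
> grep -iE '(deep-)?scrub.*(error|shard|candidate)' /var/log/ceph/ceph-osd.4.log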
If it can be useful, my ceph.conf is here:
[global]
auth client required = none
auth cluster required = none
auth service required = none
fsid = 24d5d6bc-0943-4345-b44e-46c19099004b
cluster network = 10.10.10.0/24
public network = 10.10.10.0/24
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
bluestore_block_db_size = 64424509440
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
osd scrub begin hour = 20
osd scrub end hour = 7
osd scrub during recovery = false
osd scrub load threshold = 0.3
[client]
rbd cache = true
rbd cache size = 268435456 # 256MB
rbd cache max dirty = 201326592 # 192MB
rbd cache max dirty age = 2
rbd cache target dirty = 33554432 # 32MB
rbd cache writethrough until flush = true
#[mgr]
#debug_mgr = 20
[mon.pve-hs-main]
host = pve-hs-main
mon addr = 10.10.10.251:6789
[mon.pve-hs-2]
host = pve-hs-2
mon addr = 10.10.10.252:6789
[mon.pve-hs-3]
host = pve-hs-3
mon addr = 10.10.10.253:6789
My ceph versions:
{
    "mon": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 12
    },
    "mds": {},
    "overall": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 18
    }
}
My ceph osd tree:
ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF
-1       8.93686 root default
-6       2.94696     host pve-hs-2
 3   hdd 0.90959         osd.3            up  1.00000 1.00000
 4   hdd 0.90959         osd.4            up  1.00000 1.00000
 5   hdd 0.90959         osd.5            up  1.00000 1.00000
10   ssd 0.21819         osd.10           up  1.00000 1.00000
-3       2.86716     host pve-hs-3
 6   hdd 0.85599         osd.6            up  1.00000 1.00000
 7   hdd 0.85599         osd.7            up  1.00000 1.00000
 8   hdd 0.93700         osd.8            up  1.00000 1.00000
11   ssd 0.21819         osd.11           up  1.00000 1.00000
-7       3.12274     host pve-hs-main
 0   hdd 0.96819         osd.0            up  1.00000 1.00000
 1   hdd 0.96819         osd.1            up  1.00000 1.00000
 2   hdd 0.96819         osd.2            up  1.00000 1.00000
 9   ssd 0.21819         osd.9            up  1.00000 1.00000
My pools:
pool 9 'cephbackup' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 5665 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~3]
pool 13 'cephwin' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 16454 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~5]
pool 14 'cephnix' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 16482 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~227]
pool 17 'cephssd' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 8601 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~3]
I can't understand where the problem comes
from. I don't think it's hardware: if I had a
failing disk, I would expect the errors to always
hit the same OSD. Any ideas?
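Meanwhile, to rule hardware out more firmly, I can
check SMART and the kernel log on the nodes hosting
the OSDs that were instructed to repair, along
these lines (/dev/sdX is only a placeholder for the
disk backing the OSD):
> smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrectable|crc'
> dmesg -T | grep -iE 'ata[0-9]|medium error|i/o error'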
Thanks