Re: Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

Hi

I just posted my logs and my issue to the Ceph tracker.

Let's hope this gets fixed.

Thanks


On 05/03/2018 13:36, Paul Emmerich wrote:
Hi,

yeah, the cluster that I'm seeing this on also has only one host that reports that specific checksum. Two other hosts only report the same error that you are seeing.

Could you post to the tracker issue that you are also seeing this?

Paul

2018-03-05 12:21 GMT+01:00 Marco Baldini - H.S. Amiata <mbaldini@xxxxxxxxxxx>:

Hi

After some days with debug_osd 5/5 I found [ERR] entries on different days, in different PGs, on different OSDs, and on different hosts. This is what I get in the OSD logs:

OSD.5 (host 3)
2018-03-01 20:30:02.702269 7fdf4d515700  2 osd.5 pg_epoch: 16486 pg[9.1c( v 16486'51798 (16431'50251,16486'51798] local-lis/les=16474/16475 n=3629 ec=1477/1477 lis/c 16474/16474 les/c/f 16475/16477/0 16474/16474/16474) [5,6] r=0 lpr=16474 crt=16486'51798 lcod 16486'51797 mlcod 16486'51797 active+clean+scrubbing+deep] 9.1c shard 6: soid 9:3b157c56:::rbd_data.1526386b8b4567.0000000000001761:head candidate had a read error
2018-03-01 20:30:02.702278 7fdf4d515700 -1 log_channel(cluster) log [ERR] : 9.1c shard 6: soid 9:3b157c56:::rbd_data.1526386b8b4567.0000000000001761:head candidate had a read error

OSD.4 (host 3)
2018-02-28 00:03:33.458558 7f112cf76700 -1 log_channel(cluster) log [ERR] : 13.65 shard 2: soid 13:a719ecdf:::rbd_data.5f65056b8b4567.000000000000f8eb:head candidate had a read error
OSD.8 (host 2)
2018-02-27 23:55:15.100084 7f4dd0816700 -1 log_channel(cluster) log [ERR] : 14.31 shard 1: soid 14:8cc6cd37:::rbd_data.30b15b6b8b4567.00000000000081a1:head candidate had a read error

I don't know what this error means, and as always a ceph pg repair fixes it. I don't think this is normal.
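
Next time one of these PGs shows up as inconsistent, it may be worth dumping what the scrub actually found before repairing it; a minimal sketch, using the PG from the log above just as an example:

rados list-inconsistent-obj 9.1c --format=json-pretty

The JSON output should show, per shard, whether the object hit a read error or a checksum mismatch.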

Ideas?

Thanks


On 28/02/2018 14:48, Marco Baldini - H.S. Amiata wrote:

Hi

I read the bug tracker issue and it looks a lot like my problem, even though I can't check the reported checksum because I don't have it in my logs; perhaps that's because of debug osd = 0/0 in ceph.conf.

I just raised the OSD log level

ceph tell osd.* injectargs --debug-osd 5/5

I'll check OSD logs in the next days...
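
A quick way to look for that specific checksum (or any read error) across the OSD logs once the higher debug level is active; just a sketch, assuming the default /var/log/ceph/ceph-osd.*.log paths:

grep -E 'candidate had a read error|0x6706be76' /var/log/ceph/ceph-osd.*.log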

Thanks



On 28/02/2018 11:59, Paul Emmerich wrote:
Hi,


Can you check the OSD log file to see if the reported checksum is 0x6706be76?


Paul

On 28.02.2018 at 11:43, Marco Baldini - H.S. Amiata <mbaldini@xxxxxxxxxxx> wrote:

Hello

I have a small Ceph cluster with 3 nodes, each with 3x 1TB HDD and 1x 240GB SSD. I created this cluster after the Luminous release, so all OSDs are Bluestore. In my CRUSH map I have two rules, one targeting the SSDs and one targeting the HDDs. I have 4 pools, one using the SSD rule and the others using the HDD rule; three pools are size=3 min_size=2, one is size=2 min_size=1 (that one holds content that is OK to lose).
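
For reference, device-class rules like these can be created with something along these lines (just a sketch; the rule names here are made up and are not necessarily the ones in my CRUSH map):

ceph osd crush rule create-replicated hdd_rule default host hdd
ceph osd crush rule create-replicated ssd_rule default host ssd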

For the last 3 months I've been having a strange random problem. I scheduled my OSD scrubs during the night (osd scrub begin hour = 20, osd scrub end hour = 7), when the office is closed, so there is low impact on the users. Some mornings, when I check the cluster health, I find:

HEALTH_ERR X scrub errors; Possible data damage: Y pgs inconsistent
OSD_SCRUB_ERRORS X scrub errors
PG_DAMAGED Possible data damage: Y pg inconsistent

X and Y sometimes are 1, sometimes 2.

I issue a ceph health detail, check the damaged PGs, and run a ceph pg repair for each damaged PG; I get

instructing pg PG on osd.N to repair

The PGs are different, the OSD that has to repair the PG is different, even the node hosting the OSD is different; I made a list of all PGs and OSDs. This morning is the most recent case:

> ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 13.65 is active+clean+inconsistent, acting [4,2,6]
pg 14.31 is active+clean+inconsistent, acting [8,3,1]
> ceph pg repair 13.65
instructing pg 13.65 on osd.4 to repair

(node-2)> tail /var/log/ceph/ceph-osd.4.log
2018-02-28 08:38:47.593447 7f112cf76700  0 log_channel(cluster) log [DBG] : 13.65 repair starts
2018-02-28 08:39:37.573342 7f112cf76700  0 log_channel(cluster) log [DBG] : 13.65 repair ok, 0 fixed
> ceph pg repair 14.31
instructing pg 14.31 on osd.8 to repair

(node-3)> tail /var/log/ceph/ceph-osd.8.log
2018-02-28 08:52:37.297490 7f4dd0816700  0 log_channel(cluster) log [DBG] : 14.31 repair starts
2018-02-28 08:53:00.704020 7f4dd0816700  0 log_channel(cluster) log [DBG] : 14.31 repair ok, 0 fixed
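
When more than one PG is flagged, those two steps can be strung together; a rough sketch of the idea (not something I claim to run routinely):

ceph health detail | awk '$1 == "pg" && /inconsistent/ {print $2}' | xargs -r -n1 ceph pg repair

This just pulls the PG ids from the "pg X.YY is active+clean+inconsistent" lines and issues a repair for each.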


I made a list of when I got OSD_SCRUB_ERRORS, which PG was affected, and which OSD had to repair it. Dates are dd/mm/yyyy:

21/12/2017   --  pg 14.29 is active+clean+inconsistent, acting [6,2,4]

18/01/2018   --  pg 14.5a is active+clean+inconsistent, acting [6,4,1]

22/01/2018   --  pg 9.3a is active+clean+inconsistent, acting [2,7]

29/01/2018   --  pg 13.3e is active+clean+inconsistent, acting [4,6,1]
                 instructing pg 13.3e on osd.4 to repair

07/02/2018   --  pg 13.7e is active+clean+inconsistent, acting [8,2,5]
                 instructing pg 13.7e on osd.8 to repair

09/02/2018   --  pg 13.30 is active+clean+inconsistent, acting [7,3,2]
                 instructing pg 13.30 on osd.7 to repair

15/02/2018   --  pg 9.35 is active+clean+inconsistent, acting [1,8]
                 instructing pg 9.35 on osd.1 to repair

                 pg 13.3e is active+clean+inconsistent, acting [4,6,1]
                 instructing pg 13.3e on osd.4 to repair

17/02/2018   --  pg 9.2d is active+clean+inconsistent, acting [7,5]
                 instructing pg 9.2d on osd.7 to repair                 

22/02/2018   --  pg 9.24 is active+clean+inconsistent, acting [5,8]
                 instructing pg 9.24 on osd.5 to repair

28/02/2018   --  pg 13.65 is active+clean+inconsistent, acting [4,2,6]
                 instructing pg 13.65 on osd.4 to repair

                 pg 14.31 is active+clean+inconsistent, acting [8,3,1]
                 instructing pg 14.31 on osd.8 to repair



In case it's useful, my ceph.conf is here:

[global]
auth client required = none
auth cluster required = none
auth service required = none
fsid = 24d5d6bc-0943-4345-b44e-46c19099004b
cluster network = 10.10.10.0/24
public network = 10.10.10.0/24
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
bluestore_block_db_size = 64424509440

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0


[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1

osd scrub begin hour = 20
osd scrub end hour = 7
osd scrub during recovery = false
osd scrub load threshold = 0.3

[client]
rbd cache = true
rbd cache size = 268435456      # 256MB
rbd cache max dirty = 201326592    # 192MB
rbd cache max dirty age = 2
rbd cache target dirty = 33554432    # 32MB
rbd cache writethrough until flush = true


#[mgr]
#debug_mgr = 20


[mon.pve-hs-main]
host = pve-hs-main
mon addr = 10.10.10.251:6789

[mon.pve-hs-2]
host = pve-hs-2
mon addr = 10.10.10.252:6789

[mon.pve-hs-3]
host = pve-hs-3
mon addr = 10.10.10.253:6789
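
To confirm which of these values the OSDs are actually running with (for example the scrub window and the debug levels), something like this can be run on each node; a sketch, osd.0 is just an example:

ceph daemon osd.0 config show | grep -E 'debug_osd|osd_scrub'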


My ceph versions:

{
    "mon": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 12
    },
    "mds": {},
    "overall": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 18
    }
}



My ceph osd tree:

ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF
-1       8.93686 root default
-6       2.94696     host pve-hs-2
 3   hdd 0.90959         osd.3            up  1.00000 1.00000
 4   hdd 0.90959         osd.4            up  1.00000 1.00000
 5   hdd 0.90959         osd.5            up  1.00000 1.00000
10   ssd 0.21819         osd.10           up  1.00000 1.00000
-3       2.86716     host pve-hs-3
 6   hdd 0.85599         osd.6            up  1.00000 1.00000
 7   hdd 0.85599         osd.7            up  1.00000 1.00000
 8   hdd 0.93700         osd.8            up  1.00000 1.00000
11   ssd 0.21819         osd.11           up  1.00000 1.00000
-7       3.12274     host pve-hs-main
 0   hdd 0.96819         osd.0            up  1.00000 1.00000
 1   hdd 0.96819         osd.1            up  1.00000 1.00000
 2   hdd 0.96819         osd.2            up  1.00000 1.00000
 9   ssd 0.21819         osd.9            up  1.00000 1.00000

My pools:

pool 9 'cephbackup' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 5665 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~3]
pool 13 'cephwin' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 16454 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~5]
pool 14 'cephnix' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 16482 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~227]
pool 17 'cephssd' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 8601 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~3]


I can't understand where the problem comes from. I don't think it's hardware: if I had a failing disk, I should always get problems on the same OSD. Any ideas?
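
One way to double-check the hardware theory would be SMART on each node's disks; a rough sketch (the device names are just placeholders for the HDDs behind the OSDs, run as root):

for dev in /dev/sd[a-d]; do
    echo "== $dev =="
    smartctl -A "$dev" | grep -Ei 'reallocated|pending|uncorrect'
done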

Thanks



--
Marco Baldini
H.S. Amiata Srl
Ufficio:   0577-779396
Cellulare:   335-8765169
WEB:   www.hsamiata.it
EMAIL:   mbaldini@xxxxxxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Best Regards
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München

Managing Director: Martin Verges
Commercial Register: Amtsgericht München
VAT ID: DE310638492







--
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

--
Marco Baldini
H.S. Amiata Srl
Ufficio:   0577-779396
Cellulare:   335-8765169
WEB:   www.hsamiata.it
EMAIL:   mbaldini@xxxxxxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
