Re: HEALTH_ERR vs HEALTH_WARN

Hi,

Thanks John and Gregory for your answers.

Gregory's answer worries us. We thought that with a 3/2 pool and one corrupted copy, the assumption would be: the two identical replicas are correct, and the third one needs to be fixed.

Can we determine from this output whether running the repair introduced corruption in our cluster?

root@pm1:~# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
1 scrub errors
root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
/var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
/var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
/var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
/var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
/var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects

/var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
/var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
/var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
/var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed

And also: is jewel (which we're running) considered part of "the old past", with the old non-checksum repair behaviour?

In case this occurs again... what would be the steps to determine WHICH copy is the corrupt one, and how should we proceed if it happens to be the primary copy of the object?
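If it helps anyone: `rados list-inconsistent-obj <pgid> --format=json` can be run after a deep scrub to see per-shard errors, and a small script can pick out the OSD holding the bad copy. The JSON below is a hand-written sketch of that output, with field names from memory rather than captured from a real cluster, so check it against your release:

```python
import json

# Hand-written sample resembling `rados list-inconsistent-obj 2.1a9 --format=json`
# output; the exact fields may differ between Ceph releases.
sample = json.loads("""
{
  "epoch": 12662,
  "inconsistents": [
    {
      "object": {"name": "rbd_data.2c191e238e1f29.00000000000c7c9d", "snap": "head"},
      "errors": [],
      "union_shard_errors": ["read_error"],
      "shards": [
        {"osd": 15, "errors": []},
        {"osd": 23, "errors": ["read_error"]},
        {"osd": 6, "errors": []}
      ]
    }
  ]
}
""")

def bad_shards(report):
    """Return (object_name, osd_id, errors) for every shard reporting errors."""
    out = []
    for inc in report.get("inconsistents", []):
        name = inc["object"]["name"]
        for shard in inc["shards"]:
            if shard["errors"]:
                out.append((name, shard["osd"], shard["errors"]))
    return out

print(bad_shards(sample))
```

Here OSD 23 is the one flagged, which matches the "shard 23 ... candidate had a read error" lines in the log output above.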

Upgrading to luminous would prevent this from happening again, I guess. We're a bit hesitant to upgrade, because there seem to be so many reports of issues with luminous and with upgrading to it.

Having said all this: we are surprised to see this on our cluster, as it has been running stably and reliably for over two years. Perhaps it was just a one-time glitch.

Thanks for your replies!

MJ

On 08/23/2018 01:06 AM, Gregory Farnum wrote:
On Wed, Aug 22, 2018 at 2:46 AM John Spray <jspray@xxxxxxxxxx> wrote:

    On Wed, Aug 22, 2018 at 7:57 AM mj <lists@xxxxxxxxxxxxx> wrote:
     >
     > Hi,
     >
     > This morning I woke up, seeing my ceph jewel 10.2.10 cluster in
     > HEALTH_ERR state. That helps you get out of bed. :-)
     >
     > Anyway, much to my surprise, all VMs running on the cluster were
     > still working like nothing was going on. :-)
     >
     > Checking a bit more revealed:
     >
     > > root@pm1:~# ceph -s
     > >     cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
     > >      health HEALTH_ERR
     > >             1 pgs inconsistent
     > >             1 scrub errors
     > >      monmap e3: 3 mons at {0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
     > >             election epoch 296, quorum 0,1,2 0,1,2
     > >      osdmap e12662: 24 osds: 24 up, 24 in
     > >             flags sortbitwise,require_jewel_osds
     > >       pgmap v64045618: 1088 pgs, 2 pools, 14023 GB data, 3680 kobjects
     > >             44027 GB used, 45353 GB / 89380 GB avail
     > >                 1087 active+clean
     > >                    1 active+clean+inconsistent
     > >   client io 26462 kB/s rd, 14048 kB/s wr, 6 op/s rd, 383 op/s wr
     > > root@pm1:~# ceph health detail
     > > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
     > > pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
     > > 1 scrub errors
     > > root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
     > > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
     > > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
     > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
     > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
     > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
     >
     > ok, according to the docs I should do "ceph pg repair 2.1a9". Did that,
     > and some minutes later the cluster came back to "HEALTH_OK".
     >
     > Checking the logs:
     > > /var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
     > > /var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
     > > /var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
     > > /var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
     >
     > So, we are fine again, it seems.
     >
     > But now my question: can anyone tell me what happened? Is one of my
     > disks dying?
     > In the proxmox gui, all osd disks are SMART status "OK".
     >
     > Besides that, as the cluster was still running and the fix was
     > relatively simple, would a HEALTH_WARN not have been more appropriate?

    An inconsistent PG generally implies data corruption, which is usually
    pretty scary.  Your cluster may have been running okay for the moment,
    but things might not be so good if your workload happens to touch that
    one inconsistent object.

    This is a subjective thing, and sometimes users aren't so worried
    about inconsistency, for example:
      - clusters on known-unreliable hardware, where periodic corruptions
    are expected
      - pools that are just for dev/test, where corruption is not an
    urgent issue

    In those cases, they might need to do some external filtering of
    health checks, possibly down-grading the PG_DAMAGED check.

     > And, since this is a size 3, min 2 pool... shouldn't this have been
     > taken care of automatically..? ('self-healing' and all that..?)

    The good news is that there's an osd_scrub_auto_repair option (default
    is false).

    I imagine there was probably some historical debate about whether that
    should be on by default; the core RADOS folks probably know more.
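For reference, that option would be enabled in ceph.conf roughly as below; a minimal sketch, with the companion `osd_scrub_auto_repair_num_errors` threshold assumed from memory and worth verifying against the docs for your release:

```ini
[osd]
# automatically repair inconsistencies found by scrub
osd_scrub_auto_repair = true
# but only when the scrub found no more than this many errors (assumed default: 5)
osd_scrub_auto_repair_num_errors = 5
```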


In the past, "recovery" merely forced all the replicas into alignment with the primary. If the primary was the bad copy...well, too bad!

Things are much better now that we have checksums in various places and take more care about them. But it's still possible to configure and use Ceph so that we don't know which copy is the right one, and these kinds of issues really aren't supposed to turn up, so we don't yet feel comfortable auto-repairing.
-Greg


    John


     > So, I'm having my morning coffee finally, wondering what happened... :-)
     >
     > Best regards to all, have a nice day!
     >
     > MJ
     > _______________________________________________
     > ceph-users mailing list
     > ceph-users@xxxxxxxxxxxxxx
     > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




