Re: HEALTH_ERR vs HEALTH_WARN

On Wed, Aug 22, 2018 at 7:57 AM mj <lists@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> This morning I woke up, seeing my ceph jewel 10.2.10 cluster in
> HEALTH_ERR state. That helps you get out of bed. :-)
>
> Anyway, much to my surprise, all VMs running on the cluster were still
> working like nothing was going on. :-)
>
> Checking a bit more revealed:
>
> > root@pm1:~# ceph -s
> >     cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
> >      health HEALTH_ERR
> >             1 pgs inconsistent
> >             1 scrub errors
> >      monmap e3: 3 mons at {0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
> >             election epoch 296, quorum 0,1,2 0,1,2
> >      osdmap e12662: 24 osds: 24 up, 24 in
> >             flags sortbitwise,require_jewel_osds
> >       pgmap v64045618: 1088 pgs, 2 pools, 14023 GB data, 3680 kobjects
> >             44027 GB used, 45353 GB / 89380 GB avail
> >                 1087 active+clean
> >                    1 active+clean+inconsistent
> >   client io 26462 kB/s rd, 14048 kB/s wr, 6 op/s rd, 383 op/s wr
> > root@pm1:~# ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> > pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
> > 1 scrub errors
> > root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
>
> ok, according to the docs I should do "ceph pg repair 2.1a9". Did that,
> and some minutes later the cluster came back to "HEALTH_OK"
>
> Checking the logs:
> > /var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
> > /var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
>
> So, we are fine again, it seems.
>
> But now my question: can anyone tell me what happened? Is one of my disks dying?
> In the proxmox gui, all osd disks are SMART status "OK".
>
> Besides that, as the cluster was still running and the fix was
> relatively simple, would a HEALTH_WARN not have been more appropriate?

An inconsistent PG generally implies data corruption, which is usually
pretty scary.  Your cluster may have been running okay for the moment,
but things might not be so good if your workload happens to touch that
one inconsistent object.
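
Before (or instead of) repairing blindly, it can be worth looking at what is actually inconsistent with `rados list-inconsistent-obj <pgid> --format=json-pretty`. As a sketch, here is a small Python helper that picks the read-error shards out of that JSON; the sample report below is hypothetical, shaped after the osd.23 read error in the logs above, and the exact JSON field names can vary between Ceph releases:

```python
import json

def read_error_shards(report):
    """Given parsed JSON from `rados list-inconsistent-obj <pgid>`,
    return (object_name, osd_id) pairs for shards reporting a read error."""
    hits = []
    for inc in report.get("inconsistents", []):
        name = inc.get("object", {}).get("name", "?")
        for shard in inc.get("shards", []):
            if "read_error" in shard.get("errors", []):
                hits.append((name, shard["osd"]))
    return hits

# Hypothetical report modeled on this thread: the copy on osd.23 of one
# rbd_data object had a read error; the other two replicas were clean.
sample = json.loads("""{
  "epoch": 12662,
  "inconsistents": [
    {
      "object": {"name": "rbd_data.2c191e238e1f29.00000000000c7c9d",
                 "nspace": "", "locator": "", "snap": "head"},
      "errors": [],
      "shards": [
        {"osd": 15, "errors": [], "size": 4194304},
        {"osd": 23, "errors": ["read_error"], "size": 4194304},
        {"osd": 6,  "errors": [], "size": 4194304}
      ]
    }
  ]
}""")

print(read_error_shards(sample))
```

A read error on a single shard, as here, usually points at the underlying disk rather than at Ceph itself, which is why checking dmesg/smartctl on the OSD host for that shard is a reasonable follow-up.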

This is a subjective thing, and sometimes users aren't so worried
about inconsistency, for example with:
 - known-unreliable hardware, where periodic corruptions are expected
 - pools that are just for dev/test, where corruption is not an urgent issue

In those cases, they might need to do some external filtering of
health checks, possibly downgrading the PG_DAMAGED check.
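
Such a filter could be a small wrapper around the JSON output of `ceph health`. A minimal sketch, assuming the luminous-style `"checks"` dict (jewel's health JSON predates named checks like PG_DAMAGED, so the field layout on older releases differs) and a local, site-specific downgrade policy:

```python
# Severity rank used to compute a filtered overall status.
_RANK = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}

# Local policy (not a Ceph default): checks we choose to treat as
# warnings even when Ceph raises them as errors.
DOWNGRADE = {"PG_DAMAGED"}

def filtered_status(health):
    """Recompute overall health from luminous-style `ceph health
    --format json` output, downgrading the checks in DOWNGRADE."""
    worst = "HEALTH_OK"
    for name, check in health.get("checks", {}).items():
        sev = check.get("severity", "HEALTH_WARN")
        if name in DOWNGRADE and sev == "HEALTH_ERR":
            sev = "HEALTH_WARN"
        if _RANK[sev] > _RANK[worst]:
            worst = sev
    return worst

# Hypothetical health blob mirroring the situation in this thread.
sample = {"checks": {"PG_DAMAGED": {
    "severity": "HEALTH_ERR",
    "summary": {"message": "Possible data damage: 1 pg inconsistent"}}}}

print(filtered_status(sample))
```

A monitoring system would feed this from `ceph health --format json` and page on the filtered status instead of the raw one.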

> And, since this is a size 3, min 2 pool... shouldn't this have been
> taken care of automatically..? ('self-healing' and all that..?)

The good news is that there's an osd_scrub_auto_repair option (default
is false).

I imagine there was probably some historical debate about whether that
should be on by default; the core RADOS folks probably know more.
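
For reference, enabling it would be a ceph.conf change on the OSDs, roughly like the fragment below; the companion threshold option name and its default are from the Ceph config reference, so verify them against your release before relying on this:

```ini
[osd]
# Let deep scrub repair inconsistencies it finds automatically
# (default: false).
osd scrub auto repair = true
# Only auto-repair when the scrub found at most this many errors
# (verify this option exists and its default on your release).
osd scrub auto repair num errors = 5
```

It can typically also be flipped at runtime with something like `ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'`, which avoids an OSD restart.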

John


> So, I'm having my morning coffee finally, wondering what happened... :-)
>
> Best regards to all, have a nice day!
>
> MJ
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux