Hi,
Thanks John and Gregory for your answers.
Gregory's answer worries us. We thought that with a size 3 / min_size 2
pool and one corrupted copy, the assumption would be: the two matching
copies are correct, and the third one needs to be adjusted to match.
Can we determine from this output whether I introduced corruption into
our cluster by running the repair?
root@pm1:~# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
1 scrub errors
root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
/var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
/var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
/var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
/var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
/var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
/var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
/var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
/var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
/var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
Also: is jewel (which we're running) considered "the old past", with
the old non-checksum repair behaviour?
In case this occurs again: what would be the steps to determine WHICH
copy is the corrupt one, and how should we proceed if it happens to be
the primary copy for an object?
Upgrading to luminous would prevent this from happening again, I guess.
We're a bit hesitant to upgrade, though, because there seem to be so
many reported issues with luminous and with the upgrade itself.
Having said all this: we are surprised to see this on our cluster, as
it has been running stably and reliably for over two years. Perhaps it
was just a one-time glitch.
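Since shard 23's copy hit a read error, one thing we plan to check (a sketch only; the device name is an assumption, osd.23's actual backing disk has to be looked up on the node) is whether the kernel or SMART logged media errors for that disk:

```shell
# Sketch: on the node hosting osd.23 you would check something like
#   dmesg | grep -iE 'medium error|i/o error'
#   smartctl -a /dev/sdX      # sdX = device backing osd.23 (assumption)
# Canned example of the kind of kernel line a failing disk produces:
sample_dmesg='[12345.678] blk_update_request: critical medium error, dev sdc, sector 123456'
echo "$sample_dmesg" | grep -qi 'medium error' && echo 'possible media error on dev sdc'
```

SMART "OK" in the proxmox gui only reflects the overall health assessment; individual attributes (reallocated/pending sectors) can still be creeping up.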
Thanks for your replies!
MJ
On 08/23/2018 01:06 AM, Gregory Farnum wrote:
On Wed, Aug 22, 2018 at 2:46 AM John Spray <jspray@xxxxxxxxxx> wrote:
On Wed, Aug 22, 2018 at 7:57 AM mj <lists@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> This morning I woke up, seeing my ceph jewel 10.2.10 cluster in
> HEALTH_ERR state. That gets you out of bed. :-)
>
> Anyway, much to my surprise, all VMs running on the cluster were
> still working like nothing was going on. :-)
>
> Checking a bit more revealed:
>
> > root@pm1:~# ceph -s
> > cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
> > health HEALTH_ERR
> > 1 pgs inconsistent
> > 1 scrub errors
> > monmap e3: 3 mons at
{0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
> > election epoch 296, quorum 0,1,2 0,1,2
> > osdmap e12662: 24 osds: 24 up, 24 in
> > flags sortbitwise,require_jewel_osds
> > pgmap v64045618: 1088 pgs, 2 pools, 14023 GB data, 3680 kobjects
> > 44027 GB used, 45353 GB / 89380 GB avail
> > 1087 active+clean
> > 1 active+clean+inconsistent
> > client io 26462 kB/s rd, 14048 kB/s wr, 6 op/s rd, 383 op/s wr
> > root@pm1:~# ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> > pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
> > 1 scrub errors
> > root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
>
> ok, according to the docs I should do "ceph pg repair 2.1a9". Did
> that, and some minutes later the cluster came back to "HEALTH_OK".
>
> Checking the logs:
> > /var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
> > /var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
>
> So, we are fine again, it seems.
>
> But now my question: can anyone explain what happened? Is one of my
> disks dying? In the proxmox gui, all osd disks show SMART status "OK".
>
> Besides that, as the cluster was still running and the fix was
> relatively simple, would a HEALTH_WARN not have been more appropriate?
An inconsistent PG generally implies data corruption, which is usually
pretty scary. Your cluster may have been running okay for the moment,
but things might not be so good if your workload happens to touch that
one inconsistent object.
This is a subjective thing, and sometimes users aren't so worried
about inconsistency:
- known-unreliable hardware, where periodic corruptions are expected
- pools that are just for dev/test, where corruption is not an
urgent issue
In those cases, they might need to do some external filtering of
health checks, possibly downgrading the PG_DAMAGED check.
> And, since this is a size 3, min 2 pool... shouldn't this have been
> taken care of automatically..? ('self-healing' and all that..?)
The good news is that there's an osd_scrub_auto_repair option (default
is false).
I imagine there was probably some historical debate about whether that
should be on by default; the core RADOS folks probably know more.
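(For reference, a config-fragment sketch of what turning that option on could look like; whether it is wise on a given cluster is exactly the judgment call discussed here:)

```shell
# Config-fragment sketch: enable automatic repair of scrub errors.
# Persistently, in ceph.conf under [osd]:
#   osd scrub auto repair = true
# Or at runtime on a live cluster (not persisted across OSD restarts):
ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'
```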
In the past, "recovery" merely forced all the replicas into alignment
with the primary. If the primary was the bad copy...well, too bad!
Things are much better now that we have checksums in various places and
take more care about it. But it's still possible to configure and use
Ceph so that we don't know what the right answer is, and these kinds of
issues really aren't supposed to turn up, so we don't yet feel
comfortable auto-repairing.
-Greg
John
> So, I'm having my morning coffee finally, wondering what
> happened... :-)
>
> Best regards to all, have a nice day!
>
> MJ
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com