Re: ceph health JSON format has changed

Jan Kasprzak <kas@xxxxxxxxxx> · Wed, 2 Jan 2019 14:12:35 +0100

Thomas Byrne - UKRI STFC wrote:
: I recently spent some time looking at this. I believe the 'summary' and
: 'overall_status' sections are now deprecated; the 'status' and 'checks'
: fields are the ones to use now.

	OK, thanks.

: The 'status' field gives you the OK/WARN/ERR, but returning the most
: severe error condition from the 'checks' section is less trivial. AFAIK
: all HEALTH_WARN states are treated as equally severe, and the same goes
: for HEALTH_ERR. We ended up formatting our single-line, human-readable
: output as something like:
: 
: "HEALTH_ERR: 1 inconsistent pg, HEALTH_ERR: 1 scrub error, HEALTH_WARN: 20 large omap objects"
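
	A minimal sketch of that kind of one-line formatting, in Python. It
assumes the Luminous-style layout of "ceph health -f json", where each entry
under 'checks' carries a 'severity' and a 'summary' with a 'message'; that
layout and the sample values are assumptions to verify against a real
cluster, and summarize_health() is a hypothetical helper name.

```python
# Sketch only: the JSON layout used here is an assumption based on
# Luminous-era "ceph health -f json" output; verify it on your cluster.
SEVERITY_ORDER = {"HEALTH_ERR": 0, "HEALTH_WARN": 1}


def summarize_health(health):
    """Return a one-line summary with the most severe checks first."""
    checks = health.get("checks", {})
    if not checks:
        # Nothing to report: fall back to the overall 'status' field.
        return health.get("status", "unknown")
    # Sort by severity rank, then by check name for a stable order.
    items = sorted(
        checks.items(),
        key=lambda kv: (SEVERITY_ORDER.get(kv[1].get("severity"), 2), kv[0]),
    )
    return ", ".join(
        "{}: {}".format(
            check.get("severity", "unknown"),
            check.get("summary", {}).get("message", name),
        )
        for name, check in items
    )
```

	All checks of the same severity are treated as equal here, so the
order within one severity level is just alphabetical by check name.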

	Speaking of scrub errors:

	In previous versions of Ceph, I was able to determine which PGs had
scrub errors, and then a cron.hourly script ran "ceph pg repair" for them,
provided that they were not already being scrubbed. In Luminous, the bad PG
is not visible in "ceph --status" anywhere. Should I use something like
"ceph health detail -f json-pretty" instead?

	Also, is it possible to configure Ceph to attempt repairing
the bad PGs itself, as soon as the scrub fails? I run most of my OSDs on top
of a bunch of old spinning disks, and a scrub error almost always means
that there is a bad sector somewhere, which can easily be fixed by
rewriting the lost data using "ceph pg repair".
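
	For reference, the hourly repair job described above could be
sketched in Python along these lines, reading the bad PGs from
"ceph health detail --format json". The 'PG_DAMAGED' check name and the
"pg <id> is <states>, acting [...]" shape of its detail messages are
assumptions about the Luminous output, and both function names are
hypothetical.

```python
import json
import re
import subprocess

# Assumed shape of a detail message:
#   "pg 2.1f is active+clean+inconsistent, acting [3,1,5]"
PG_LINE = re.compile(r"^pg (\S+) is ([a-z+_]+)")


def pgs_to_repair(health):
    """Return IDs of inconsistent PGs that are not being scrubbed or repaired."""
    detail = health.get("checks", {}).get("PG_DAMAGED", {}).get("detail", [])
    pgs = []
    for entry in detail:
        match = PG_LINE.match(entry.get("message", ""))
        if not match:
            continue
        pgid, states = match.group(1), match.group(2).split("+")
        # Skip PGs that already have a scrub or repair in progress.
        busy = any(s in ("scrubbing", "deep", "repair") for s in states)
        if "inconsistent" in states and not busy:
            pgs.append(pgid)
    return pgs


def repair_bad_pgs():
    """Fetch health details and run "ceph pg repair" for each bad PG."""
    out = subprocess.check_output(
        ["ceph", "health", "detail", "--format", "json"]
    )
    for pgid in pgs_to_repair(json.loads(out)):
        subprocess.check_call(["ceph", "pg", "repair", pgid])
```

	Only repair_bad_pgs() talks to the cluster; pgs_to_repair() works on
the already parsed JSON, so it can be checked against saved output first.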

	Thanks,

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev