Re: ceph health JSON format has changed

On Wed, Jan 2, 2019 at 5:12 AM Jan Kasprzak <kas@xxxxxxxxxx> wrote:
Thomas Byrne - UKRI STFC wrote:
: I recently spent some time looking at this, I believe the 'summary' and
: 'overall_status' sections are now deprecated. The 'status' and 'checks'
: fields are the ones to use now.

        OK, thanks.

: The 'status' field gives you the OK/WARN/ERR, but returning the most
: severe error condition from the 'checks' section is less straightforward.
: AFAIK all health_warn states are treated as equally severe, and the same
: for health_err. We ended up formatting our single-line human-readable
: output as something like:
:
: "HEALTH_ERR: 1 inconsistent pg, HEALTH_ERR: 1 scrub error, HEALTH_WARN: 20 large omap objects"

        Speaking of scrub errors:

        In previous versions of Ceph, I was able to determine which PGs had
scrub errors, and then a cron.hourly script ran "ceph pg repair" for them,
provided that they were not already being scrubbed. In Luminous, the bad PG
is not visible in "ceph --status" anywhere. Should I use something like
"ceph health detail -f json-pretty" instead?

        Also, is it possible to configure Ceph to attempt repairing
the bad PGs itself, as soon as the scrub fails? I run most of my OSDs on top
of a bunch of old spinning disks, and a scrub error almost always means
that there is a bad sector somewhere, which can easily be fixed by
rewriting the lost data using "ceph pg repair".

It is possible. It's a lot safer than it used to be, but is still NOT RECOMMENDED for replicated pools.

But if you are very sure, you can configure it with the options osd_scrub_auto_repair (default: false) and osd_scrub_auto_repair_num_errors (default: 5; a scrub that finds more errors than this will not be auto-repaired).
-Greg
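
For reference, a sketch of the corresponding ceph.conf fragment (the same
values can also be injected at runtime with "ceph tell osd.* injectargs
'--osd_scrub_auto_repair=true'"), keeping Greg's caveat above in mind:

[osd]
# Only enable this if you accept the risk of auto-repair on your pools.
osd scrub auto repair = true
# Do not auto-repair when a scrub finds more errors than this.
osd scrub auto repair num errors = 5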
 

        Thanks,

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
