> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> Sent: Wednesday, December 02, 2015 11:04 AM
> To: Gregory Farnum; Vimal
> Cc: ceph-devel
> Subject: Re: Suggestions on tracker 13578
>
> On 12/02/2015 12:23 PM, Gregory Farnum wrote:
> > On Tue, Dec 1, 2015 at 5:23 AM, Vimal <vikumar@xxxxxxxxxx> wrote:
> >> Hello,
> >>
> >> This mail is to discuss the feature request at
> >> http://tracker.ceph.com/issues/13578.
> >>
> >> If done, such a tool should help point out several misconfigurations
> >> that may cause problems in a cluster later.
> >>
> >> Some of the suggestions are:
> >>
> >> a) A check to understand if the MONs and OSD nodes are on the same
> >> machines.
> >>
> >> b) If /var is a separate partition or not, to prevent the root
> >> filesystem from being filled up.
> >>
> >> c) If monitors are deployed in different failure domains or not.
> >>
> >> d) If the OSDs are deployed in different failure domains.
> >>
> >> e) If a journal disk is used for more than six OSDs. Right now, the
> >> documentation suggests up to 6 OSD journals on a single journal disk.
> >>
> >> f) Failure domains depending on the power source.
> >>
> >> There can be several more checks, and it could be a useful tool for
> >> finding problems in an existing cluster or a new installation.
> >>
> >> But I'd like to know how the engineering community sees this, whether
> >> it seems worth pursuing, and what suggestions you have for
> >> improving/adding to it.
> >
> > This is a user experience and support tool; I don't think the
> > engineering community can really judge its value. ;)
> >
> > So sure, sounds good to me. It'll need to get into the hands of users
> > before we find out if it's a good plan or not.
> > I was at the SDI Summit
> > yesterday and was hearing about how some of our choices (like
> > HEALTH_WARN on pg counts) are *really* scary for users who think
> > they're in danger of losing data. I suspect the difficulty of a tool
> > like this will be more in the communication of issues and severity
> > than in what exactly we choose to check.
>
> Frankly, I've never been a big fan of how we report warnings like this
> through the health check. It's important to let users know if they've
> set things up sub-optimally, but I don't think ceph health is the way
> to do it. It's the difference between your doctor telling you that you
> should exercise more and lose a few pounds vs. telling you that you
> have Ebola and are going to suffer an incredibly gruesome and painful
> death in the next 48 hours. :)

Since I was the one at the SDI Summit who took issue with some of these warnings, I whole-heartedly agree with Greg's and Mark's comments. A warning from a health check should indicate to the user that some corrective action should be taken, besides turning the warning off. :-)

I do not have an issue with reporting advisories, but they should be kept separate from true warnings. If we want to notify the user of variances from best practices, I suggest a separate method, e.g. "ceph advise", rather than constantly repeating them in health checks.

> > -Greg
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
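For illustration only, here is a minimal sketch of what two of the proposed advisory checks, (b) the /var partition check and (e) the journal fan-out check, might look like as a standalone script. Everything here is hypothetical: the function names, the `ADVISE:` output prefix, and the way the OSD-to-journal layout is passed in are assumptions, not part of any existing Ceph tool.

```python
import os
from collections import Counter

# Hypothetical sketch of two advisory checks from the proposal;
# not part of any real "ceph advise" command.

JOURNAL_OSD_LIMIT = 6  # docs suggest at most 6 OSD journals per journal disk


def check_var_is_separate_partition(path="/var"):
    """Check (b): advise if `path` shares a device with the root filesystem."""
    try:
        if os.stat(path).st_dev == os.stat("/").st_dev:
            return [f"ADVISE: {path} is on the root filesystem; "
                    f"a full {path} can fill up /"]
    except OSError:
        pass  # path missing or unreadable: nothing to advise on
    return []


def check_journal_fanout(osd_journals):
    """Check (e): advise if any journal disk backs more than six OSDs.

    `osd_journals` maps OSD id -> journal device, e.g. {0: "/dev/sdb", ...}
    (an assumed input format for this sketch).
    """
    advisories = []
    for disk, count in sorted(Counter(osd_journals.values()).items()):
        if count > JOURNAL_OSD_LIMIT:
            advisories.append(f"ADVISE: journal disk {disk} backs {count} "
                              f"OSDs (suggested max {JOURNAL_OSD_LIMIT})")
    return advisories


if __name__ == "__main__":
    layout = {i: "/dev/sdb" for i in range(7)}  # 7 OSDs on one journal disk
    for line in check_var_is_separate_partition() + check_journal_fanout(layout):
        print(line)
```

The point of the sketch is the separation Greg, Mark, and the author argue for: these functions only ever emit advisories about variances from best practice, so their output never has to be mixed into the HEALTH_WARN/HEALTH_ERR severity scale.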