Re: Presenting the pre-flight checks project

Glad to see that some people need this tooling!

I'll give a presentation once the POC is ready.
I've submitted a talk about it for Berlin ;)

Cheers,

----- Original Message -----
From: "Paul Cuzner" <pcuzner@xxxxxxxxxx>
To: "Erwan Velu" <evelu@xxxxxxxxxx>
Cc: "John Spray" <jspray@xxxxxxxxxx>, "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, September 7, 2018 05:21:29
Subject: Re: Presenting the pre-flight checks project

"pre-flight"  checks is a common ask and seen as a way to avoid
misconfiguration and bad deployments, so this is great to see.

Would like to know more! The UI looks pretty cool. It would be great to
understand how you see the pre-flight work fitting into a deployment
workflow.

Cheers!

On Fri, Sep 7, 2018 at 4:28 AM Erwan Velu <evelu@xxxxxxxxxx> wrote:
>
> I have a much more evil plan that you are forcing me to unveil ;)))
>
> I'm considering using the Skydive project: http://skydive.network/
>
> It has all I need to perform analysis & reporting, it's distributed, it's developed by Red Hat people I know, and they are happy about my using it to enhance the tool.
> Adding hw reporting + storage will be very easy, the UI is nice & dynamic (adding LLDP will be a killer feature), they already have network replay (so we could replay real Ceph traffic as a reproducible test case), and adding new tests will be easy too.
>
> Looks like a good place to start ;)
>
> ----- Original Message -----
> From: "John Spray" <jspray@xxxxxxxxxx>
> To: "Erwan Velu" <evelu@xxxxxxxxxx>
> Cc: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, September 6, 2018 18:07:54
> Subject: Re: Presenting the pre-flight checks project
>
> On Thu, Sep 6, 2018 at 4:37 PM Erwan Velu <evelu@xxxxxxxxxx> wrote:
> >
> > Hi fellows,
> >
> > I've been thinking about this for a long while and had a chance to pitch the idea during the Mountpoint.io event.
> > I think it's time to share it with all of you and present the idea & concepts to get your feedback.
> >
> > Deploying Ceph, or more generally any distributed software, means running software on a given set of nodes to provide a particular service: storage in our case.
> >
> > But how confident are the people deploying it that the platform is performing well before Ceph gets installed on it?
> > How much of the raw performance are you really using?
> > How far are you from what the platform is capable of?
> > Do you have any disk / interface / <place any hardware device here> slowing down the whole infra?
> >
> > I'm pretty sure that people operating Ceph usually have no answer to these questions, and the classic one is "it works well enough, so no one complains" or "someone prepared it, I trust what he did".
> >
> > And what would you do if someone said: "That's pretty curious, the Ceph cluster seems slower since a couple of days ago / the kernel upgrade / <place any reason here>"?
> > I'm still pretty sure that splitting responsibilities between Ceph & the platform is almost impossible for many.
> >
> > This is where the project starts.
> >
> > What should be the set of pre-flight checks to ensure that the platform has no misconfiguration or even damaged devices, so that it can deliver a good distributed service?
> >
> > As I understand the topic, the tool should:
> > - be lightweight so it can be easily installed on hosts
> > - be application agnostic so it can be used for any distributed software: ceph-medic was made for detecting bad Ceph configuration, while this tool will focus on the platform
> > - check the status of network / storage / cpu / ram (bandwidth, latency, any specific metric)
> > - generate some load (network / storage / cpu / ram) to see the impact of one component on the whole platform
> > - detect non-homogeneous results / configuration (meaning that if a set of nodes is said to be identical, it has to be)
> > - offer a good interface so everyone can use it
> > - be automated as much as possible, to avoid needing complex cli/options/tuning to get a good result
> > - allow comparing results over time to analyze how much the platform has changed (install time vs incident time)
> >
> > I do think this will be helpful for
> > - users
> > - admins
> > - support teams (bug triage, support level 1/2/3)
> > - infra people who set up hw configurations
> > - performance people
> >
> > I'm sending this email to the Ceph project because
> > - I'm working on this beautiful software
> > - Ceph's performance is very dependent on the quality of the platform
> > - I think it's the right place to bootstrap that project
> >
> > If you are interested in this project, feel free to reply to this email and let's do it!
>
> Cool!  Perhaps this could take over the ceph-medic codebase as a
> starting point, as that hasn't had any commits for a long time, and
> there is some overlap in scope.
>
> John
>
> > Erwan,


