"Pre-flight" checks are a common ask and are seen as a way to avoid misconfiguration and bad deployments, so this is great to see. I would like to know more! The UI looks pretty cool; it would be great to understand how you see the pre-flight work fitting into a deployment workflow.

Cheers!

On Fri, Sep 7, 2018 at 4:28 AM Erwan Velu <evelu@xxxxxxxxxx> wrote:
>
> I have a much more evil plan that you are forcing me to unveil ;)))
>
> I'm considering using the Skydive project: http://skydive.network/
>
> It has everything I need to perform analysis & reporting, it's distributed, it's developed by Red Hat people I know, and they are happy to have me use it and enhance the tool.
> Adding hardware reporting + storage will be very easy, the UI is nice & dynamic (adding LLDP would be a killer feature), it already supports network replay (so we could replay real Ceph traffic as a reproducible test case), and adding new tests will be easy too.
>
> Looks like a good place to start ;)
>
> ----- Original Message -----
> From: "John Spray" <jspray@xxxxxxxxxx>
> To: "Erwan Velu" <evelu@xxxxxxxxxx>
> Cc: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, September 6, 2018 18:07:54
> Subject: Re: Presenting the pre-flight checks project
>
> On Thu, Sep 6, 2018 at 4:37 PM Erwan Velu <evelu@xxxxxxxxxx> wrote:
> >
> > Hi fellows,
> >
> > I've been thinking about this for a long while and had a chance to pitch the idea during the Mountpoint.io event.
> > I think it's time to share it with all of you and present the idea & concepts to get your feedback on it.
> >
> > Deploying Ceph, or generally speaking any distributed software, means running software on a given set of nodes to provide a particular service: storage in our case.
> >
> > But how confident are the people deploying it that the platform performs well before Ceph is put on it?
> > How much of the raw performance are you really using?
> > How far are you from what the platform is capable of?
> > Do you have any disk/interface/<insert any hardware device here> slowing down the whole infrastructure?
> >
> > I'm pretty sure that people operating Ceph usually have no answer to these questions, and the classic answer is either "it works well enough, so no one complains" or "someone prepared it, I trust what they did".
> >
> > And what would you do if someone says: "That's pretty curious, the Ceph cluster seems slower since a couple of days ago / the kernel upgrade / <insert any reason here>."
> > I'm still pretty sure that separating Ceph's responsibilities from the platform's is almost impossible for many.
> >
> > That is where the project starts.
> >
> > What should the set of pre-flight checks be to ensure the platform has no misconfiguration or damaged devices, so that it can deliver a good distributed service?
> >
> > As I understand the topic, the tool should:
> > - be lightweight, so it is easy to install on hosts
> > - be application-agnostic, so it can be used for any distributed software: ceph-medic was made for detecting bad Ceph configuration, while this tool will focus on the platform
> > - check the status of network / storage / CPU / RAM (bandwidth, latency, any specific metric)
> > - generate load (network / storage / CPU / RAM) to see the impact of one component on the whole platform
> > - detect non-homogeneous results / configuration (meaning that if a set of nodes is supposed to be identical, they have to be)
> > - offer a good interface so everyone can use it
> > - be as automated as possible, to avoid complex CLI/options/tuning to get a good result
> > - allow comparing results over time to analyze how much the platform has changed (install time vs. incident time)
> >
> > I do think this will be helpful for:
> > - users
> > - admins
> > - support teams (bug triage, support levels 1/2/3)
> > - infrastructure people who set up hardware configurations
> > - performance people
> >
> > I'm sending this email to the Ceph project because:
> > - I'm working on this beautiful software
> > - Ceph's performance depends heavily on the quality of the platform
> > - I think it's the right place to bootstrap the project
> >
> > If you have some interest in this project, feel free to reply to this email and let's do it!
>
> Cool! Perhaps this could take over the ceph-medic codebase as a
> starting point, as that hasn't had any commits for a long time, and
> there is some overlap in scope.
>
> John
>
> > Erwan,
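To make the "detect non-homogeneous results / configuration" requirement from the proposal more concrete, here is a minimal Python sketch. It assumes that each node in a group that is supposed to be identical has already reported one benchmark number (for example, sequential write bandwidth in MB/s). The function name `find_outliers`, the 10% tolerance, and the sample values are all hypothetical. None of them come from ceph-medic, Skydive, or any existing tool. The sketch flags every node whose result deviates from the group median by more than the tolerance.

```python
from statistics import median


def find_outliers(results: dict[str, float], tolerance: float = 0.10) -> dict[str, float]:
    """Hypothetical homogeneity check: return {node: relative deviation}
    for nodes whose value differs from the group median by more than
    `tolerance` (a fraction, e.g. 0.10 == 10%)."""
    if not results:
        return {}
    reference = median(results.values())
    if reference == 0:
        # Relative deviation is undefined against a zero reference.
        return {}
    outliers = {}
    for node, value in results.items():
        deviation = (value - reference) / reference
        if abs(deviation) > tolerance:
            outliers[node] = deviation
    return outliers


# Hypothetical sequential-write bandwidth (MB/s) from nodes that should be identical.
disk_write_mbps = {"node1": 512.0, "node2": 498.0, "node3": 505.0, "node4": 310.0}

for node, dev in sorted(find_outliers(disk_write_mbps).items()):
    print(f"{node}: {dev:+.0%} from group median")  # prints "node4: -38% from group median"
```

Using the median rather than the mean as the reference keeps one badly performing node from pulling the reference value toward itself. The same comparison could be run against a stored baseline to cover the "install time vs. incident time" case. That variant is not shown here.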