RE: About separate the diskprediction plugin

Rick Chen <rick.chen@xxxxxxxxxxxxxxx> · Fri, 26 Oct 2018 01:41:01 +0000

Hi Sage,
Got it, thanks for your clear advice.
I think I can start separating the diskprediction plugin.
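
To double-check my understanding of the flow discussed in this thread (devicehealth scrapes, notifies the enabled prediction module, and later asks it for a prediction), here is a minimal sketch in plain Python. It does not use the real mgr_module API. The ModuleRegistry class, the method names notify_fresh_metrics and predict_life_expectancy, the "cloud" mode value, and the returned strings are all hypothetical stand-ins; only the idea of devicehealth calling into the enabled module through remote() comes from the discussion.

```python
"""Sketch: devicehealth dispatching to the enabled prediction module.

Every class and method name here is a hypothetical stand-in; nothing is
imported from, or guaranteed to match, the real ceph-mgr module API.
"""


class ModuleRegistry:
    """Stand-in for the mgr: lets one module call a method on another."""

    def __init__(self):
        self._modules = {}

    def register(self, name, module):
        self._modules[name] = module

    def remote(self, name, method, *args):
        # Look up the target module and call the named method on it.
        return getattr(self._modules[name], method)(*args)


class LocalPredictor:
    """Hypothetical diskprediction_local: only answers predict requests."""

    def notify_fresh_metrics(self, devid, metrics):
        return None  # the local module does nothing from this hook

    def predict_life_expectancy(self, devid):
        return "local prediction for " + devid  # placeholder result


class CloudPredictor:
    """Hypothetical diskprediction_cloud: pushes metrics, then queries."""

    def __init__(self):
        self.pushed = []

    def notify_fresh_metrics(self, devid, metrics):
        # Placeholder for the API call that pushes the latest values.
        self.pushed.append((devid, metrics))

    def predict_life_expectancy(self, devid):
        return "cloud prediction for " + devid  # placeholder result


class DeviceHealth:
    """Hypothetical devicehealth: owns scraping and stores predictions."""

    def __init__(self, registry, mode):
        self.registry = registry
        self.mode = mode  # e.g. the device_failure_prediction_mode value
        self.life_expectancy = {}

    def _active_module(self):
        return "diskprediction_" + self.mode

    def scrape(self, devid):
        metrics = {"devid": devid}  # placeholder for real health metrics
        # Step 2: tell the enabled module that fresh metrics are available.
        self.registry.remote(
            self._active_module(), "notify_fresh_metrics", devid, metrics)
        return metrics

    def refresh_life_expectancy(self, devids):
        # Step 3: ask the enabled module for a prediction per device and
        # store it here, so the prediction module never stores anything.
        for devid in devids:
            self.life_expectancy[devid] = self.registry.remote(
                self._active_module(), "predict_life_expectancy", devid)
```

In this sketch only DeviceHealth knows which module is active, so switching the mode changes where remote() lands without touching either predictor.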

> -----Original Message-----
> From: Sage Weil <sage@xxxxxxxxxxxx>
> Sent: Friday, October 26, 2018 3:32 AM
> To: Rick Chen <rick.chen@xxxxxxxxxxxxxxx>
> Cc: Sheng-Lin Wu <shenglin.wu@xxxxxxxxxxxxxxx>; 'Jeremy Wei'
> <jeremycwei@xxxxxxxxx>; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: About separate the diskprediction plugin
> 
> On Thu, 25 Oct 2018, Rick Chen wrote:
> > Hi Sage:
> > 	Thank you for your feedback.
> > 	Below is my understanding. I have one last question, marked "*Q",
> > 	that needs your advice.
> > 	Have I missed anything? Please let me know.
> >
> > - devicehealth (acts as the device health manager):
> > 	Handles configuration, and controls when diskprediction_local and
> > 	diskprediction_cloud start.
> 
> See https://github.com/ceph/ceph/pull/24755 for the config option piece of
> this.
> 
> > 		Use the Ceph cluster configuration to store the devicehealth settings.
> > 		* A new PR will handle this.
> > 		diskprediction_* is scraped by devicehealth.
> > 		*Q: Currently a mgr plugin is enabled with 'ceph mgr module enable';
> > 		it cannot be enabled or triggered by another plugin. How should
> > 		devicehealth communicate with the two plugins? Should both plugins
> > 		be enabled by default, each acting as an API daemon that receives
> > 		the devicehealth scrape, so that devicehealth can get prediction
> > 		results from either plugin?
> 
> For the moment, we can manually enable the right one.  Probably we want
> the devicehealth to look at the config setting and enable the right module for
> you, though (and disable any inactive ones).  We can do that a bit later.
> 
> We can make python calls between modules with self.remote(), and
> devicehealth will know which is active, so it can remote into the correct
> module to do whatever operation it wants...
> 
> > - diskprediction_local (acts as the device predictor, like a job executor):
> > 	Generates prediction data when notified by the devicehealth plugin.
> >
> > - diskprediction_cloud (acts as the device predictor, like a job executor):
> >  	# But it should have its own interval for posting metrics and control
> >  	that itself. The metrics are not based only on what the devicehealth
> >  	plugin provides, because the cloud needs more data for its analysis,
> >  	and what the cloud server displays depends on that data.
> > 	Gets prediction data when notified by the devicehealth plugin.
> 
> Yeah, I think this one would have a serve() method that does the scraping of
> (non-smart) metrics at the short intervals.  It can ignore the device metric
> scraping and let the normal piece do that part, and only deal with the pushing
> of those metrics to the cloud service on demand.
> 
> Is that reasonable, or is there an alternative approach that makes more sense?
> 
> sage
> 
> 
> >
> >
> >
> > > > devicehealth loads the prediction_mode config value, which means the
> > > > user configures prediction_mode and its arguments through
> > > > devicehealth. How do devicehealth_local and devicehealth_cloud access
> > > > the configuration stored by this plugin? Do these plugins access the
> > > > same mgr store value?
> > >
> > > I think we should make this a global ceph option, not a mgr-specific
> > > option, so that users set it via a more familiar 'ceph config set
> > > device_failure_prediction_mode local'.  I can push a PR with this
> > > part of it as IIRC there is a missing mgr_module method to access the
> > > cluster config.
> > Great.
> >
> > >
> > > > - generic function to get a prediction for a given device, that calls into
> > > >    the enabled module via self.remote()
> > > >    - called by 'device predict-life-expectancy'
> > > > So it depends on which devicehealth_* module is enabled, right?
> > >
> > > Right
> > >
> > > > This approach does not automatically set the device's life
> > > > expectancy. Does that stay in each devicehealth_* plugin?
> > >
> > > I can't decide if it's useful to have both variants or not (one that
> > > just calculates a prediction and shows you, vs one that also stores it).
> > > Either way, I think both commands would live in devicehealth and
> > > remote() into the enabled module to get the prediction, so the
> > > prediction module doesn't have to worry about storing at all.
> > >
> > > > The current cloud plugin pushes metrics as follows:
> > > > 	Performance metrics every 10 minutes, including Ceph cluster
> > > > 	status, Ceph object correlations, and OSD performance counters.
> > > > 	Device SMART data metrics every 12 hours, which relate to the
> > > > 	devicehealth shared metrics.
> > > > The current cloud plugin gets the device life expectancy from the
> > > > cloud every 12 hours.
> > >
> > > Perhaps something like this:
> > >
> > >  1- devicehealth already has a health metrics scrape interval.  let it
> > >     scrape as it already does.
> > >  2- once it has scraped a device's metrics, it can remote() into the
> > >     enabled module to notify it that there are fresh metrics available.
> > >     - the cloud module could then make an API call to push the latest values.
> > >       the local module would do nothing from this hook.
> > >  3- later, devicehealth would refresh its life expectancies by calling
> > >     into the prediction module for each device.  the cloud module would
> > >     make its API call then to get a new prediction.
> > >
> > > The #2 step isn't strictly needed in the above, since the module
> > > could push the latest (or even all) metrics as part of #3 when it is
> > > asked for a prediction; up to you!
> > >
> > > sage
> > >
> > >
> > >
> > > >
> > > > -----Original Message-----
> > > > From: Sage Weil <sage@xxxxxxxxxxxx>
> > > > Sent: Tuesday, October 23, 2018 8:14 PM
> > > > To: Rick Chen <rick.chen@xxxxxxxxxxxxxxx>
> > > > Cc: Sheng-Lin Wu <shenglin.wu@xxxxxxxxxxxxxxx>
> > > > Subject: Re: About separate the diskprediction plugin
> > > >
> > > > On Tue, 23 Oct 2018, Rick Chen wrote:
> > > > > Hi Sage:
> > > > > > Do you have any suggestions for the task of separating diskprediction?
> > > > > > Should we split diskprediction_cloud and diskprediction_local into
> > > > > > individual plugins? Or should we separate out the local predictor and
> > > > > > integrate it with the devicehealth plugin? And do both plugins work
> > > > > > simultaneously?
> > > >
> > > > I suspect the best approach is something like:
> > > >
> > > > devicehealth
> > > >  - shared metrics
> > > >  - loads prediction_mode config value
> > > >  - later: something to auto-enable the right devicehealth_* module
> > > > >  - generic function to get a prediction for a given device, that calls into
> > > >    the enabled module via self.remote()
> > > >    - called by 'device predict-life-expectancy'
> > > >
> > > > devicehealth_local
> > > >  - implement the predict method for a device w/ sklearn models
> > > >
> > > > devicehealth_cloud
> > > > >  - additional metrics gathering
> > > > >  - calls out to cloud to publish metrics
> > > > >  - implement the predict method for a device by making call to cloud
> > > >
> > > > > Does that work?  I'm not completely clear what the current status of
> > > > > the cloud mode is with the metrics publish vs query to get life
> > > > > expectancy.  If they're separate calls, I think the above makes sense?
> > > >
> > > > sage
> > > >
> > > >
> > > >
> > > > >
> > > > > > Here is the current block diagram for your reference.
> > > > > [cid:image002.png@01D46AC6.AB38EB10]
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> >
> >
> >
> >

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus