Hi Sage:
I got it. Thanks for your clear advice. I think I can start separating the
diskprediction plugin.

> -----Original Message-----
> From: Sage Weil <sage@xxxxxxxxxxxx>
> Sent: Friday, October 26, 2018 3:32 AM
> To: Rick Chen <rick.chen@xxxxxxxxxxxxxxx>
> Cc: Sheng-Lin Wu <shenglin.wu@xxxxxxxxxxxxxxx>; 'Jeremy Wei'
> <jeremycwei@xxxxxxxxx>; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: About separate the diskprediction plugin
>
> On Thu, 25 Oct 2018, Rick Chen wrote:
> > Hi Sage:
> > Thank you for your feedback.
> > Below is my understanding. I have one last question, marked "*Q", that
> > needs your advice.
> > Have I omitted anything? Please let me know.
> >
> > - devicehealth: (acts as the device health manager)
> >     Handles configuration, and controls when diskprediction_local and
> >     diskprediction_cloud start.
>
> See https://github.com/ceph/ceph/pull/24755 for the config option piece of
> this.
>
> >     Uses the ceph cluster configuration to store devicehealth settings.
> >       * A new PR will handle this.
> >     diskprediction_* is scraped by devicehealth.
> >       *Q: Currently a mgr plugin is enabled via ceph mgr module; it cannot
> >       be enabled or triggered by another plugin. How do we communicate
> >       with both plugins? Are both plugins enabled by default, each acting
> >       as an API daemon that receives the devicehealth scrape, so that
> >       devicehealth can receive prediction results from both plugins?
>
> For the moment, we can manually enable the right one.  Probably we want
> the devicehealth to look at the config setting and enable the right module for
> you, though (and disable any inactive ones).  We can do that a bit later.
>
> We can make python calls between modules with self.remote(), and
> devicehealth will know which is active, so it can remote into the correct
> module to do whatever operation it wants...
>
> > - diskprediction_local (acts as the device predictor, like a job executor)
> >     Generates prediction data when notified by the devicehealth plugin.
> >
> > - diskprediction_cloud (acts as the device predictor, like a job executor)
> >     # But it should have its own interval for posting metrics, controlled
> >     by itself. The metrics data is not based only on what the devicehealth
> >     plugin provides, because the cloud needs more data for its analysis
> >     and for what the cloud server displays, depending on its conditions.
> >     Gets prediction data when notified by the devicehealth plugin.
>
> Yeah, I think this one would have a serve() method that does the scraping of
> (non-smart) metrics at the short intervals.  It can ignore the device metric
> scraping and let the normal piece do that part, and only deal with the pushing
> of those metrics to the cloud service on demand.
>
> Is that reasonable, or is there an alternative approach that makes more sense?
>
> sage
>
>
> > > > devicehealth loads the prediction_mode config value, which means the
> > > > user uses devicehealth to configure prediction_mode and its arguments.
> > > > How do devicehealth_local and devicehealth_cloud access the
> > > > configuration stored by this plugin? Do these plugins access the same
> > > > mgr store value?
> > >
> > > I think we should make this a global ceph option, not a mgr-specific
> > > option, so that users set it via a more familiar 'ceph config set
> > > device_failure_prediction_mode local'.  I can push a PR with this
> > > part of it as IIRC there is a missing mgr_module method to access the
> > > cluster config.
> > Great.
> >
> > > >  - generic function to get a prediction for a given device, that calls into
> > > >    the enabled module via self.remote()
> > > >    - called by 'device predict-life-expectancy'
> > > > Does it depend on which devicehealth_* is enabled? Right?
> > >
> > > Right
> > >
> > > > This approach does not automatically set the device life expectancy
> > > > day description. Is that still kept in each devicehealth_* plugin?
> > >
> > > I can't decide if it's useful to have both variants or not (one that
> > > just calculates a prediction and shows you, vs one that also stores it).
> > > Either way, I think both commands would live in devicehealth and
> > > remote() into the enabled module to get the prediction, so the
> > > prediction module doesn't have to worry about storing at all.
> > >
> > > > The current cloud plugin pushes metrics as follows:
> > > >   Performance metrics every 10 minutes, including ceph cluster status /
> > > >   the correlation of each ceph object / osd performance counters.
> > > >   Device SMART data metrics every 12 hours, related to the devicehealth
> > > >   shared metrics.
> > > > The current cloud plugin gets the device life expectancy (in days) from
> > > > the cloud every 12 hours.
> > >
> > > Perhaps something like this:
> > >
> > > 1- devicehealth already has a health metrics scrape interval.  let it
> > >    scrape as it already does.
> > > 2- once it has scraped a device's metrics, it can remote() into the
> > >    enabled module to notify it that there are fresh metrics available.
> > >    - the cloud module could then make an API call to push the latest
> > >      values.  the local module would do nothing from this hook.
> > > 3- later, devicehealth would refresh its life expectancies by calling
> > >    into the prediction module for each device.  the cloud module would
> > >    make its API call then to get a new prediction.
> > >
> > > The #2 step isn't strictly needed in the above, since the module
> > > could push the latest (or even all) metrics as part of #3 when it is
> > > asked for a prediction; up to you!
> > >
> > > sage
> > >
> > >
> > > > -----Original Message-----
> > > > From: Sage Weil <sage@xxxxxxxxxxxx>
> > > > Sent: Tuesday, October 23, 2018 8:14 PM
> > > > To: Rick Chen <rick.chen@xxxxxxxxxxxxxxx>
> > > > Cc: Sheng-Lin Wu <shenglin.wu@xxxxxxxxxxxxxxx>
> > > > Subject: Re: About separate the diskprediction plugin
> > > >
> > > > On Tue, 23 Oct 2018, Rick Chen wrote:
> > > > > Hi Sage:
> > > > > Do you have any suggestions about the task of separating
> > > > > diskprediction? Should we separate diskprediction_cloud and
> > > > > diskprediction_local into individual plugins? Or separate the local
> > > > > predictor and integrate it with the devicehealth plugin? And do both
> > > > > plugins work simultaneously?
> > > >
> > > > I suspect the best approach is something like:
> > > >
> > > > devicehealth
> > > >  - shared metrics
> > > >  - loads prediction_mode config value
> > > >  - later: something to auto-enable the right devicehealth_* module
> > > >  - generic function to get a prediction for a given device, that calls into
> > > >    the enabled module via self.remote()
> > > >    - called by 'device predict-life-expectancy'
> > > >
> > > > devicehealth_local
> > > >  - implement the predict method for a device w/ sklearn models
> > > >
> > > > devicehealth_cloud
> > > >  - additional metrics gathering
> > > >  - calls out to cloud to publish metrics
> > > >  - implement the predict method for a device by making call to cloud
> > > >
> > > > Does that work?  I'm not completely clear what the current status of the
> > > > cloud mode is with the metrics publish vs query to get life expectancy.
> > > > If they're separate calls, I think the above makes sense?
> > > >
> > > > sage
> > > >
> > > >
> > > > > Current block diagram for your reference.
> > > > > [cid:image002.png@01D46AC6.AB38EB10]
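
The scrape / notify / refresh flow discussed in this thread can be sketched in
plain Python. This is only an illustration under stated assumptions, not code
from the Ceph tree: FakeMgr is a hypothetical stand-in for the ceph-mgr
runtime, the method names notify_fresh_metrics and predict_life_expectancy are
invented for the example, the 'cloud' mode value is assumed (only 'local'
appears above), and the prediction rules are placeholders rather than real
models or real cloud calls.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping from the prediction mode to the module that serves it.
MODE_TO_MODULE = {
    "local": "diskprediction_local",
    "cloud": "diskprediction_cloud",
}


class FakeMgr:
    """Hypothetical stand-in for ceph-mgr: module registry plus cluster config."""

    def __init__(self) -> None:
        self.modules: Dict[str, Any] = {}
        self.config: Dict[str, str] = {}

    def remote(self, module_name: str, method_name: str, *args: Any) -> Any:
        # Same call shape as the self.remote() mentioned in the thread.
        return getattr(self.modules[module_name], method_name)(*args)


class LocalPredictor:
    """Plays the role of diskprediction_local."""

    def notify_fresh_metrics(self, devid: str, metrics: Dict[str, int]) -> None:
        # Step 2: the local module does nothing from this hook.
        return None

    def predict_life_expectancy(self, devid: str, metrics: Dict[str, int]) -> str:
        # Placeholder rule, not a trained model.
        return "bad" if metrics.get("reallocated_sectors", 0) > 100 else "good"


class CloudPredictor:
    """Plays the role of diskprediction_cloud; the cloud API is faked."""

    def __init__(self) -> None:
        self.pushed: List[str] = []

    def notify_fresh_metrics(self, devid: str, metrics: Dict[str, int]) -> None:
        # Step 2: push the latest values (here we only record the device id).
        self.pushed.append(devid)

    def predict_life_expectancy(self, devid: str, metrics: Dict[str, int]) -> str:
        # Step 3: a real module would query the cloud service here.
        return "unknown"


class DeviceHealth:
    """Owns scraping and storage; delegates prediction to the enabled module."""

    def __init__(self, mgr: FakeMgr) -> None:
        self.mgr = mgr
        self.metrics: Dict[str, Dict[str, int]] = {}
        self.life_expectancy: Dict[str, str] = {}

    def _active_module(self) -> Optional[str]:
        mode = self.mgr.config.get("device_failure_prediction_mode", "none")
        return MODE_TO_MODULE.get(mode)

    def scrape(self, devid: str, metrics: Dict[str, int]) -> None:
        # Step 1: keep the scraped metrics, then notify the enabled module.
        self.metrics[devid] = metrics
        module = self._active_module()
        if module is not None:
            self.mgr.remote(module, "notify_fresh_metrics", devid, metrics)

    def refresh_life_expectancy(self) -> None:
        # Step 3: ask the enabled module for a prediction per device.
        module = self._active_module()
        if module is None:
            return
        for devid, metrics in self.metrics.items():
            self.life_expectancy[devid] = self.mgr.remote(
                module, "predict_life_expectancy", devid, metrics
            )
```

In this sketch only DeviceHealth stores results, which matches the point above
that the prediction module does not have to worry about storing at all.
Switching device_failure_prediction_mode changes which module receives the
remote() calls without touching either predictor.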