RE: About separate the diskprediction plugin

On Wed, 24 Oct 2018, Rick Chen wrote:
> Hi Sage:
> 
> If devicehealth loads the prediction_mode config value, that means the 
> user configures prediction_mode and its arguments through devicehealth. 
> How do devicehealth_local and devicehealth_cloud access the configuration 
> stored by this plugin? Do these plugins read the same mgr store value?

I think we should make this a global ceph option, not a mgr-specific 
option, so that users set it via a more familiar 'ceph config set global 
device_failure_prediction_mode local'.  I can push a PR with this part of 
it since, IIRC, mgr_module is missing a method to access the cluster 
config.
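
A minimal sketch of how devicehealth might turn that option into the name of the module to call. This is an illustration under assumptions, not the actual implementation: a plain dict stands in for the cluster config (the accessor method is the piece described above as missing), and PREDICTION_MODULES, resolve_prediction_module, and the "none" default are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from the option value to the module that serves it.
PREDICTION_MODULES = {
    "local": "devicehealth_local",
    "cloud": "devicehealth_cloud",
}


def resolve_prediction_module(cluster_config: Dict[str, str]) -> Optional[str]:
    """Return the prediction module for the configured mode, or None if disabled.

    cluster_config is a stand-in for the cluster-wide config store; treating
    a missing value as "none" is an assumption of this sketch.
    """
    mode = cluster_config.get("device_failure_prediction_mode", "none")
    if mode == "none":
        return None
    if mode not in PREDICTION_MODULES:
        raise ValueError("unknown device_failure_prediction_mode: %r" % mode)
    return PREDICTION_MODULES[mode]


# Stand-in for the value a user would set with 'ceph config set'.
cluster_config = {"device_failure_prediction_mode": "local"}
print(resolve_prediction_module(cluster_config))
```

Because both prediction modules would be selected by the same cluster-wide value, neither needs its own copy of the setting.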

> - generic function to get a prediction for a given device, that calls into 
>    the enabled module via self.remote()
>    - called by 'device predict-life-expectancy'
> Does it depend on which devicehealth_* module is enabled? Right?

Right

> This approach does not automatically set the device life expectancy day 
> description. Does that stay in each devicehealth_* plugin?

I can't decide if it's useful to have both variants or not (one that just 
calculates a prediction and shows you, vs one that also stores it).  
Either way, I think both commands would live in devicehealth and 
remote() into the enabled module to get the prediction, so the prediction 
module doesn't have to worry about storing at all.
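
A rough sketch of the two variants, assuming both live in devicehealth. The classes below only imitate the shape of the design and are not the real ceph-mgr classes: remote() here is a simple name-based dispatch, and predict_life_expectancy, predict, predict_and_store, and the fixed value 42 are hypothetical.

```python
from typing import Dict


class LocalPredictor:
    """Stand-in for devicehealth_local; returns a fixed value, not a model output."""

    def predict_life_expectancy(self, devid: str) -> int:
        return 42  # placeholder number of days


class DeviceHealth:
    """Stand-in for devicehealth: owns both commands and all storage."""

    def __init__(self, modules: Dict[str, object], enabled: str) -> None:
        self.modules = modules          # module name -> module instance
        self.enabled = enabled          # name of the enabled prediction module
        self.stored: Dict[str, int] = {}

    def remote(self, module_name: str, method_name: str, *args):
        """Imitate a cross-module call by looking the method up by name."""
        return getattr(self.modules[module_name], method_name)(*args)

    def predict(self, devid: str) -> int:
        """Variant 1: calculate a prediction and only return it."""
        return self.remote(self.enabled, "predict_life_expectancy", devid)

    def predict_and_store(self, devid: str) -> int:
        """Variant 2: calculate a prediction and also record it."""
        days = self.predict(devid)
        self.stored[devid] = days
        return days


dh = DeviceHealth({"devicehealth_local": LocalPredictor()}, "devicehealth_local")
print(dh.predict("dev0"), dh.stored)            # nothing stored yet
print(dh.predict_and_store("dev0"), dh.stored)  # value now recorded
```

The predictor never touches storage in either variant, which is the point of keeping both commands in devicehealth.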
 
> The current cloud plugin pushes metrics as follows:
> 	Performance metrics every 10 minutes, including Ceph cluster status, per-object Ceph correlations, and OSD performance counters.
> 	Device SMART data metrics every 12 hours, related to the devicehealth shared metrics.
> The current cloud plugin gets the device life expectancy day from the cloud every 12 hours.

Perhaps something like this:

 1- devicehealth already has a health metrics scrape interval.  let it 
    scrape as it already does.
 2- once it has scraped a device's metrics, it can remote() into the 
    enabled module to notify it that there are fresh metrics available.
    - the cloud module could then make an API call to push the latest values. 
      the local module would do nothing from this hook.
 3- later, devicehealth would refresh its life expectancies by calling 
    into the prediction module for each device.  the cloud module would 
    make its API call then to get a new prediction.

The #2 step isn't strictly needed in the above, since the module could 
push the latest (or even all) metrics as part of #3 when it is asked for a 
prediction; up to you!
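
The three steps could be sketched roughly as follows. Everything here is illustrative: on_fresh_metrics, predict, scrape_and_refresh, and the fixed value of 100 days are hypothetical, the cloud "API calls" are only recorded in a list, and direct method calls stand in for what would be remote() calls between modules.

```python
from typing import Dict, List, Tuple


class LocalModule:
    """Stand-in for the local predictor."""

    def on_fresh_metrics(self, devid: str, metrics: dict) -> None:
        pass  # step 2 hook: the local module does nothing here

    def predict(self, devid: str, metrics: dict) -> int:
        return 100  # placeholder number of days


class CloudModule:
    """Stand-in for the cloud predictor; records calls instead of sending them."""

    def __init__(self) -> None:
        self.api_calls: List[Tuple[str, str]] = []

    def on_fresh_metrics(self, devid: str, metrics: dict) -> None:
        self.api_calls.append(("push", devid))   # step 2: publish latest values

    def predict(self, devid: str, metrics: dict) -> int:
        self.api_calls.append(("query", devid))  # step 3: ask for a prediction
        return 100  # placeholder number of days


def scrape_and_refresh(predictor, devices: Dict[str, dict]) -> Dict[str, int]:
    """devices maps device id -> metrics that step 1 has already scraped."""
    for devid, metrics in devices.items():
        predictor.on_fresh_metrics(devid, metrics)   # step 2: notify
    return {devid: predictor.predict(devid, metrics)  # step 3: refresh
            for devid, metrics in devices.items()}


cloud = CloudModule()
print(scrape_and_refresh(cloud, {"dev0": {"temperature": 30}}))
print(cloud.api_calls)
```

Dropping step 2 would amount to removing the on_fresh_metrics hook and having the cloud module's predict() push the metrics itself before querying.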

sage

 

> 
> -----Original Message-----
> From: Sage Weil <sage@xxxxxxxxxxxx> 
> Sent: Tuesday, October 23, 2018 8:14 PM
> To: Rick Chen <rick.chen@xxxxxxxxxxxxxxx>
> Cc: Sheng-Lin Wu <shenglin.wu@xxxxxxxxxxxxxxx>
> Subject: Re: About separate the diskprediction plugin 
> 
> On Tue, 23 Oct 2018, Rick Chen wrote:
> > Hi Sage:
> > Do you have any suggestions for the task of separating diskprediction?
> > Should we split diskprediction_cloud and diskprediction_local into 
> > individual plugins? Or separate out the local predictor and integrate it 
> > with the devicehealth plugin? And can both plugins work simultaneously?
> 
> I suspect the best approach is something like:
> 
> devicehealth
>  - shared metrics
>  - loads prediction_mode config value
>  - later: something to auto-enable the right devicehealth_* module
>  - generic function to get a prediction for a given device, that calls into 
>    the enabled module via self.remote()
>    - called by 'device predict-life-expectancy'
> 
> devicehealth_local
>  - implement the predict method for a device w/ sklearn models 
> 
> devicehealth_cloud
>  - additional metrics gathering
>  - calls out to cloud to publish metrics
>  - implement the predict method for a device by making a call to the cloud
> 
> Does that work?  I'm not completely clear what the current status of the cloud mode is with the metrics publish vs query to get life expectancy.  
> If they're separate calls, I think the above makes sense?
> 
> sage
> 
> 
> 
> > 
> > Current block diagram for your reference.
> > [cid:image002.png@01D46AC6.AB38EB10]
> > 
> > 
> 
> 
> 
> 
