On 7/26/24 11:45, Rouven Seifert wrote:
Hello,
On 2024-07-25 16:39, Harry G Coin wrote:
Upgraded to 18.2.4 yesterday. Healthy cluster reported a few minutes
after the upgrade completed. Next morning, this:
# ceph health detail
HEALTH_ERR Module 'diskprediction_local' has failed: No module named
'sklearn'
[ERR] MGR_MODULE_ERROR: Module 'diskprediction_local' has failed: No
module named 'sklearn'
Module 'diskprediction_local' has failed: No module named 'sklearn'
A search shows this was a problem several years ago that was then
resolved and has now returned.
We encountered the same problem after an upgrade on our cluster and I
dug a bit into this. It appears that [0] was the fix for the missing
sklearn package back in 2021. That fix appears to have been tied
specifically to CentOS 8.
Now that the container images are built on CentOS 9, the relevant
Dockerfile no longer applies the fix, because it checks the OS version
for CentOS 8. I wonder a bit why it was done this way.
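
To illustrate the failure mode, here is a minimal Python sketch. It is
not the actual ceph-container build logic; the function names and the
"scikit-learn" package placeholder are hypothetical. A gate that matches
one exact OS major version silently skips the step on every later
release, while a minimum-version gate keeps applying it.

```python
from typing import List

# Placeholder package name, used only for illustration.
SKLEARN_PACKAGE = "scikit-learn"


def extra_packages_exact(os_major: int) -> List[str]:
    """Hypothetical gate that matches exactly one OS major version."""
    return [SKLEARN_PACKAGE] if os_major == 8 else []


def extra_packages_minimum(os_major: int) -> List[str]:
    """Hypothetical gate that applies from a given OS major version onward."""
    return [SKLEARN_PACKAGE] if os_major >= 8 else []


if __name__ == "__main__":
    for major in (8, 9):
        print(major, extra_packages_exact(major), extra_packages_minimum(major))
```

With the exact match, moving the base image from 8 to 9 drops the
package without any build error, which would match what we see here.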
The problem on CentOS 9 seems to be known to the ceph-container
maintainers. See for example [1].
[0] https://github.com/ceph/ceph-container/pull/1821/files
[1] https://github.com/ceph/ceph-container/blob/main/ceph-releases/ALL/centos/9/daemon-base/README.tmp
Best regards,
Rouven
Thanks! I think there's a further issue as well. The
diskprediction_local code appears to be hard-coded to a specific
version: scikit-learn==0.19.2. This seems to be because some classes
included in 0.19.2 are no longer part of later versions. I tried to
build that version on RHEL/CentOS 9, but I couldn't get mkl_rt to
compile. Whoever maintains diskprediction_local has a little bit of
work to do to adapt it to the latest scikit-learn release.
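
For anyone who wants to check their own environment, here is a rough
Python sketch, not a supported tool. It reports whether the sklearn
module is importable and whether the installed scikit-learn
distribution matches the 0.19.2 pin mentioned above. The helper names
are my own, and the pin is taken from this thread rather than verified
against the module source.

```python
import importlib.util
from importlib import metadata
from typing import Optional

# Version pin as reported in this thread (assumption, not verified).
EXPECTED_VERSION = "0.19.2"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_module(module: str, dist_name: str, expected: str) -> str:
    """Classify the state of a top-level module and its distribution."""
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module) is None:
        return "missing"
    version = installed_version(dist_name)
    if version is None:
        return "unknown version"
    if version == expected:
        return "ok"
    return "version mismatch: " + version


if __name__ == "__main__":
    print(check_module("sklearn", "scikit-learn", EXPECTED_VERSION))
```

It only tells you something useful when run with the same Python
interpreter the mgr daemon uses, so presumably inside the mgr
container rather than on the host.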
Best Regards,
Harry Coin