Hi guys,

I noticed that Shengjing raised a concern about the .joblib files introduced along with the diskprediction plugin [0][1]. These data files are released into the public domain. However, the sources used to produce them have not been released yet, so he argued that "this is still not free". I agree to some degree. It's arguable that they are not free in the sense of "free software", or, to be specific, not compliant with the DFSG [2]. Still, I believe the license itself is valid.

I am wondering if we could go further. We could provide users with the pre-labeled SMART dataset covering every combination of SMART attributes listed in config.json, along with the scripts and documentation for training the models. This would work as long as processing the dataset requires only commodity hardware and free software. That way, everything needed to reproduce the models would be accessible to the public. Would that make these data files DFSG-free? See tesseract-ocr [3] as an example.

I know there have been some discussions [4] about freedom and machine learning models. In our case, though, I think it's much simpler. The datasets used for image and speech recognition are large video and audio sequences. SMART attribute data is much smaller than that, and it is unlikely to contain user data.

I think it's even an opportunity for our users to train the models or to label disks as good or bad. By contributing to the dataset, they could go from being users to being contributors.

What do you think?

Cheers,

--
[0] https://github.com/ceph/ceph/pull/22239
[1] https://github.com/ceph/ceph/pull/24104
[2] https://www.debian.org/social_contract#guidelines
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=699609 and https://github.com/tesseract-ocr/langdata
[4] https://lwn.net/Articles/760142/

--
Regards
Kefu Chai