On Mon, Oct 1, 2018 at 8:26 PM Shengjing Zhu <zhsj@xxxxxxxxxx> wrote:
>
> On Mon, Oct 01, 2018 at 06:59:45PM +0800, kefu chai wrote:
> > i am wondering if we could move further by providing user the
> > pre-labeled SMART dataset of all listed combination of SMART
> > attributes combination in config.json , script and document for
> > training them, if only commodity hardware and free software are
> > required to process the dataset.
>
> IMHO, this usually needs proprietary software like CUDA, maybe we're not
> in the free/libre era for machine learning yet...

i believe the SVM classifiers that come with pybind/mgr/diskprediction
were trained using sklearn.svm.SVC[0], which in turn is implemented
using libsvm[1,2]. libsvm can be rewritten to take advantage of a GPU
using techniques like CUDA, but i don't think that is a must. and it
uses a single classifier to label the positive samples. since the
models were trained using SVC, the training cost is roughly O(n^2)
(it can approach O(n^3) in the worst case), where n is the size of the
dataset. so it should be acceptable to perform the training without a
GPU.

---
[0] http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
[1] https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/svm/src/libsvm
[2] https://www.csie.ntu.edu.tw/~cjlin/libsvm/

>
> On Mon, Oct 1, 2018 at 7:37 PM John Spray <jspray@xxxxxxxxxx> wrote:
> > If we consider these files to be software, then it's correct to say
> > that a public domain binary is non-free. If we consider them data,
> > then a public domain binary is just a piece of data (analogous to
> > distributing a .jpeg file but not the photographer's original .raw
> > file). I would lean toward the second view -- in my view, machine
> > learning datasets are not source code, as they're numeric data rather
> > than computer instructions.
> >
>
> My point is whether user can modify it. For picture, without .raw, we
> can still modify the .jpeg with GIMP. For font(usually another case), we
> can modify .ttf/.otf with fontforge. Both GIMP and fontforge are free
> softwares.
>
> Is it possible to use free software to modify the .joblib files? With my
> silly knowledge of machine learning, I see scikit-learn can only load
> and use them?

in addition to loading and using them, scikit-learn can also be used to
create the models. and as i explained in another mail in this thread, a
user, even a professional one, is not supposed to tweak the model
manually.

>
> --
> Shengjing Zhu

--
Regards
Kefu Chai
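
To illustrate the point about .joblib files, here is a minimal sketch,
assuming scikit-learn and joblib are installed. It is not the script used
to train the shipped models: the data is synthetic rather than labelled
SMART samples, and the file name demo_model.joblib is hypothetical. It
only shows that an SVC can be trained on a CPU, saved, loaded back, and
refit using free software.

```python
# Sketch only: synthetic data stands in for labelled SMART samples, and
# "demo_model.joblib" is a hypothetical file name, not one shipped with Ceph.
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data (assumption: real input would be SMART attributes).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Train on the CPU; sklearn.svm.SVC wraps libsvm and needs no GPU.
model = SVC(kernel="rbf", gamma="scale")
model.fit(X, y)

# Persist the trained model with joblib, then load it back.
path = os.path.join(tempfile.mkdtemp(), "demo_model.joblib")
joblib.dump(model, path)
loaded = joblib.load(path)

# The loaded object is an ordinary estimator: it predicts like the
# original, and it can be refit on other data and saved again.
same_predictions = bool((loaded.predict(X) == model.predict(X)).all())
loaded.fit(X[:250], y[:250])
joblib.dump(loaded, path)
```

Retraining from data, as in the last two lines, is the supported way to
change such a model; editing the fitted coefficients by hand is not the
intended workflow.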