On Mon, 2019-09-30 at 15:30 +0100, Daniel P. Berrangé wrote: > On Mon, Sep 30, 2019 at 10:55:00AM +0200, Martin Kletzander wrote: > > Given the fact that most of these could just be virFileReadValueUint() it does > > not even make it easier to read or write the code. > > Errr, virFileReadValueUint reads a single integer from a file. > > This is extracting a single *word* from a specific field in a > file. > > Doing this in code would involve doing a regex match the same as > here. > > > Every time someone will want to add a new check or a fact they will need to find > > a similar one, copy-paste it, change it and hope for the best. This introduces > > yet another "language" on top of the two you are adding already. I really do > > not see any benefit in this. > > Taking this particular example. > > - name: cpu.vulnerability.l1tf_smt > filter: > fact: > name: os.kernel > value: Linux > value: > file: > path: /sys/devices/system/cpu/vulnerabilities/l1tf > ignoreMissing: true > parse: > scalar: > regex: SMT (\w+) > match: 1 > > It could be approximately equiv to code > > if runtime.GOOS == "Linux" { > return "", nil > } > > data, err = ioutils.ReadFile("/sys/devices/system/cpu/vulnerabilities/l1tf") > if err != nil { > if os.IsNotExist(err) { > return "", nil > } > return "", err > } > > match, err := regexp.Find(`SMT (\w+)`, data) > if err != nil { > return "", err > } > err = SetFact("cpu.vulnerability.l1tf_smt", match) > if err != nil { > return "", err > } > > The code does look more familiar, but it is certainly not simpler. > Most notabley there are alot of error handling cases here that need > to be dealt with, even when written in Go. Doing this in C with its > string handling support involves even more error handling. This > is a significant maintainence burden that is not well handled in > a consistent manner across the code. > > This is a fairly simple check example, case where we want to extract > data from a more more complex file formats result in much more code > complexity. > > In the pure code approach, we have mixed up code related to reading > files, running commands, parsing data formats, setting facts, printing > messages, skipping execution when certain other conditions applied. > Hidden in this is the information about what feature we are actually > checking for. > > A data driven approach to this problem means that we get clear separation > between distinct parts/jobs. > > The engine written in code concerns itself with generic facilities for > reading files, running commands, parsing data, and all the error handling > and data processing. > > Thereafter the interesting checks is simply a declarative expression of > what data we wish to extract for each fature, and declarative expression > of dependancies between features. > > This engine code has greater complexity than what exists today for sure, > but the key thing is that this is mostly a one time upfront code, and is > the least interesting part of the tool. What is most interest is the set > of features being checked for and the relationships between the features. > > Having the declarative part separated from the functional part is where > the value arises. > > - Changes to the functional code apply consistently across all > the checks made. For example, an improvement to error handling > to regex processing was done once and applied to all the checks > that used regex. Or consistently handling when reading files > that are potentially missing on systems. > > - New checks can be introduced in a more reliable and consistent > manner. These CPU vulnerability checks are an example of this > in practice. We merely have to express what files we're interested > in and how to match the relevant data in them. No need to worry > about reading files, compiling & running regexs, making sure the > error handling was done in the same way as other existing checks. All of the above mostly sounds like "use functions" to me. Something like value, err := GetIntFactFromFile( "/sys/devices/system/cpu/vulnerabilities/l1tf", "SMT (\w+)", OsLinux, IgnoreMissing, ) if err != nil { return err } SetIntFact("cpu.vulnerability.l1tf_smt", value) requires basically as much upfront knowledge to understand as the YAML version, but doesn't introduce a new DSL. And hey, it's shorter than either version! :) > - The declarative data can be processed algorithmically. For example > given the depedancies expressed between the facts, I saw that it > was not neccessary to define all the facts in the same data file. > It was possible split them across many files, load those files in > any order, and algorithmically determine exactly which subset of > checks need to be executed & in what order. > > - Having separated data from the code it is obviously possible to > extend this without introducing any code changes. This is possible > to use from outside libvirt. For example there are usage scenarios > which may not be supported by certain versions of QEMU. The QEMU > package can drop in some facts which will be picked up by the > virt-host-validate tool. Alternatively going up the stack, an > app like OpenStack which uses libvirt may wish to restrict what > they use, or wish to check extra virt host features interesting > to their app's usage of libvirt. Again their package can now > drop in extra facts. This last point is the only advantage I can see very clearly. I'm not sure it's worth introducing a specific format for, though. > - New features can be built ontop of the declarative data. At first > I just set out to replicate the existing tool's output as is. Once > that was done, it was obvious that we could trivially add a new > feature allowing us to print the raw informationm about each > fact that was checked, as an alternative to the human targetted > message strings. IOW we got a machine readable output format > essentially for free by virtue of using declarative data. This is a benefit of storing the information as facts, which as I said before I think is a good idea; it doesn't mean we have to obtain said facts by processing YAML files. > - The implementation can be rewritten again at will. If the original > C code had this split of data vs processing logic, this conversion > of langauge would have been even more compelling, as none of othe > declarative data would have needed changing. I'm not expecting > another rewrite, but this capability was already valuable, between > the v1 and v2 posting of this patch I changed from XML to YAML for > the data format. This conversion was entirely automated, because > it could be algorithmically processed. If you had used code instead of a pure data format as the source for information, then you wouldn't have needed to perform that conversion in the first place O:-) > These are all typical & expected benefits that arise from separating > functional logic from declarative data. In the long term this is a > clear win, & as I mentioned I would have done this sooner had it not > been for our use of C. I expect that if you would have posted patches introducing this machinery to the C version of virt-host-validate you would have seen more or less the same reactions. > > If I was to pick a new feature we could benefit from, I would much rather prefer > > having an opt-in for report-home of HW features and usage for some very rough > > anonymous statistics. > > I don't think we want to get involved in reporting user data back to any > server, as it is a data privacy minefield. This tool can be used by other > existing reporting tools though - eg Red Hat distros often include the > 'sosreport' tool that bundles stuff up for reporting customer tickets. Wholeheartedly agreed. -- Andrea Bolognani / Red Hat / Virtualization -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list