On Wed, Mar 25, 2020 at 03:11:40PM +0800, Shi Lei wrote: > Outline > ========= > > In libvirt world, objects(like domain, network, etc.) are described with two > representations: structures in c-language and XML specified by relax-ng. > Since c-language-implementation and xml restricted by relax-ng are merely > the different expressions of the same object-model, we can make a tool to > generate those C-language codes based on relax-ng files. > Problems > ========= > > If we replace hardcoded code with generated code directly, there're still > several problems. > > Problem1 - name convension > Generated code follows specific name convension, now there're some differences > between generated code and current libvirt codes. > For example, generated struct's name will be like virXXXDef, parsefunc's name > will be like virXXXDefParseXML; the 'XXX' is a kind of Camel-Case name which > implies the struct's place in the hierarchy. > But in current code, there're many variations of name. > So if we replace current codes with generated codes directly, it requires > a lot of modifications in the caller codes. Regardless of whether or how we auto-generate code, cleaning up current naming conventions to be more consistent is a benefit. So as a general rule, instead of having a code generator which knows how to generate every wierd name, we should just fix the wierd names to follow a standard convention. > Problem2 - error-checking codes in the parsefunc > Most of current parse functions have error-checking codes after parsing XML > code. These code lines validate existence and parsing results. > Now RNG2C can't generate error-checking codes in the parsefunc. So we > should find a way to handle it. In the virDomainDef XML code, we've been slowly splitting out the code for doing error validation into separate methods. These two bugs illustrate the core ideas: https://gitlab.com/libvirt/libvirt/-/issues/7 https://gitlab.com/libvirt/libvirt/-/issues/8 the same concept would apply to other XML parsers such as the network parser, but it hasn't been a high priority for non-domain XML schemas. > Problem3 - hierarchy consistence > The RNG2C generate a 'struct'(C code) for each 'element'(relax-ng). Most > of current code follows this convention, but there're still some exceptions. > For example, the element 'bridge'(in network.rng) should be translated into a > struct called 'virNetworkBridgeDef'; but in fact, no this struct in the current > codes, and its members (stp/delay/...) are exposed to the struct 'virNetworkDef'. Yep, this is the really hard problem I think. The idea of generating the structs from the RNG has a fundamental assumption that the way we express & modularize the RNG matches the way we want to express and modularize the C structs. To "fix" this, either we have to change the C structs in to something that may be worse for the code, or we have to change the RNG schema into something that may be worse for the schema validation. There's a case in the domain XML which illustrates how painfull his can be: <define name="os"> <choice> <ref name="osxen"/> <ref name="oshvm"/> <ref name="osexe"/> </choice> </define> here, the osxen/oshvm share many attributes, and thus at the C level we want them all in the same struct, but at the RNG level we wanted them separate to more strictly validate the XML documents. ...cut all the technical details about the rng2c DIRECTIVE lang.... What I'm really interested in above all is what the end result looks like for both maintainers, and casual contributors. As a general goal we want to simplify libvirt to make it easier to contribute to. Eliminating the need to write custom XML parsing code certainly fits with that goal at a conceptual level. The code generator itself though needs some input so that it knows what to generate, and we have to be mindful of what this input data looks like and how easy it will be for maintainers/contributors to understand. In last 6 months, we've had a general aim to reduce the number of languages we expose contributors to. We've eliminated alot of Perl usage. We've started to replace HTML with RST, and also want to replace our XML/XSL based website generator with something simpler and non-XML based. There are some places where we can't avoid another language, as XDR for the RPC protocol. We'll never eliminate XML from libvirt entirely, since it is a fundamental part of our public API, but I am interested in how we can minimize the visibility of XML to our contributors. Being able to auto-generate XML parsers is thus interesting, but if the source input for the auto-generator is another XML document, that is kind of failing. This is where I'm finding the rng2c tool quite unappealing. The libvirt code is primarily C based, with a little bit of magic in places where we auto-generate code. For example the RPC code generator has magic comments in the XDR protocol definition, that is used to generate client and server dispatch code. I think of the XDR language as "psuedo-C" - it is conceptually close enough to C that C programmers can understand the XDR definitions fairly easily. The RNG schemas are a very different beast. Since they're XML based they are obviously completely unlike any of the C code, and I find that people have quite a strong dislike of use of XML in general. With this in mind I'm not enthusiastic about the idea of auto-generating the XML parsers from the RNG schemas, as it means everyone needs to know more about both the RNG schema language, and also learn this custom DIRECTIVE language used by the rng2c tool. If we consider this series, we've deleted 530 lines from network_conf.c and added 200 new lines for post-parse validation logic. This is is good overall improvement. No matter what approach or tool we use for XML parser auto-generation we'll get the similar result. Now consider the network_conf.h file, we have deleted typedef struct _virNetworkDNSTxtDef virNetworkDNSTxtDef; typedef virNetworkDNSTxtDef *virNetworkDNSTxtDefPtr; struct _virNetworkDNSTxtDef { char *name; char *value; }; typedef struct _virNetworkDNSSrvDef virNetworkDNSSrvDef; typedef virNetworkDNSSrvDef *virNetworkDNSSrvDefPtr; struct _virNetworkDNSSrvDef { char *domain; char *service; char *protocol; char *target; unsigned int port; unsigned int priority; unsigned int weight; }; typedef struct _virNetworkDNSHostDef virNetworkDNSHostDef; typedef virNetworkDNSHostDef *virNetworkDNSHostDefPtr; struct _virNetworkDNSHostDef { virSocketAddr ip; size_t nnames; char **names; }; typedef struct _virNetworkDNSForwarder virNetworkDNSForwarder; typedef virNetworkDNSForwarder *virNetworkDNSForwarderPtr; struct _virNetworkDNSForwarder { virSocketAddr addr; char *domain; }; typedef struct _virNetworkDNSDef virNetworkDNSDef; typedef virNetworkDNSDef *virNetworkDNSDefPtr; struct _virNetworkDNSDef { int enable; /* enum virTristateBool */ int forwardPlainNames; /* enum virTristateBool */ size_t ntxts; virNetworkDNSTxtDefPtr txts; size_t nhosts; virNetworkDNSHostDefPtr hosts; size_t nsrvs; virNetworkDNSSrvDefPtr srvs; size_t nfwds; virNetworkDNSForwarderPtr forwarders; }; and instead of this, we now have have these extra rules in the RNG schemas: <!-- VIRT:DIRECTIVE { "structure": {"output": "src/conf/network_conf"}, "clearfunc": {"output": "src/conf/network_conf"}, "parsefunc": { "output": "src/conf/network_conf", "post": true, "args.instname": true }, "formatfunc": {"output": "src/conf/network_conf"} } --> <!-- VIRT:DIRECTIVE { "name": "virNetworkDNSForwarder", "structure": {"output": "src/conf/network_conf"}, "clearfunc": {"output": "src/conf/network_conf"}, "parsefunc": { "output": "src/conf/network_conf", "post": true, "args.noctxt": true, "args.instname": true }, "formatfunc": { "output": "src/conf/network_conf", "order": ["domain", "addr"] } } --> <!-- VIRT:DIRECTIVE { "structure": {"output": "src/conf/network_conf"}, "clearfunc": {"output": "src/conf/network_conf"}, "parsefunc": { "output": "src/conf/network_conf", "post": true, "args.noctxt": true, "args.instname": true, "args": [ {"name": "partialOkay", "type": "Bool"} ] }, "formatfunc": {"output": "src/conf/network_conf"}, "members": [ {"id": "value", "opt": true} ] } --> <!-- VIRT:DIRECTIVE { "structure": {"output": "src/conf/network_conf"}, "clearfunc": {"output": "src/conf/network_conf"}, "parsefunc": { "output": "src/conf/network_conf", "post": true, "args.instname": true, "args": [ {"name": "partialOkay", "type": "Bool"} ] }, "formatfunc": {"output": "src/conf/network_conf"}, "members": [ {"id": "service", "opt": true}, {"id": "protocol", "opt": true} ] } --> <!-- VIRT:DIRECTIVE { "structure": {"output": "src/conf/network_conf"}, "clearfunc": {"output": "src/conf/network_conf"}, "parsefunc": { "output": "src/conf/network_conf", "args.instname": true, "post": true, "args": [ {"name": "partialOkay", "type": "Bool"} ] }, "formatfunc": {"output": "src/conf/network_conf"}, "members": [ {"id": "ip", "opt": true}, {"id": "hostname", "name": "name", "opt": true} ] } --> If I come to libvirt as a contributor with C language skils, but little experiance of RNG, I think this is a significant step backwards in the ability to understand libvirt code. It is way easier to understand what's going on from the C structs, than from the RNG schema and the VIRT:DIRECTIVE IMHO. Even as a maintainer, and having read the cover letter here, I find the VIR:DIRECTIVE metadata to be way too verbose and hard to understand compared to the structs. So I think that although the RNG based code generator elimintes alot of C code, it has the cost of forcing people to know more about the RNG code. Overall I don't think that's a clear win. This doesn't mean auto-generating code is a bad idea. I think it just means that the RNG schema is not the right place to drive auto-generation from. I'm wondering if you've ever done any programming with Golang and used its XML parsing capabilities ? Golang has a very clever approach to XML/JSON/YAML parsing which is based on directives recorded against the native Go structs. In the libvirt-go-xml.git repository, we've used this to map all the libvirt XML schemas into Go structs. IME, this has been the most pleasant way I've come across for parsing XML. If we consider just the DNS structs that you used to illustrate rng2c in this patch series. To add support for Go XML parsing and formatting, required the following comments against the Golang structs: type NetworkDNSTXT struct { XMLName xml.Name `xml:"txt"` Name string `xml:"name,attr"` Value string `xml:"value,attr"` } type NetworkDNSSRV struct { XMLName xml.Name `xml:"srv"` Service string `xml:"service,attr,omitempty"` Protocol string `xml:"protocol,attr,omitempty"` Target string `xml:"target,attr,omitempty"` Port uint `xml:"port,attr,omitempty"` Priority uint `xml:"priority,attr,omitempty"` Weight uint `xml:"weight,attr,omitempty"` Domain string `xml:"domain,attr,omitempty"` } type NetworkDNSHostHostname struct { Hostname string `xml:",chardata"` } type NetworkDNSHost struct { XMLName xml.Name `xml:"host"` IP string `xml:"ip,attr"` Hostnames []NetworkDNSHostHostname `xml:"hostname"` } type NetworkDNSForwarder struct { Domain string `xml:"domain,attr,omitempty"` Addr string `xml:"addr,attr,omitempty"` } type NetworkDNS struct { Enable string `xml:"enable,attr,omitempty"` ForwardPlainNames string `xml:"forwardPlainNames,attr,omitempty"` Forwarders []NetworkDNSForwarder `xml:"forwarder"` TXTs []NetworkDNSTXT `xml:"txt"` Host []NetworkDNSHost `xml:"host"` SRVs []NetworkDNSSRV `xml:"srv"` } The key important thing here is that the programmer is still fundamentally working with their normal Golang struct types / fields. All that is required to handle XML parsing & formatting is to add some magic comments which serve as directives to the XML parser, telling it attribute/element names, whether things are optional or not, etc. The attribute/element names are only needed if the struct field name is different from the XML name. My gut feeling is that if we want to go ahead with auto-generating C code for XML parsing/formatting, we should ignore the RNG schemas, and instead try to do something similar to the Golang approach to XML. The hard thing is that this would require us to write something which can parse the C header files. Generally in our XML parser header files we don't try to do anything too fancy - it is quite boring C code, so we would not have to parse the full C language. We can cope with a fairly simplified parser, that assumes the C header is following certain conventions. So we would have to add some magic comment directives to each struct we have typedef struct _virNetworkDNSTxtDef virNetworkDNSTxtDef; typedef virNetworkDNSTxtDef *virNetworkDNSTxtDefPtr; struct _virNetworkDNSTxtDef { char *name; /* xmlattr */ char *value; /* xmlattr */ }; typedef struct _virNetworkDNSSrvDef virNetworkDNSSrvDef; typedef virNetworkDNSSrvDef *virNetworkDNSSrvDefPtr; struct _virNetworkDNSSrvDef { char *domain; /* xmlattr,omitempty */ char *service; /* xmlattr,omitempty */ char *protocol; /* xmlattr,omitempty */ char *target; /* xmlattr,omitempty */ unsigned int port; /* xmlattr,omitempty */ unsigned int priority; /* xmlattr,omitempty */ unsigned int weight; /* xmlattr,omitempty */ }; typedef struct _virNetworkDNSHostDef virNetworkDNSHostDef; typedef virNetworkDNSHostDef *virNetworkDNSHostDefPtr; struct _virNetworkDNSHostDef { virSocketAddr ip; /* xmlcallback */ size_t nnames; char **names; /* xmlchardata:hostname,array */ }; typedef struct _virNetworkDNSForwarder virNetworkDNSForwarder; typedef virNetworkDNSForwarder *virNetworkDNSForwarderPtr; struct _virNetworkDNSForwarder { virSocketAddr addr; /* xmlcallback */ char *domain; /* xmlattr */ }; typedef struct _virNetworkDNSDef virNetworkDNSDef; typedef virNetworkDNSDef *virNetworkDNSDefPtr; struct _virNetworkDNSDef { int enable; /* xmlattr */ int forwardPlainNames; /* xmlattr */ size_t ntxts; virNetworkDNSTxtDefPtr txts; /* xmlelement:txt,array */ size_t nhosts; virNetworkDNSHostDefPtr hosts; /* xmlelement:host,array */ size_t nsrvs; virNetworkDNSSrvDefPtr srvs; /* xmlelement:srv,array */ size_t nfwds; virNetworkDNSForwarderPtr forwarders; /* xmlelement:forwarder,array */ }; Some explanation: - xmlattr Parse the field as an XML attribute with the same name as the struct field - xmlattr:thename Parse the field as an XML attribute called "thenanme" - xmlelement Parse the field as a child struct, populating from the XML element with same name. - xmlelement:thename Parse the field as a child struct, populating from the XML element called 'thename' - xmlcallback Call out to custom written XML handler methods to handle converting the data to/from the custom data type. eg for a field virSocketAddr, we'd call virSocketAddrXMLParse and virSocketAddrXMLFormat. - ,omitempty Don't format the attribute/element/chardata is the struct field is a NULL pointer, or an integer with value 0. BTW, when looking at libvirt-go-xml.git, you'll see there are still a bunch of places where we have to hand-write parsing/formatting code. There are essentially two reasons for this - Golang has no support for unions. As a result you have to fake unions by having a struct, with a bunch of optional fields each corresponding to a new struct. Rely on convention that only one field can be non-NULL. This requires custom parse functions since the Golang XML parser has no support for this code pattern. - The XML parser has no built-in directive to control whether it parses/formats in decimal vs hex. Both of these are things we won't suffer from if we did this in C, so potentially the XML parsing annotations needed for our C struct would result in something even simpler than the libvirt-go-xml.git code. The key question is just how difficult will it be to write a tool that can parse the C header files, and magic comments, to output suitable XML parser/formatter functions ? There's no easy way to answer that without someone trying it. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|