Re: [PATCH v10 2/4] domain: Add optional 'tls' attribute for TCP chardev

Pavel Hrdina <phrdina@xxxxxxxxxx> · Fri, 21 Oct 2016 08:29:05 +0200

On Thu, Oct 20, 2016 at 03:48:30PM -0400, John Ferlan wrote:
> [...]
> 
> >>>> +    <p>
> >>>> +      <span class="since">Since 2.4.0,</span> the optional attribute
> >>>> +      <code>tls</code> can be used to control whether a serial chardev
> >>>> +      TCP communication channel would utilize a hypervisor configured
> >>>> +      TLS X.509 certificate environment in order to encrypt the data
> >>>> +      channel. For the QEMU hypervisor, usage of a TLS environment can
> >>>> +      be controlled on the host by the <code>chardev_tls</code> and
> >>>> +      <code>chardev_tls_x509_cert_dir</code> or
> >>>> +      <code>default_tls_x509_cert_dir</code> settings in the file
> >>>> +      /etc/libvirt/qemu.conf. If <code>chardev_tls</code> is enabled,
> >>>> +      then unless the <code>tls</code> attribute is set to "no", libvirt
> >>>> +      will use the host configured TLS environment.
> >>>> +      If <code>chardev_tls</code> is disabled, but the <code>tls</code>
> >>>> +      attribute is set to "yes", then libvirt will attempt to use the
> >>>> +      host TLS environment if either the <code>chardev_tls_x509_cert_dir</code>
> >>>> +      or <code>default_tls_x509_cert_dir</code> TLS directory structure exists.
> >>>> +    </p>
> >>>
> >>> Nice, this is a good description of how to use the *tls* attribute.
> >>>
> >>
> >> BTW (regarding your followup reply):
> >>
> >> The 4 "consumers" of virDomainChrSourceDefParseXML (where this would be
> >> parsed) refer to this as a "serial chardev"
> > 
> > This is a generic function that parses source for a lot of different device
> > types.
> > 
> 
> Shortcutting in my mind to <source mode='{connect|bind}' host='%s'
> service='%s'/> which is the VIR_DOMAIN_DEVICE_CHR, smartcard, rng, and
> redirdev. And yes, VIR_DOMAIN_DEVICE_CHR has paths for parallels,
> serials, consoles, and channels that are defined using <%s type='tcp'>.
> 
> >> The 'virDomainChrDefParseXML' comments have a list of "<serial ..." XML
> >> types. The location of the above description is describing a <serial
> >> type="tcp"> definition.
> > 
> > Well, the comment has that list, but the function is used to parse all
> > character devices, not only serial char devices.  TLS encryption can also be
> > used for these types: parallel, channel, and console.
> > 
> 
> My continual battle against under documentation.  The code is not self
> documenting in all cases...

I agree, we should be better at documenting things.

> >> The 'smartcard' discussion for a 'passthrough' device that would use
> >> this code says "Rather than having the hypervisor directly communicate
> >> with the host, it is possible to tunnel all requests through a secondary
> >> character device to a third-party provider (which may in turn be talking
> >> to a smartcard or using three certificate files). In this mode of
> >> operation, an additional attribute type is required, matching one of the
> >> supported serial device types, to describe the host side of the tunnel;..."
> > 
> > This comment is wrong, it should be "supported character device types".  This
> > attribute tells what interface is presented to the host.  Check this part of
> > documentation for all character devices:
> > 
> >   <http://libvirt.org/formatdomain.html#elementsConsole>
> > 
> > there is this sentence:
> > 
> >   "The interface presented to the host is given in the type attribute of the
> >   top-level element. The host interface is configured by the source element."
> > 
> > So it refers to all host interfaces:
> > 
> >   <http://libvirt.org/formatdomain.html#elementsCharHostInterface>
> > 
> >> The 'rng' discussion for backend that would use this code says "This
> >> backend connects to a source using the EGD protocol. The source is
> >> specified as a character device. Refer to character device host
> >> interface for more information. ..."
> > 
> > This is correct, there is no reference to serial character device.
> > 
> >> Redirdev says "An additional attribute type is required, matching one of
> >> the supported serial device types, to describe the host side of the
> >> tunnel; type='tcp' or type='spicevmc' ..."
> > 
> > This is the same case as smartcard device, this is wrong.
> > 
> >> So the long and short of it is, IMO it's a serial chardev device.
> >> Semantically it could be claimed otherwise, but the parsing proves
> >> otherwise as does the existing documentation of "Host interface"
> >> character devices.
> >>
> >> I prefer to keep it described as is. It's only ever used, parsed, etc.
> >> when <devices>... <serial type="tcp">... <source mode='connect'..."
> >>
> >> If anything, the description should become more restrictive to indicate
> >> that the option shouldn't be used for smartcards, rngs, and redirdevs,
> >> but I'll save that discussion for patch 3.
> > 
> > Based on the documentation it may appear that it should be a serial chardev,
> > but that's misleading and it should refer only to the "host interfaces of
> > character devices".
> > 
> 
> I cannot begin to describe how many times I've scrolled up and down
> through that discussion and thought how does anyone get this stuff
> correct... Trial and error I suppose.
> 
> In any case, it seems of the 3 the rng is the most correct and the other
> two should get patches in order to be more correct. Not sure I can do it
> justice. It would seem to me that smartcard and redirdev should use the
> pointer to the elementsCharHostInterface.

Yes, and now that we know about this flaw in our documentation, it should
be fixed.

> Still for the purposes of supported 'elementsCharHostInterface' when
> being used for specific smartcard, rng, and redirdev entries that are
> using "type='tcp'", only the <source mode='{connect|bind}' .../> would
> "appear to me" to apply as the "style" one would use in order to use
> TLS. That style just happens to have examples that list <serial
> type="tcp"> <source mode=... />.
> 
> Hence why I see this as "a serial chardev TCP" or a "host interface
> serial chardev TCP". There's got to be some means to describe it that
> focuses the attention on the <source ...> and not the "<%s type="tcp">"
> that I focused on.

Yes, it's tricky and it took me a while to understand all of this.  I had to
combine the documentation with the source code, which means we need to update
that documentation.

> [...]
> 
> >>>>  bool
> >>>> -qemuDomainSupportTLSChardevTCP(virQEMUDriverConfigPtr cfg)
> >>>> +qemuDomainSupportTLSChardevTCP(virQEMUDriverConfigPtr cfg,
> >>>> +                               const virDomainChrSourceDef *dev)
> >>>>  {
> >>>> -    if (cfg->chardevTLS)
> >>>> +    if (cfg->chardevTLS && dev->data.tcp.haveTLS != VIR_TRISTATE_BOOL_NO)
> >>>> +        return true;
> >>>> +    if (!cfg->chardevTLS && dev->data.tcp.haveTLS == VIR_TRISTATE_BOOL_YES &&
> >>>> +        virFileExists(cfg->chardevTLSx509certdir))
> >>>>          return true;
> >>>>      return false;
> >>>>  }
> >>>
> >>> So this function lets you decide whether we should try to set up *tls* for
> >>> chardev or not.  It works, but I have a few issues with it.
> >>>
> >>> First, I don't like that libvirt would try to do something smart and not
> >>> even tell the user about the result.  This will silently ignore the *tls*
> >>> attribute if no certificate is found.  If the *tls* attribute is set to
> >>> "yes" in the XML and there is no certificate file to use, we shouldn't start
> >>> that domain and should report an error to the user.
> >>
> >> This is a boolean function - so printing an error here isn't right.
> > 
> > Well, my comment implies not using a boolean function.
> > 
> 
> The callers were boolean checks, hence the generation of a boolean function.
> 
> >> Adding something to post parse processing is possible, but there's an
> >> impact based on whether we're setting a default value for haveTLS
> >> (something I disagree with doing).
> >>
> >> Beyond that would a check go in qemuDomainDeviceDefPostParse or
> >> qemuDomainDeviceDefValidate? It's never crystal clear to me which should
> >> be used when just reading the code. Although since I see this as a "new"
> >> and "optional" value, I'd lean toward PostParse. Then there's dealing
> >> with the parseFlags that it's impacted by other decisions.
> >>
> >> I could also rationalize that someone adding "tls='yes'" to their
> >> chardev would "know" what they're doing because they read the
> >> documentation. How else would they know to have this very specific
> >> combination (unless of course 'something' set things up that way based
> >> on the assumption of how a domain is currently running).
> >>
> >> Again, IMO 'haveTLS' is new and optional. The only indication that a
> >> domain is using TLS was handled via 'tlscreds'. IIRC, the "reason" that
> >> 'tlscreds' exists and how "tls-creds" gets added to the command line is
> >> because the JSON code processing for a chardev is buried in
> >> qemu_monitor_json.c and fishing for a host configuration option at that
> >> time wasn't viable.
> >>
> >>>
> >>> Second, this way we don't reflect the current state of a live domain in the
> >>> live XML.  This was probably lost during the discussion, but in general, if
> >>> there is an attribute that can affect a running domain, we should reflect the
> >>> current state using that attribute.  I know there are some cases where we
> >>> probably don't do that, and they should be fixed.
> >>>
> >>
> >> No it wasn't lost - I considered it as I see you've seen from the cover.
> >> And you're probably right regarding attributes that trigger usage of
> >> some qemu option that we don't specifically save in the status XML.
> >> That's a different rat hole.
> >>
> >>> I figured out that we cannot simply use haveTLS = cfg->chardevTLS, but we
> >>> can set haveTLS based on cfg->chardevTLS.  The whole purpose of
> >>> qemuProcessPrepareDomain() is to prepare the domain definition so that
> >>> qemuBuildCommandLine() doesn't have to check other places to enable some
> >>> feature without updating the live definition.  If the *tls* attribute is
> >>> properly set in the live definition, then it will be saved to the status XML
> >>> and there is no need to do anything for qemuProcessReconnect.
> >>
> >> I disagree on setting haveTLS during qemuProcessReconnect based upon
> >> chardevTLS. The 'haveTLS' is an optional attribute and by setting a
> >> value I believe we end up making an assumption.
> > 
> > I wrote that there is no need to do anything for qemuProcessReconnect.
> > 
> 
> I misinterpreted, but I also still have in my mind the previous
> discussion on this.
> 
> >> If *anything* was to be done it would be based solely on whether
> >> "tls-creds" is set on the command line of the reconnected domain.
> >> However, that too has a similar problem about setting a value for an
> >> optional attribute based upon the assumption that we know better.
> >>
> >> Again, the only indication that 'tls-creds' is on the command line was
> >> from the 'tlscreds' boolean that was set because the host configuration
> >> information was available. A domain that is running and has tls-creds
> >> will continue to have it. Altering that domain's configuration file
> >> because we add a new optional *configuration* value has no bearing on
> >> the *status* XML. When/if a configuration XML is updated, it's not
> >> checking that 'tlscreds' value to determine that at some point in
> >> history the domain used TLS because the host was configured that way.
> > 
> > The whole point of having the 'tls' attribute in the live XML is to ensure that
> > when libvirtd is restarted the attribute is still present in the XML, because it
> > will be saved to the status XML and loaded again from it.  There is no need to
> > do any magic by parsing the qemu command line; we have the status XML for that
> > purpose, to store all information about the domain.
> > 
> 
> Let's see, you see the "tls={'yes'|'no'}" as essentially replacing the
> chardev_tls qemu.conf variable.

No, I don't see it that way.  It's still an optional attribute that doesn't have
to be specified when defining a domain; in other words, it doesn't have to be
present in the config XML.

> I don't see it that way. The whole purpose of an optional property is
> just that to be optional. I should be able to choose to add it or not.
> If it's there and I remove it, but then on every restart libvirt
> replaces it - that seems to go against the idea of being optional. If it
> never existed and then it shows up as soon as I start the domain; is
> that right?

This part is right: if the attribute is not configured in the config XML, it
should be set when the domain is started to reflect the current state, and it
will be present only in the live XML.

> If I'm comparing pure migration xml, wouldn't the to/newer system now
> have the field defined (thus creating different XML)? Does that migrate
> safely back to the previous version (2.3.0)? It would have a field that
> the old system wouldn't know what to do with.

The second patch solves this issue.

> >> Let's consider the bool function from above and this automagic setting
> >> being requested. Let's say we reconnect to a domain, find the
> >> 'tls-creds' set on the command line, and set 'haveTLS=YES' based solely
> >> on that. Let's say at some point in time after, someone edits their
> >> qemu.conf file and sets 'chardev_tls=0' (or comments out the
> >> 'chardev_tls=1'). In their mind, they've now disabled chardev TLS for
> >> their host and any domain they will run in the future. They stop the
> >> domain they knew was running using TLS before and restart it expecting
> >> that it won't use TLS anymore, but on restart they discover that in our
> >> infinite wisdom we have set the optional "tls='yes'" property for that
> >> chardev on that domain. Now if we 'error' out on that start like you
> >> request above, then that means they will have to edit their domain and
> >> remove the seemingly optional property.
> >>
> >> Next, let's assume they read the documentation and found that they can
> >> disable the qemu.conf value, but still have the domain chardev value if
> >> they set the "tls='yes'" property as long as they have their valid TLS
> >> directory configuration. In this case, they have made the conscious
> >> decision how they want their domain configured based on what they know
> >> is configured on the host.
> >>
> >> TBH: I take this single patch as a "feature request" add-on to the
> >> original feature request that I believe in the long run won't be used. I
> >> could be wrong, but it's a feeling.
> >>
> >> Furthermore, the purpose of any optional attribute is just that. It's
> >> optional based on some host wide setting. It's up to the consumer to
> >> decide how they should proceed, not the software to make that decision
> >> for them.
> > 
> > I guess that I'm unable to describe exactly what I mean so I'm attaching two
> > patches, one is for introducing TLS attribute and the second one is to make
> > sure that migration to libvirt-2.3.0 will work.
> > 
> 
> And I think the provided code proves that point. While "technically"
> still optional in the schema, the change in qemuProcessPrepareDomain
> forces it to be set as soon as a domain is started. Again, a bistate !=
> tristate. We don't track the difference between undefined or defined,
> but set to 0 other than in the return values from qemu.conf parsing. I
> already had to remove a boolean that tracked that from a prior review.

Now I see why we are not able to agree on this change.  The modifications made
in qemuProcessPrepareDomain() apply only to the live definition; the config
definition remains untouched, which means the *tls* attribute (if it's derived
from qemu.conf) will appear only in the live XML.  The config XML will remain
the same after the guest is stopped.
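
To make the live-versus-config distinction concrete, here is a minimal
standalone sketch.  It is not the code from the attached patches: the
Tristate enum, the ChrSourceDef struct, prepareChardevTLS() and startDomain()
are reduced, hypothetical stand-ins for the real libvirt types and for what
qemuProcessPrepareDomain() would do.  It only shows that an unset attribute is
resolved on a copy of the definition while the config definition stays as the
user wrote it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for libvirt's tristate boolean. */
typedef enum {
    TRISTATE_ABSENT = 0,  /* attribute not present in the XML */
    TRISTATE_YES,
    TRISTATE_NO
} Tristate;

/* Hypothetical, heavily reduced chardev source definition. */
typedef struct {
    Tristate haveTLS;
} ChrSourceDef;

/*
 * Resolve an unset 'tls' attribute from the host-wide chardev_tls setting.
 * A value given explicitly in the XML is always kept as-is.
 */
static void
prepareChardevTLS(ChrSourceDef *live, bool chardevTLS)
{
    if (live->haveTLS == TRISTATE_ABSENT)
        live->haveTLS = chardevTLS ? TRISTATE_YES : TRISTATE_NO;
}

/*
 * "Start" a domain: the live definition is a copy of the config definition,
 * and only that copy is updated.
 */
static ChrSourceDef
startDomain(const ChrSourceDef *config, bool chardevTLS)
{
    ChrSourceDef live = *config;

    prepareChardevTLS(&live, chardevTLS);
    return live;
}
```

With chardev_tls = 1 and no *tls* attribute in the config XML, startDomain()
returns a live definition with "yes" while the config definition still has the
attribute absent; once the guest is stopped, the live copy is simply discarded.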

> I think it should be strictly optional and that's where we differ. I see
> no reason to change the domain xml unless as a consumer that's what you
> want to do - be able to control which domains will have the setting.
> What else would be the purpose of a host wide setting to go with a
> domain optional setting?
> 
> Finally, if your idea is accepted, that means for any configuration with
> chardev_tls=0 (either because it's commented or set that way), every
> domain that starts will be updated to have this new attribute
> "tls='no'". Then one day, I read up on this wonderful new feature and

Not the domain, only the live XML which is not saved as config XML ...

> modify my qemu.conf file to set chardev_tls=1 and set up the TLS
> environment properly. I go to start my domain, but wait it's not using

And after you start the domain there will be "tls='yes'" because the config XML
doesn't contain any *tls* attribute.

I've tested all of those cases before proposing this patch:

prerequisite: prepare certificate files to be used for chardev devices

for running domain:
    live XML - virsh dumpxml $domain
    config XML - virsh dumpxml $domain --config
    migratable XML - virsh dumpxml $domain --migratable

1. set chardev_tls = 1
    a) start domain where there is no *tls* attribute in config XML
        - the domain is started and TLS is properly configured
        - in the live XML there is "tls='yes'"
        - in the config XML there is no *tls* attribute
        - in the migratable XML there is no *tls* attribute

    b) start domain where there is "tls='no'" in config XML
        - the domain is started and TLS is not configured
        - in the live XML there is "tls='no'"
        - in the config XML there is "tls='no'"
        - in the migratable XML there is "tls='no'"

    c) start domain where there is "tls='yes'" in config XML
        - the domain is started and TLS is properly configured
        - in the live XML there is "tls='yes'"
        - in the config XML there is "tls='yes'"
        - in the migratable XML there is "tls='yes'"

2. set chardev_tls = 0
    a) start domain where there is no *tls* attribute in config XML
        - the domain is started and TLS is not configured
        - in the live XML there is "tls='no'"
        - in the config XML there is no *tls* attribute
        - in the migratable XML there is no *tls* attribute

    b) start domain where there is "tls='no'" in config XML
        - the domain is started and TLS is not configured
        - in the live XML there is "tls='no'"
        - in the config XML there is "tls='no'"
        - in the migratable XML there is "tls='no'"

    c) start domain where there is "tls='yes'" in config XML
        - the domain is started and TLS is properly configured
        - in the live XML there is "tls='yes'"
        - in the config XML there is "tls='yes'"
        - in the migratable XML there is "tls='yes'"
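
The difference between the live and the migratable output in cases 1a and 2a
can be sketched roughly like this.  The names are assumptions for illustration
only: TlsState, ChrSource, the tlsFromConfig flag and formatTLS() are
hypothetical and not necessarily what the attached patches use.  The idea is
just that a value filled in from qemu.conf is remembered as such and left out
when formatting migratable XML.

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { TLS_ABSENT = 0, TLS_YES, TLS_NO } TlsState;

/* Hypothetical, reduced chardev source as seen on a running domain. */
typedef struct {
    TlsState haveTLS;
    bool tlsFromConfig;  /* assumption: true if the value came from qemu.conf */
} ChrSource;

/*
 * Write the attribute as it would appear in the dumped XML, or an empty
 * string when it is omitted.  Migratable output skips values that were
 * only derived from the host configuration.
 */
static void
formatTLS(const ChrSource *src, bool migratable, char *buf, size_t len)
{
    if (src->haveTLS == TLS_ABSENT || (migratable && src->tlsFromConfig)) {
        buf[0] = '\0';
        return;
    }
    snprintf(buf, len, "tls='%s'", src->haveTLS == TLS_YES ? "yes" : "no");
}
```

For case 1a the live source would be { TLS_YES, true }: formatTLS() gives
tls='yes' for the live XML and nothing for the migratable XML.  For case 1c the
source is { TLS_YES, false }, so both outputs carry tls='yes'.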

Pavel

> it. Closer inspection finds, someone put "tls='no'" into my domain... To
> me that's not right.  And I won't necessarily know unless I know to look
> at the cmdline of the started domain to find that 'tls-creds' or I in
> some way "track" when TLS is being used.
> 
> 
> John
> 
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list