RFC: virInterface change transaction API

Laine Stump <laine@xxxxxxxxx> · Fri, 08 Apr 2011 15:31:05 -0400

I've been asked to implement what some people have termed a
"transaction-oriented" API for host interface configuration (i.e.
virInterface*()). The basic intent is to allow rollback to a known-good
config if anything goes wrong when changing the host network config with
virInterface*() functions.

The most straightforward way to achieve this is that, prior to calling
virInterfaceDefine/virInterfaceUndefine, the current state of the
host's network configuration (i.e. the /etc/sysconfig/network-scripts/ifcfg-*
files in the case of Fedora and RHEL) would be saved off somewhere and
kept around until we're sure the new config is good; once we know that,
we can just eliminate the backup. If the user of virInterface*()
explicitly requests it, we could copy the files back; alternatively, if the
system is rebooted while the saved files still exist, we would assume
that something went wrong and restore the original config.

As with all other virInterface functions, the details of all this will
be handled by netcf (and below), but since libvirt is the main consumer
of netcf, I figure this is the appropriate place to discuss how it gets
done, so please let me know any opinions on any piece of this. I plan to
start the implementation "soon", as I want to be finished before the end
of May.

I see 3 layers to this:

1) libvirt

   At the libvirt layer, this feature just requires 3 new APIs, which
   are directly passed through to netcf:

       virInterfaceChangeStart(virConnectPtr conn, unsigned int flags);
       virInterfaceChangeCommit(virConnectPtr conn, unsigned int flags);
       virInterfaceChangeRollback(virConnectPtr conn, unsigned int flags);

   For the initial implementation, these will be simple passthroughs
   to similarly named netcf functions. (In the future, it would be
   useful for the server side of libvirt to detect when client<->server
   connectivity has been lost due to the network changes, and
   automatically tell netcf to do a rollback.)
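
   A minimal sketch of the calling pattern these three APIs are meant
   to support: start, change, then commit or roll back. Everything here
   is a hypothetical in-memory stand-in (struct host_cfg, change_*(),
   set_config_safely()), not libvirt or netcf code, and the failure
   behaviour of commit/rollback without a prior start is an assumption.

```c
#include <stdbool.h>

/* Hypothetical in-memory stand-in for the host network config. */
struct host_cfg {
    int  live;        /* stands in for the live ifcfg-* contents */
    int  saved;       /* stands in for the saved copy */
    bool has_backup;  /* true between start and commit/rollback */
};

/* Stand-in for virInterfaceChangeStart(): take the snapshot.
 * Assumption: fails if a snapshot already exists. */
int change_start(struct host_cfg *h)
{
    if (h->has_backup)
        return -1;
    h->saved = h->live;
    h->has_backup = true;
    return 0;
}

/* Stand-in for virInterfaceChangeCommit(): discard the snapshot. */
int change_commit(struct host_cfg *h)
{
    if (!h->has_backup)
        return -1;
    h->has_backup = false;
    return 0;
}

/* Stand-in for virInterfaceChangeRollback(): restore the snapshot. */
int change_rollback(struct host_cfg *h)
{
    if (!h->has_backup)
        return -1;
    h->live = h->saved;
    h->has_backup = false;
    return 0;
}

/* Caller pattern: returns 1 if committed, 0 if rolled back, -1 on error.
 * "verified" stands in for whatever check the caller uses to decide
 * that the new config is good. */
int set_config_safely(struct host_cfg *h, int new_cfg, bool verified)
{
    if (change_start(h) < 0)
        return -1;
    h->live = new_cfg;  /* stands in for virInterfaceDefine/Undefine */
    if (verified)
        return change_commit(h) == 0 ? 1 : -1;
    return change_rollback(h) == 0 ? 0 : -1;
}
```

   In this model a failed verification leaves the live config exactly
   as it was before the start call, which is the property the real
   APIs are meant to provide.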

2) netcf

   The netcf api will have these same three APIs, just named slightly
   differently:

        ncf_change_start(struct netcf *ncf, unsigned int flags);

           There are two possibilities for this. Either:

            A) call the initscript described below to save all config
               files that might possibly be changed (snapshot_config)

              or

            B) set a flag in *ncf indicating that all future calls
               to netcf that would end up modifying a particular
               config file should save off that file *if it hasn't
               already been saved*.

            (A) is simpler, but relies on the initscript having
            exact/complete matching knowledge of what files netcf may
            change. Should we worry about that and deal with the
            complexities of (B), or is (A) good enough for now?
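
            To give a feel for the bookkeeping option (B) would need,
            here is a rough sketch of a "save each file only once"
            check. The struct, the fixed-size table and the name
            need_backup() are all made up for illustration; none of
            this is existing netcf code.

```c
#include <string.h>

#define MAX_SAVED 32
#define NAME_LEN  64

/* Hypothetical bookkeeping for option (B). */
struct save_state {
    int  active;                      /* set by the start call */
    int  count;                       /* number of names recorded */
    char names[MAX_SAVED][NAME_LEN];  /* files already backed up */
};

/* Returns 1 if the caller must back up "path" before modifying it,
 * 0 if no backup is needed (no change in progress, or the file was
 * already saved), and -1 if the name cannot be recorded. */
int need_backup(struct save_state *s, const char *path)
{
    if (!s->active)
        return 0;
    if (strlen(path) >= NAME_LEN)
        return -1;
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], path) == 0)
            return 0;
    if (s->count == MAX_SAVED)
        return -1;
    strcpy(s->names[s->count++], path);
    return 1;
}
```

            Every netcf code path that writes or removes a config file
            would have to make a check like this first, which is where
            the extra complexity of (B) comes from.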

        ncf_change_rollback(struct netcf *ncf, unsigned int flags);

           Again, two possibilities:

           A)
              a) save the config of all current interfaces (in memory)
              b) call the initscript below to restore the config to its
                 original state.
              c) compare the new config to the old, and:
                 * bring down any interfaces that no longer exist
                   (PROBLEM: once an interface has no config files, you can
                    no longer operate on it with "ifdown")
                 * bounce any interfaces that have changed
                 * bring up any interfaces that have been re-added
            or

           B)
               a) ifdown all interfaces
               b) call initscript to restore previous config
                  (rollback_config)
               c) ifup all interfaces.

           (B) is much simpler, but may cause unnecessary disruption
           because it bounces interfaces that didn't really need it.
           So, the same question as for ncf_change_start() - is the
           more exact operation worth the extra complexity?
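
           The comparison in step (c) of option (A) could look
           something like the sketch below. The types and names
           (struct iface, rollback_action()) are invented for
           illustration, and an interface's config is reduced to a
           single string so that "changed" is just a string compare.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one interface's config. */
struct iface {
    const char *name;
    const char *cfg;
};

enum action { ACT_NONE, ACT_DOWN, ACT_BOUNCE, ACT_UP };

/* Look up an interface by name; returns NULL if it is not present. */
static const struct iface *find_iface(const struct iface *set, int n,
                                      const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(set[i].name, name) == 0)
            return &set[i];
    return NULL;
}

/* "before" is the config saved in memory just prior to the rollback,
 * "after" is the restored original config. */
enum action rollback_action(const struct iface *before, int nbefore,
                            const struct iface *after, int nafter,
                            const char *name)
{
    const struct iface *b = find_iface(before, nbefore, name);
    const struct iface *a = find_iface(after, nafter, name);

    if (b && !a)
        return ACT_DOWN;    /* no longer exists after the rollback */
    if (!b && a)
        return ACT_UP;      /* re-added by the rollback */
    if (b && a && strcmp(b->cfg, a->cfg) != 0)
        return ACT_BOUNCE;  /* exists in both, but changed */
    return ACT_NONE;        /* unchanged, or unknown in both */
}
```

           Note that the ACT_DOWN case is exactly where the "ifdown"
           problem mentioned above shows up, since by then the
           interface's config files are already gone.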

        ncf_change_commit(struct netcf *ncf, unsigned int flags);

            The simplest function - this will just call the initscript
            to erase the backup (commit_config).

3) initscript

   This initscript will at first live in (be installed by) netcf
   (called /etc/init.d/networking-config?), but hopefully it will
   eventually be accepted by the initscripts package (which includes
   the networking-related initscripts), as it is of general use. (Dan
   Kenigsberg already took a stab at this script last year,
   but received no reply from the initscripts maintainers, implying
   they may not be too keen on the idea right now - it might take some
   convincing ;-)

https://fedorahosted.org/pipermail/initscripts-devel/2010-February/000025.html

   It will have three commands, one of which will be called
   automatically by "start" (the command called automatically at boot
   time):

   snapshot_config

     This will save a copy of all network-config related files (or at
     least what the script believes are all of them - is this
     problematic?). It may or may not be called by netcf (see the notes
     on ncf_change_start() above).

     If this function finds that a snapshot has already been taken,
     it should fail.

   rollback_config (automatically called from "start" at boottime)

     This will move back (from the saved copies) all files that were
     changed/removed since snapshot, *and delete any files that have
     been added*.

     Note that this command doesn't need to worry about ifup/ifdown,
     because it will be called prior to any other networking startup
     (part of the reason that netcf will need to deal with that).

     I notice that Dan K's version saves the modified files to a
     "rollback-${date}" directory. Does this seem like a good idea?
     It's nice to not lose anything, but there is no provision for
     eliminating old versions, so it could grow without bound.

   commit_config

     This will just remove all the files in the save directory.

So, the open questions I have right now:

1) Do we accept the inexact method of just saving all files that match
   a list of patterns during *start(), then in *rollback() erasing all
   files matching those patterns and copying the old files back? Or do we
   need to keep track of what files have been changed/removed and added,
   and copy back / delete only those files during rollback?

   (A version control system would keep track of this rather nicely,
   but that's too complex for something that's intended to be a
   failsafe (and that we would also like to eventually be in the base
   OS install). Dan B. at one point suggested using patchfiles if I
   wanted the save info to keep exact track of which files would need
   to be replaced/deleted on rollback, but on further thought this
   turns out to not be workable, since we would need to run diff (to
   create the patchfile) after all changes had been made, and any
   outside changes to any of the files would leave the patchfile
   un-appliable, thus causing our "failsafe" to fail :-( ). Therefore,
   we will need to rely on the list of globs to tell us what files
   need to be deleted, or keep our own list in a separate file.)

2) Is it going to be okay to ifdown all interfaces prior to the
   rollback, and ifup all interfaces afterwards? Or must we compare
   the new config to the original, and ifdown only those interfaces
   that had been previously added/changed, then ifup only those
   interfaces that had been previously removed/changed?

3) If anyone has ideas on making the initscript more palatable to the
   initscripts people, please speak up! :-) One comment from an
   initscripts person was that 1) for the general case it would be
   difficult to draw the line on what parts of network connectivity
   should be included in this rollback functionality, and 2) at some
   point this becomes a general system config problem, and would really
   be better addressed by a general system-wide config management
   system. These are both concerns that need well-qualified answers. (I
   tend to think that this is intended as a failsafe to prevent
   unreachable systems, so it should be as simple as possible, and thus
   shouldn't be burdened with the complexity of a full system config
   management system (which could also co-exist at a higher level), but
   better answers are welcome.)
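
For reference on question 1, the "list of globs" approach amounts to
something like the following sketch, using POSIX fnmatch(). Only the
ifcfg-* pattern comes from the discussion above; the route-* entry and
the names netcfg_globs and is_netcfg_file() are assumptions made for
illustration.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Illustrative pattern list. "ifcfg-*" is from the text above;
 * "route-*" is an assumed extra entry. */
static const char *const netcfg_globs[] = { "ifcfg-*", "route-*" };

/* Returns 1 if "basename" matches any pattern, i.e. it would be saved
 * at start and erased/restored at rollback; returns 0 otherwise. */
int is_netcfg_file(const char *basename)
{
    size_t n = sizeof(netcfg_globs) / sizeof(netcfg_globs[0]);

    for (size_t i = 0; i < n; i++)
        if (fnmatch(netcfg_globs[i], basename, 0) == 0)
            return 1;
    return 0;
}
```

Any file that netcf can modify but that no pattern matches would
silently fall outside the rollback, which is the inexactness being
asked about.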
