Re: ostree/fedora atomic and impact on the mirror network

Colin Walters <walters@xxxxxxxxxx> · Mon, 10 Mar 2014 14:11:34 +0000

On Mon, Mar 10, 2014 at 9:19 AM, Matthew Miller <mattdm@xxxxxxxxxxxxxxxxx> wrote:

So, I've been thinking about Colin Walters' ostree project (Fedora Atomic
Initiative) -- see <http://sched.co/1eVhZ05>. One of the concerns I have is 
with requirements on the mirror network. Right now, the impact on mirrors of 
an update is a few metadata requests plus one per package. It seems like
ostree could be significantly worse, with requests _per file_.

Yep, it's definitely true that OSTree's HTTP replication can be worse than yum/rpm/deltarpm in many scenarios (though there are also scenarios where current OSTree is better).  Static deltas are the ultimate solution here, and initial code already exists (see https://bugzilla.gnome.org/show_bug.cgi?id=721799).
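
To put rough shape on the per-package versus per-file concern, here is a back-of-the-envelope sketch in Python.  The function names and the example numbers are made up for illustration and are not measurements of any real update; it also assumes one HTTP request per changed object and no static deltas.

```python
def yum_requests(changed_packages, metadata_requests=3):
    """Roughly: a few metadata fetches plus one request per changed package."""
    return metadata_requests + changed_packages


def ostree_requests(changed_files, changed_dirs=0, metadata_requests=1):
    """Roughly: one request per changed object when no static delta is used.

    Assumption: each changed file and each changed directory costs one fetch.
    """
    return metadata_requests + changed_files + changed_dirs


# Hypothetical update: 40 packages changed, touching 6000 files in 300 directories.
print(yum_requests(40))            # 43
print(ostree_requests(6000, 300))  # 6301
```

The gap depends entirely on how many files actually change per package, which is why the "better in some scenarios" case exists: a one-file change inside a large package costs one small object fetch here instead of the whole package.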

Now, a few things.  First, the current goal of Fedora Atomic Initiative is just to track Rawhide - I was talking with Dennis Gilmore at devconf.cz and we felt this made the most sense rather than trying to jump all the way to releases.  So the idea here is that it's for users who are already updating weekly or faster.

Tracking Rawhide plays into the other strengths of OSTree, such as the fact that after you upgrade, you still have the previous tree around to fall back on if things are broken.

Now, let's talk about space usage on the mirror network.  A *very* interesting question is how much tree history we keep.  A lot of this is a function of how many trees we generate (at the moment, I just made up some "baseline" products) as well as how often the packages in those trees change.

One model I'd like to aim for here is that we say "the repository will take up at most N GB" (where e.g. N=100) and we keep an intelligently scheduled series of snapshots, like backup systems do.  We don't need to keep every change to every RPM, just the interesting ones: keep only a few old snapshots from last year, plus a few from each month this year, plus many from this week.  OSTree already has some very simple support for pruning (ostree prune --refs-only --depth=100); a max-size model would be harder but is doable.
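
A minimal sketch of that kind of backup-style schedule, in Python.  This is not something OSTree implements; the function name, the 7-day window, and the per-month and per-year limits are all hypothetical knobs chosen for illustration.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, per_year=2, per_month=3):
    """Pick which snapshot dates to keep, backup-style.

    Hypothetical policy: keep everything from the last 7 days, up to
    `per_month` newest snapshots from each earlier month of the current
    year, and up to `per_year` newest snapshots from each earlier year.
    """
    keep = set()
    bucket_counts = {}
    week_ago = today - timedelta(days=7)
    for d in sorted(snapshot_dates, reverse=True):  # newest first
        if d > week_ago:
            keep.add(d)  # "many from this week"
            continue
        if d.year == today.year:
            bucket, limit = (d.year, d.month), per_month  # "a few from each month"
        else:
            bucket, limit = (d.year,), per_year  # "a few old ones from last year"
        if bucket_counts.get(bucket, 0) < limit:
            bucket_counts[bucket] = bucket_counts.get(bucket, 0) + 1
            keep.add(d)
    return sorted(keep)


# Example: one snapshot per day from 2013-01-01 through 2014-03-10.
today = date(2014, 3, 10)
start = date(2013, 1, 1)
daily = [start + timedelta(days=i) for i in range((today - start).days + 1)]
kept = snapshots_to_keep(daily, today)
print(len(daily), len(kept))  # 434 18
```

The size cap is deliberately not modeled here: objects are shared between snapshots, so per-snapshot sizes don't simply add up, which is part of why the max-size model is the harder one.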

Another thing I've been thinking about is that there should likely be separate "development" and "release" repositories.

The "release" repository would be synced out to more mirrors.  It might contain just each "gold" release and the intermediate alpha/beta snapshots, plus, say, monthly update snapshots.

In this model, the release repository would be a separate composition from the development repo - it would reprocess the same RPM versions, and would require re-GPG-signing, etc.

So an offhand TODO list for production releases:

 - Anaconda support (working on it)
 - Move rpm-ostree into Koji
   - Requires RHEL7 or newer build host
   - Write Koji plugin
   - GPG signing (or TLS for metadata)
 - Static deltas (initial code exists, needs HTTP/GPG plus optimization)
 - Determine mirror impact
   - Space availability
   - Determine whether some mirrors would want to opt out of higher HTTP load

-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct