On 07/25/2017 10:59 AM, Paul W. Frields wrote:
> I'd meant to raise this question last week but it turned out several
> folks were out of pocket who'd probably want to discuss. One of the
> aspects of continuous integration[1] that impacts my team is the
> storage requirement. How much storage is required for keeping test
> results, composed trees and ostrees, and other artifacts? What is
> their retention policy?
>
> A policy of "keep everything ever made, forever" clearly isn't
> scalable. We don't do that in the non-CI realm either, e.g. with
> scratch builds. I do think that we must retain everything we
> officially ship; that's well understood. But atop that, anything we
> keep costs storage, and over time this storage costs money. So we
> need to draw some reasonable line that balances thrift and service.
>
> A. Retention
> ============
>
> The second question is probably a good one to start with, so we can
> answer the first. So we need to answer the retention question for
> some combination of:
>
> 1. candidate builds that fail a CI pipeline
> 2. candidate builds that pass a CI pipeline
> 3. CI composed testables
>    * a tree, ISO, AMI, other image, etc. that's a unit
>    * an ostree change, which is more like a delta (AIUI)
> 4. CI generated logs
> 5. ...other stuff I may be forgetting

The other big bucket is packages in the buildroot used to build the
builds. You may want to keep these as well if there is a desire to be
able to rebuild packages at a later point.

> My general thoughts are that these things are kept forever:
>
> * (2), but only if that build is promoted as an update or as part of
>   a shipped tree/ostree/image
> * (3), but only if the output is shipped to users
> * (4), but only if corresponding to an item in (2) or (3)
>
> Outside that, artifacts and logs are kept only for a reasonable
> amount of troubleshooting time. Say 30 days, but I'm not too worried
> about the actual time period.
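For what it's worth, the keep/delete rules proposed above are simple
enough to encode directly. Here's a minimal Python sketch of the
policy; the `Artifact` shape, field names, and 30-day constant are all
hypothetical, just to make the decision logic concrete (logs
corresponding to a shipped item are folded into the same `shipped`
flag):

```python
from dataclasses import dataclass

RETENTION_DAYS = 30  # troubleshooting window; easily adjusted later


@dataclass
class Artifact:
    kind: str        # hypothetical: "build", "testable", or "log"
    shipped: bool    # promoted as an update / shipped to users
                     # (for a log: whether its build/testable shipped)
    age_days: int    # days since the artifact was produced


def keep(artifact: Artifact) -> bool:
    """True if the artifact should be retained under the proposal."""
    # Kept forever: shipped builds (2), shipped testables (3), and
    # the logs corresponding to them (4).
    if artifact.shipped:
        return True
    # Everything else lives only for the troubleshooting window.
    return artifact.age_days <= RETENTION_DAYS
```

So a year-old shipped compose stays, while a scratch-style candidate
that passed CI but was never promoted ages out after 30 days.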
> It could be adjusted based on factors we have yet to encounter. How
> does this proposal compare to the existing practice in Fedora?
>
> B. Storage - How much?
> ======================
>
> To get an idea of what this might look like, I think we might make
> estimates based on:
>
> * the number of builds currently happening per day
> * how many of these builds are within the definition for an
>   officially shipped thing (like Atomic Host, Workstation, Server,
>   etc.)
> * the average size of the sources + binaries, summed out over the
>   ways we deliver them (SRPM + RPM, ostree binary, binary in another
>   image), and multiplied out by arches
> * then sum this out over the length of a Fedora release
>
> This is the part I think will need information from the rel-eng and
> CI contributors, working together. My assumption is there are gaping
> holes in this concept, so don't take this as a full-on proposal.
> Rather, I'm looking for folks to help harden the concepts and fill
> in the missing pieces. I don't think we need a measurement down to
> the single GB; a broad estimate in 100s of GB (or even at the 1 TB
> order of magnitude) is likely good enough.
>
> I'm setting the follow-up to infrastructure@xxxxxxxxxxxxxxxxxxxxxxx,
> since that team has the most information about our existing storage
> and constraints.

_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
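To make the shape of that estimate concrete, the multiplication can be
written out as a tiny back-of-envelope script. Every input value below
is a made-up placeholder, not a real Fedora figure; the rel-eng and CI
folks would need to supply the actual numbers:

```python
# Back-of-envelope storage estimate following the factors above.
# ALL values are placeholder assumptions, not measured Fedora data.
builds_per_day = 200       # candidate builds entering CI per day
shipped_fraction = 0.10    # fraction officially shipped/promoted
avg_artifact_gb = 2.0      # sources + binaries, summed over delivery forms
arches = 4                 # e.g. x86_64, aarch64, ppc64le, s390x
release_days = 13 * 30     # roughly a 13-month release lifetime

# Shipped artifacts accumulate for the whole release...
kept_forever_gb = (builds_per_day * shipped_fraction
                   * avg_artifact_gb * arches * release_days)
# ...while everything else is a rolling 30-day window.
rolling_window_gb = (builds_per_day * (1 - shipped_fraction)
                     * avg_artifact_gb * arches * 30)

print(f"kept forever over a release: ~{kept_forever_gb / 1024:.1f} TB")
print(f"rolling 30-day window:       ~{rolling_window_gb / 1024:.1f} TB")
```

Even with rough inputs, the useful outcome is the order of magnitude:
the structure makes it obvious which factor (per-artifact size, shipped
fraction, arch count) dominates the bill.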