On 07/25/2017 10:59 AM, Paul W. Frields wrote:
> I'd meant to raise this question last week but it turned out several
> folks were out of pocket who'd probably want to discuss. One of the
> aspects of continuous integration[1] that impacts my team is the
> storage requirement. How much storage is required for keeping test
> results, composed trees and ostrees, and other artifacts? What is
> their retention policy?
>
> A policy of "keep everything ever made, forever" clearly isn't
> scalable. We don't do that in the non-CI realm either, e.g. with
> scratch builds. I do think that we must retain everything we
> officially ship; that's well understood. But atop that, anything we
> keep costs storage, and over time this storage costs money. So we
> need to draw some reasonable line that balances thrift and service.
>
> A. Retention
> ============
>
> The second question is probably a good one to start with, so we can
> answer the first. So we need to answer the retention question for
> some combination of:
>
> 1. candidate builds that fail a CI pipeline
> 2. candidate builds that pass a CI pipeline
> 3. CI composed testables
>    * a tree, ISO, AMI, other image, etc. that's a unit
>    * an ostree change, which is more like a delta (AIUI)
> 4. CI generated logs
> 5. ...other stuff I may be forgetting

The other big bucket is packages in the buildroot used to build the
builds. You may want to keep these as well if there is a desire to be
able to rebuild packages at a later point.

> My general thoughts are that these things are kept forever:
>
> * (2), but only if that build is promoted as an update or as part of
>   a shipped tree/ostree/image
> * (3), but only if the output is shipped to users
> * (4), but only if corresponding to an item in (2) or (3)
>
> Outside that, artifacts and logs are kept only for a reasonable
> amount of troubleshooting time. Say 30 days, but I'm not too worried
> about the actual time period.
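For what it's worth, the keep/delete rules proposed above are simple
enough to encode directly. Here's a minimal Python sketch of the
policy; the `Artifact` shape, field names, and 30-day constant are all
hypothetical, just to make the decision logic concrete (logs
corresponding to a shipped item are folded into the same `shipped`
flag):

```python
from dataclasses import dataclass

RETENTION_DAYS = 30  # troubleshooting window; easily adjusted later


@dataclass
class Artifact:
    kind: str        # hypothetical: "build", "testable", or "log"
    shipped: bool    # promoted as an update / shipped to users
                     # (for a log: whether its build/testable shipped)
    age_days: int    # days since the artifact was produced


def keep(artifact: Artifact) -> bool:
    """True if the artifact should be retained under the proposal."""
    # Kept forever: shipped builds (2), shipped testables (3), and
    # the logs corresponding to them (4).
    if artifact.shipped:
        return True
    # Everything else lives only for the troubleshooting window.
    return artifact.age_days <= RETENTION_DAYS
```

So a year-old shipped compose stays, while a scratch-style candidate
that passed CI but was never promoted ages out after 30 days.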
> It could be adjusted based on factors we have yet to encounter. How
> does this proposal compare to the existing practice in Fedora?
>
> B. Storage - How much?
> ======================
>
> To get an idea of what this might look like, I think we might make
> estimates based on:
>
> * the number of builds currently happening per day
> * how many of these builds are within the definition for an
>   officially shipped thing (like Atomic Host, Workstation, Server,
>   etc.)
> * the average size of the sources + binaries, summed out over the
>   ways we deliver them (SRPM + RPM, ostree binary, binary in another
>   image), and multiplied out by arches
> * then sum this out over the length of a Fedora release
>
> This is the part I think will need information from the rel-eng and
> CI contributors, working together. My assumption is there are gaping
> holes in this concept, so don't take this as a full-on proposal.
> Rather, I'm looking for folks to help harden the concepts and fill
> in the missing pieces. I don't think we need a measurement down to
> the single GB; a broad estimate in 100s of GB (or even at the 1 TB
> order of magnitude) is likely good enough.
>
> I'm setting the follow-up to infrastructure@xxxxxxxxxxxxxxxxxxxxxxx,
> since that team has the most information about our existing storage
> and constraints.

_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
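To make the shape of that estimate concrete, the multiplication can be
written out as a tiny back-of-envelope script. Every input value below
is a made-up placeholder, not a real Fedora figure; the rel-eng and CI
folks would need to supply the actual numbers:

```python
# Back-of-envelope storage estimate following the factors above.
# ALL values are placeholder assumptions, not measured Fedora data.
builds_per_day = 200       # candidate builds entering CI per day
shipped_fraction = 0.10    # fraction officially shipped/promoted
avg_artifact_gb = 2.0      # sources + binaries, summed over delivery forms
arches = 4                 # e.g. x86_64, aarch64, ppc64le, s390x
release_days = 13 * 30     # roughly a 13-month release lifetime

# Shipped artifacts accumulate for the whole release...
kept_forever_gb = (builds_per_day * shipped_fraction
                   * avg_artifact_gb * arches * release_days)
# ...while everything else is a rolling 30-day window.
rolling_window_gb = (builds_per_day * (1 - shipped_fraction)
                     * avg_artifact_gb * arches * 30)

print(f"kept forever over a release: ~{kept_forever_gb / 1024:.1f} TB")
print(f"rolling 30-day window:       ~{rolling_window_gb / 1024:.1f} TB")
```

Even with rough inputs, the useful outcome is the order of magnitude:
the structure makes it obvious which factor (per-artifact size, shipped
fraction, arch count) dominates the bill.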