----- Original Message -----
> From: "Kashyap Chamarthy" <kchamart@xxxxxxxxxx>
> To: "Kevin Fenzi" <kevin@xxxxxxxxx>
> Cc: infrastructure@xxxxxxxxxxxxxxxxxxxxxxx, mattdm@xxxxxxxxxxxxxxxxx, afazekas@xxxxxxxxxx
> Sent: Friday, June 20, 2014 5:55:17 AM
> Subject: Re: Reliability of Fedora infrastructure to download cloud images
>
> On Thu, Jun 19, 2014 at 09:20:14AM -0600, Kevin Fenzi wrote:
> > On Thu, 19 Jun 2014 00:24:55 +0530
> > Kashyap Chamarthy <kchamart@xxxxxxxxxx> wrote:
> >
> > > [I'm not subscribed to this list, please keep me in CC.]
> > >
> > > Heya,
> > >
> > > A little while ago, we (Matthew Miller, myself, Attila Fazekas
> > > (upstream OpenStack developer)) had an IRC discussion (on
> > > #openstack-qa, Freenode) with OpenStack upstream CI infrastructure
> > > folks about their concerns for continuing to have Fedora as a default
> > > to run as CI voting guest (Nova instance). They (mostly Sean Dague -
> > > a major upstream OpenStack contributor who voiced these) outlined a
> > > few issues:
> >
> > I'm not familiar with the terminology, what does a 'voting guest' mean?
>
> Sorry for being unclear. It means, any proposed OpenStack change/patch
> has to be executed on a Fedora virtual machine too, only once it passes
> the tests on Fedora, patches will be merged to upstream git. I cc'd
> Attila, he can correct me if I said something wrong.
>

If the job is voting on the gate pipeline, it can prevent incompatible
changes from merging.

>
> > >   1. It's not possible to download from the fedora infrastructure
> > >      reliably - 10% failure rate from their cloud providers (HP and
> > >      RAX).
> > >     - About this point, when mattdm inquired - "is the failure in
> > >       hitting the fedora mirrors or fedora core infrastructure?",
> > >       their response - "I don't fully know, I think going through
> > >       the url we are using we get bounced to mirrors".
> >
> > Yeah, more data would be very nice here... what url(s) they are using,
> > what error codes if any they get back?
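
One way to gather the data asked for above might be a small probe like
the sketch below. It is only an illustration, not something the CI runs
today: the helper names `probe` and `failure_rate` are made up, and the
URL is the one quoted further down in this thread. It records the HTTP
status and the final URL after redirects, so a bounce to a mirror
becomes visible.

```python
import urllib.error
import urllib.request

# Image URL quoted in this thread; swap in whatever URL the job really uses.
IMAGE_URL = ("https://dl.fedoraproject.org/pub/fedora/linux/releases/20/"
             "Images/x86_64/Fedora-x86_64-20-20131211.1-sda.qcow2")


def probe(url, opener=urllib.request.urlopen, timeout=30):
    """Open url once and report the status code and the URL that answered.

    `opener` is injectable (hypothetical convenience) so the function can
    be exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            # geturl() is the URL after redirects, i.e. the mirror if bounced.
            return {"ok": True, "status": resp.status,
                    "final_url": resp.geturl(), "error": None}
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error code (404, 503, ...).
        return {"ok": False, "status": exc.code,
                "final_url": url, "error": str(exc)}
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP answer at all: DNS failure, reset, timeout, ...
        return {"ok": False, "status": None,
                "final_url": url, "error": str(exc)}


def failure_rate(results):
    """Fraction of probe results that failed (0.0 for an empty list)."""
    if not results:
        return 0.0
    failed = sum(1 for r in results if not r["ok"])
    return failed / len(results)
```

Calling `probe(IMAGE_URL)` periodically from the affected cloud
providers and keeping the returned dictionaries would show whether the
failures come from dl.fedoraproject.org itself or from a mirror, and
`failure_rate()` on the collected results gives a number to compare
against the 10% estimate.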

I saw the image download failure at least once, but I could not find a
pattern behind it :(. IMHO the failure rate was less than 10%, but
OpenStack infra/QA notices issues above a 0.1% failure rate.

If I or anyone else sees the failure again, a query can be added to
http://status.openstack.org/elastic-recheck/. That way we would know
exactly how often the issue happens. Anyone who signs the OpenStack
contributor agreement can propose queries to the repo:
https://github.com/openstack-infra/elastic-recheck/tree/master/queries

Here are the image download URLs:
https://github.com/openstack-dev/devstack/blob/master/stackrc#L357

>
> Looking at the script[1] that creates the CI VM, it uses this URL --
> https://dl.fedoraproject.org/pub/fedora/linux/releases/20/Images/x86_64/Fedora-x86_64-20-20131211.1-sda.qcow2
>
>
> [1] https://github.com/openstack-dev/devstack/blob/master/stackrc#L353
>
> > Are these the released cloud images? f19/20? Or nightlies or ?
>
> Released, official images.
>
> > How often do they download? Once an image is loaded, I am not sure why
> > they would re-download it unless it's changed?
>
> I just confirmed, they (CI infra) download and cache it. But, once every
> 24 hours, they rebuild the caches. It's the humans that download it
> manually (without any caching environment) that face the bottlenecks
> they say.
>

AFAIK every worker node downloads the L2 images once in its lifetime; I
do not know the average lifetime of these VMs. An L2 image version
switch can lead to ~500 image downloads in 1 hour.

> > Or unless they are
> > grabbing nightly rawhide images?
>
> They won't prefer to do this as only distribution tested image will be
> used in OpenStack CI environment.
>
> > >   2. There are possibly issues with the normal upstream fedora image
> > >      that could be fixed with custom respin.
> > >     - NOTE: I'm doubtful of this idea, as existing Fedora cloud
> > >       images itself are not really extensively tested.
> > >       I'd think focusing on _official_ cloud images and having a
> > >       solid set of tests so that it can be consumed by cloud
> > >       projects (OpenStack, etc).
> > >
> > >     - Having a custom respin means that we're off the main path for
> > >       testing of the image -- which again needs _some_ level of
> > >       assurance that it can be used in a higher-level cloud
> > >       project's CI infra.
> >
> > Yeah, I would think we would like to avoid that... and try and merge in
> > the changes they need for images instead of them going and making their
> > own that only they use.
>
> Oh, it's my poor wording, they didn't mean to say _they'd_ create these
> custom images. OpenStack infra is clear - they'd only use reasonably
> well-tested images from Distributions.
>
> > >   3. Another important point OpenStack infra folks emphasized is -
> > >      these images will get 4000 test runs a week on them
> >
> > Cool.
> >
> > > Any suggestions to allay these are welcome.
> >
> > Happy to try and solve any bottlenecks they are having...
>
> Yeah, folks are testing more than ever with Fedora lately.
>
> OpenStack infra/qa folks have an upcoming meetup to discuss several
> topics, and Fedora is also on their list. Will let you know if they
> provide more specific, technical feedback from OpenStack infra.
>
>
> Thanks.
>
>
> --
> /kashyap
>
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure