Greetings everyone.

So, we have seen some very sporadic connection issues that have been breaking composes. I'm at a loss as to how to debug them or figure out what's going on, so I thought I would collect the information here and see if anyone has any ideas.

Of the last 11 doomed rawhide composes:

Fedora-Rawhide-20190622.n.0 - loop issue [0]
Fedora-Rawhide-20190622.n.1 - loop issue [0]
Fedora-Rawhide-20190623.n.0 - armv7 http2 framing error [1]
Fedora-Rawhide-20190624.n.0 - loop issue [0]
Fedora-Rawhide-20190626.n.0 - kde broken deps
Fedora-Rawhide-20190626.n.1 - loop issue [0]
Fedora-Rawhide-20190627.n.0 - loop issue [0]
Fedora-Rawhide-20190627.n.1 - pkg download issue [2]
Fedora-Rawhide-20190630.n.0 - armv7 http2 framing error [1]
Fedora-Rawhide-20190630.n.1 - x86_64 live download issue [3]
Fedora-Rawhide-20190702.n.0 - pkg download issue [2]

[0] mock in the old chroot has 5 loop devices available; if you try to use more than that, they just don't appear in the chroot. For some reason loop devices aren't getting cleaned up as easily, or we hit multiple composes per builder and it goes over 5. I patched our mock to have 11 of them. :) See:

[1] This looks like:

    DEBUG util.py:585: BUILDSTDERR: [MIRROR] libavc1394-0.5.4-10.fc30.armv7hl.rpm: Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190623.n.0/compose/Everything/armhfp/os/Packages/l/libavc1394-0.5.4-10.fc30.armv7hl.rpm []
    DEBUG util.py:585: BUILDSTDERR: [FAILED] libavc1394-0.5.4-10.fc30.armv7hl.rpm: No more mirrors to try - All mirrors were already tried without success
    DEBUG util.py:585: BUILDSTDERR: Unable to create appliance : Unable to download from repo : Cannot download Packages/l/libavc1394-0.5.4-10.fc30.armv7hl.rpm: All mirrors were tried

[2] This one is a screenshot (see: https://kojipkgs.fedoraproject.org//work/tasks/3061/35873061/screenshot.ppm), but basically:

    Failed to download the following packages:
    Cannot download Packages/s/systemd-bootchart-233-4.fc30.x86_64.rpm: All mirrors were tried.

[3] This looks like:

    DEBUG util.py:585: BUILDSTDERR: 2019-07-01 03:02:48,081: Non interactive installation failed: Failed to download the following packages: Cannot download Packages/i/iwl3945-firmware-15.32.2.9-97.fc31.noarch.rpm: All mirrors were tried.

A short summary of our setup:

A builder, which might be running a mock chroot or a VM, requests something from kojipkgs. In DNS, kojipkgs resolves to 2 IPs: proxy101 and proxy110. Those run apache, which proxies incoming requests for that host to haproxy, also running on those hosts. haproxy checks the liveness of the two backend kojipkgs servers, kojipkgs01 and kojipkgs02, and the request goes to one of the two, or to whichever is up. kojipkgs01/02 have varnish listening on port 80, so if varnish has the item in cache, it just replies with that. Otherwise it queries a locally running apache to fetch the item. That apache reads the package from an NFS mount of all the koji data on those machines.

So, lots of layers, but everything should be pretty reliable. And most of the time it is.
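For the loop issue in [0], a quick check on a builder is to count the loop device nodes a chroot actually sees. The Python below is a minimal sketch, not anything mock ships: `count_loop_devices` and `has_enough_loop_devices` are hypothetical helpers, and they only count `loopN` nodes present in a given `dev` directory, not whether those devices are currently attached (on the host, `losetup -a` lists the attached ones).

```python
import glob
import os
import re

# Hypothetical helpers for illustration; not part of mock.
# Matches loop0, loop1, ..., loop10, but not loop-control.
_LOOP_NODE = re.compile(r"loop\d+")


def count_loop_devices(dev_dir: str) -> int:
    """Return how many loopN device nodes exist directly under dev_dir."""
    names = (os.path.basename(p) for p in glob.glob(os.path.join(dev_dir, "loop*")))
    return sum(1 for name in names if _LOOP_NODE.fullmatch(name))


def has_enough_loop_devices(dev_dir: str, needed: int) -> bool:
    """True if the chroot exposes at least `needed` loop device nodes."""
    return count_loop_devices(dev_dir) >= needed
```

For example, `has_enough_loop_devices("/var/lib/mock/<root>/root/dev", 6)` would return False in a chroot that only has the 5 default nodes; the path is a placeholder.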
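To make the layering in the setup summary concrete, here is a toy Python model of the backend half of the path: an haproxy-like front that skips backends marked down, in front of two kojipkgs backends that each answer from a varnish-style cache and fall back to a local apache/NFS fetch on a miss. It is an illustration under simplifying assumptions (health is a boolean, the cache never expires, DNS and the proxy101/proxy110 apache layer are left out, and the rotation order is an arbitrary choice), not how haproxy or varnish are actually configured here.

```python
from itertools import count
from typing import Callable, Dict, List


class Backend:
    """Toy kojipkgs backend: varnish-style cache in front of an apache/NFS fetch."""

    def __init__(self, name: str, fetch: Callable[[str], bytes]) -> None:
        self.name = name
        self.up = True                      # what haproxy's health check would report
        self._fetch = fetch                 # stands in for apache reading the NFS mount
        self._cache: Dict[str, bytes] = {}  # stands in for varnish (no expiry modelled)

    def get(self, path: str) -> bytes:
        if path not in self._cache:         # cache miss: ask the local apache
            self._cache[path] = self._fetch(path)
        return self._cache[path]


class Balancer:
    """Toy haproxy-like front: rotate over backends, skipping any that are down."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends
        self._turn = count()

    def get(self, path: str) -> bytes:
        start = next(self._turn)
        for offset in range(len(self.backends)):
            backend = self.backends[(start + offset) % len(self.backends)]
            if backend.up:
                return backend.get(path)
        raise ConnectionError("no kojipkgs backend is up")
```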
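On the debugging question, one low-tech option is to fetch a single file from kojipkgs over and over from a builder and log the exact time of each failure, so the failures can be matched against the proxy101/proxy110 and kojipkgs01/02 logs. A minimal Python sketch, assuming a hypothetical `probe` helper; note that `urllib` only speaks HTTP/1.1, so it cannot reproduce the HTTP/2 framing error in [1] directly. It can only show whether plain HTTP/1.1 fetches fail as well.

```python
import http.client
import time
import urllib.request
from typing import Callable, List, Tuple


def fetch_url(url: str, timeout: float = 30.0) -> int:
    """Download url fully (HTTP/1.1 via urllib) and return the number of bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def probe(url: str, attempts: int, fetch: Callable[[str], int] = fetch_url,
          delay: float = 0.0) -> List[Tuple[float, str]]:
    """Hypothetical helper: fetch url `attempts` times, return (unix_time, error) per failure."""
    failures: List[Tuple[float, str]] = []
    for _ in range(attempts):
        try:
            fetch(url)
        # URLError/HTTPError, timeouts and connection resets are OSError subclasses;
        # a truncated body raises http.client.IncompleteRead, an HTTPException.
        except (OSError, http.client.HTTPException) as exc:
            failures.append((time.time(), repr(exc)))
        time.sleep(delay)
    return failures
```

For example, `probe("https://kojipkgs.fedoraproject.org/<path to some rpm>", 1000, delay=1.0)`; the URL is a placeholder.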
The builders usually download tons and tons of things via this setup without problems. At the point where all of these fail, the build would be running the rawhide curl/dnf stack, not the native f29 one on the builders, so if it were a curl problem I would expect it to happen more often.

So, how can we debug or mitigate this? Any ideas?

kevin
_______________________________________________ infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx