On Thu, Dec 17, 2020 at 10:20:38PM +0100, Pavel Raiskup wrote:
> On Thursday, December 17, 2020 8:55:48 PM CET Kevin Fenzi wrote:
> > On Thu, Dec 17, 2020 at 10:40:24AM +0100, Pavel Raiskup wrote:
> > > On Sunday, December 13, 2020 11:02:53 PM CET Kevin Fenzi wrote:
> > > > * Finally releng is looking at establishing a sidetag cleanup policy.
> > > > A reminder that sidetags should be short lived and only created when
> > > > needed. koji must generate buildroot repos for every single sidetag.
> > > > ( You can list all your sidetags with 'fedpkg list-side-tags --mine' )
> > >
> > > I'm just curious what's the most expensive thing for Koji to implement
> > > side-tags, I'd expect something like:
> > >
> > > - the repositories are cloned (recursively hardlinked/symlinked?)
> > > - the override package is added
> > > - then createrepo_c is run
> >
> > I haven't closely looked, but I am pretty sure it's the createrepo_c
> > calls. Not that they take that long, but that there has to be one for
> > every single buildroot change. ie, if I build foo in rawhide, as soon as
> > it lands in the f34 tag, the ~90 f34 side tags all have to run newrepo
> > tasks. Repeat many many many times a day.
>
> We'd have to check the createrepo arguments, but according to that ^^^
> most probably there's some other useful optimization used in
> Koji+createrepo. Otherwise Koji wouldn't be able to run createrepo
> hundreds of times per day (subsequent, serial runs?) on such a large repo
> like rawhide is.

Well, it does use --skip-stat --update as you note below.

> > > In Copr we don't have to clone the repositories, but we have to run the
> > > createrepo_c after each build. In the past the bottleneck used to be
> > > the createrepo_c run (recursive walk through all the packages in larger
> > > repositories).
> > >
> > > If Koji suffers from the same problem, perhaps you could take a look at
> > > the new createrepo_c option `--recycle-pkglist` - the createrepo_c costs
> > > almost nothing then (matter of just reading and later writing the xml
> > > metadata).
> >
> > That would work for the simple case of an updated package, but it
> > wouldn't work for new packages added would it? I suppose we could tell
> > people if they need to use a new package in their sidetag to delete and
> > recreate it? Not sure how much hassle that would be.
>
> It shouldn't matter (i think) if the package is updated, or newly added.
> Createrepo needs still to be explicitly told what metadata for which concrete
> RPMs must be added (--pkglist), and what removed (--exclude).
>
> Anyway, based on the numbers above it looks like e.g. `--skip-stat --update` is
> used in Koji, and the --pkglist is carefully maintained by Koji. I guess that
> --recycle-pkglist wouldn't help here.

Yeah, koji keeps track of that.

> Side note, before I was really curious why parsing and then writing of
> medium sized (~20MB) repodata [1] takes about ~20s [2] with '--workers 8'
> [3] (mostly no I/O). But I forgot that the repo is compressed, so it's
> about 155MB of XML metadata. Considering that only decompression and
> compression takes together ~3 seconds on my laptop, and that we still
> generate the sqlite database files in Copr (I don't know the overhead of
> this), it's likely there's not much space for performance optimizations.
> Certainly not some low hanging fruit..

I wonder if '--compression-type gz' would save some... would save making
some bz2 files and for koji, size isn't a big deal since builders are
close to the hub/packages for the most part.

kevin
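
For illustration only, the flags discussed in this thread can be assembled
into a command line, and the fan-out of newrepo tasks estimated, roughly as
below. This is a minimal sketch and not how Koji itself invokes createrepo_c:
the repository path, the pkglist file name, and the figure of 50 parent-tag
changes per day are hypothetical; only the ~90 side tags and the
--update/--skip-stat/--pkglist/--workers flags come from the discussion.

```python
import shlex


def createrepo_cmd(repo_dir, pkglist=None, workers=None):
    """Build (but do not run) a createrepo_c command for an incremental update."""
    # --update reuses existing metadata; --skip-stat skips the per-file stat
    # check that --update would otherwise perform.
    cmd = ["createrepo_c", "--update", "--skip-stat"]
    if pkglist:
        # Explicit list of packages to include (hypothetical file name below).
        cmd += ["--pkglist", pkglist]
    if workers:
        cmd += ["--workers", str(workers)]
    cmd.append(repo_dir)
    return cmd


def newrepo_tasks(side_tags, parent_changes_per_day):
    """Every change in the parent tag triggers one regeneration per side tag."""
    return side_tags * parent_changes_per_day


if __name__ == "__main__":
    # Hypothetical path and pkglist name, for illustration only.
    cmd = createrepo_cmd("/srv/repos/f34-side-example", "pkglist.txt", workers=8)
    print(shlex.join(cmd))
    # ~90 side tags (from the thread) x 50 parent changes/day (assumed).
    print(newrepo_tasks(90, 50))
```

The point of the second function is only that the cost scales with the
product of side tags and parent-tag changes, which is why short-lived side
tags matter even when a single createrepo_c run is cheap.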