This is a second freeze break request for Fedora Tagger. I haven't actually carried out the first one yet, but Kevin's mention of the broken bodhi sqlitebuildtags url took us down a rabbit hole. This is the traceback that tagger logs when the bodhi masher tries to pull down its sqlitebuildtags: Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/flask/app.py", line 1687, in wsgi_app response = self.full_dispatch_request() File "/usr/lib/python2.6/site-packages/flask/app.py", line 1360, in full_dispatch_request rv = self.handle_user_exception(e) File "/usr/lib/python2.6/site-packages/flask/app.py", line 1358, in full_dispatch_request rv = self.dispatch_request() File "/usr/lib/python2.6/site-packages/flask/app.py", line 1344, in dispatch_request return self.view_functions[rule.endpoint](**req.view_args) File "/usr/lib/python2.6/site-packages/fedoratagger/api/api.py", line 461, in tag_pkg_sqlite return fedoratagger.lib.sqlitebuildtags() File "/usr/lib/python2.6/site-packages/fedoratagger/lib/sqlite_export.py", line 56, in sqlitebuildtags conn.executemany(insert_statement, rows) IntegrityError: columns name, tag are not unique The problem is that the schema upgrade from tagger1 to tagger2 didn't go smoothly in production, and we didn't notice. There is supposed to be a DB constraint that disallows duplicate tags on packages, but it is absent. We can't just apply that new constraint, because there are now two duplicate tags in the DB. We need to: 1) Remove the duplicates 2) Apply the constraint so it doesn't happen again Here's the constraint we need to apply in postgres: ALTER TABLE tag ADD CONSTRAINT unique_package_label UNIQUE (package_id, label); Here's a script to remove the duplicates: #!/usr/bin/env python import os os.environ['FEDORATAGGER_CONFIG'] = '/etc/fedoratagger/fedoratagger.cfg' from sqlalchemy import func, and_ import fedoratagger import fedoratagger.lib.model as m print 'Looking now for package tag dupes.' query = fedoratagger.SESSION.query(m.Tag).all() results = {} for tag in query: results[tag.label] = results.get(tag.label, {}) results[tag.label][tag.package_id] = \ results[tag.label].get(tag.package_id, 0) + 1 dupes = [] for label, packages in results.items(): for package, count in packages.items(): if count > 1: dupes.append((label, package)) print "Found these package+tag dupes:", dupes base_query = fedoratagger.SESSION.query(m.Tag) for label, package_id in dupes: query = base_query.filter(and_( m.Tag.label==label, m.Tag.package_id==package_id )).all() keep, rest = query[0], query[1:] for dupe in rest: fedoratagger.SESSION.delete(dupe) fedoratagger.SESSION.commit()
Attachment:
pgpK4QJpCby4R.pgp
Description: PGP signature
_______________________________________________ infrastructure mailing list infrastructure@xxxxxxxxxxxxxxxxxxxxxxx https://admin.fedoraproject.org/mailman/listinfo/infrastructure