Freeze Break Request: Fedora Tagger Schema

Ralph Bean <rbean@xxxxxxxxxx> · Thu, 16 May 2013 11:27:44 -0400

This is a second freeze break request for Fedora Tagger.  I haven't
actually carried out the first one yet, but Kevin's mention of the
broken bodhi sqlitebuildtags url took us down a rabbit hole.

This is the traceback that tagger logs when the bodhi masher tries to
pull down its sqlitebuildtags:

   Traceback (most recent call last):
     File "/usr/lib/python2.6/site-packages/flask/app.py", line 1687, in wsgi_app
       response = self.full_dispatch_request()
     File "/usr/lib/python2.6/site-packages/flask/app.py", line 1360, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/usr/lib/python2.6/site-packages/flask/app.py", line 1358, in full_dispatch_request
       rv = self.dispatch_request()
     File "/usr/lib/python2.6/site-packages/flask/app.py", line 1344, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/usr/lib/python2.6/site-packages/fedoratagger/api/api.py", line 461, in tag_pkg_sqlite
       return fedoratagger.lib.sqlitebuildtags()
     File "/usr/lib/python2.6/site-packages/fedoratagger/lib/sqlite_export.py", line 56, in sqlitebuildtags
       conn.executemany(insert_statement, rows)
   IntegrityError: columns name, tag are not unique

The problem is that the schema upgrade from tagger1 to tagger2 didn't
go smoothly in production, and we didn't notice.  There is supposed to
be a DB constraint that disallows duplicate tags on packages, but it is
absent.

We can't just apply that new constraint, because there are now two
duplicate tags in the DB.

We need to:

  1) Remove the duplicates
  2) Apply the constraint so it doesn't happen again

Here's the constraint we need to apply in postgres:

    ALTER TABLE tag ADD CONSTRAINT unique_package_label UNIQUE (package_id, label);

Here's a script to remove the duplicates:

    #!/usr/bin/env python

    import os
    os.environ['FEDORATAGGER_CONFIG'] = '/etc/fedoratagger/fedoratagger.cfg'

    from sqlalchemy import func, and_
    import fedoratagger
    import fedoratagger.lib.model as m

    print 'Looking now for package tag dupes.'
    query = fedoratagger.SESSION.query(m.Tag).all()
    results = {}

    for tag in query:
        results[tag.label] = results.get(tag.label, {})
        results[tag.label][tag.package_id] = \
            results[tag.label].get(tag.package_id, 0) + 1 

    dupes = []
    for label, packages in results.items():
        for package, count in packages.items():
            if count > 1:
                dupes.append((label, package))

    print "Found these package+tag dupes:", dupes

    base_query = fedoratagger.SESSION.query(m.Tag)
    for label, package_id in dupes:
        query = base_query.filter(and_(
            m.Tag.label==label,
            m.Tag.package_id==package_id
        )).all()

        keep, rest = query[0], query[1:]
        for dupe in rest:
            fedoratagger.SESSION.delete(dupe)

    fedoratagger.SESSION.commit()
Attachment:
pgpK4QJpCby4R.pgp

Description: PGP signature
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure