On Wed, Oct 16, 2019 at 09:41:03AM -0700, Kevin Fenzi wrote:
> On Wed, Oct 16, 2019 at 10:47:00AM +0200, Pierre-Yves Chibon wrote:
> > Good Morning Everyone,
> >
> > This morning I found out that https://pagure.io/fedora-infrastructure was not
> > available, it was throwing a 500 error on every page/call.
> >
> > I checked the logs and found:
> > GitError: Error performing curl request: (60): Peer certificate cannot be
> > authenticated with given CA certificates
> >
> > The combination of "GitError" and an SSL-related error led me to repoSpanner.
> > So with the help of Patrick, we confirmed that the SSL cert for pagure01 was
> > expiring on Oct 15th 2019.
> > We then regenerated that SSL cert.
> >
> > We thought the repospanner playbook was going to redeploy that cert so I ran it,
> > but it did not change anything (both in its run as well as in the symptoms
> > observed).
> >
> > We then found out that this piece is actually part of the pagure.yml playbook,
> > so I ran it with `-t repospanner/server` to limit its effect.
> > Then I restarted httpd, stunnel and repospanner@ansible.service on pagure01.
> > The first two were likely not necessary, the last one was to get the new cert in
> > use.
> >
> > So I would like retro-active approval for my actions since the systems I've
> > touched are frozen.
>
> So a few things:
>
> 1) +1 to the actions... thanks for fixing that!

Thanks for the +1!

> 2) we need nagios monitoring those certs, or we need to just tear
> down that cluster if we aren't going to use it (which we are currently
> not).
>
> 3) We could also 'unrepospanner' that repo since we aren't using it
> and put the old one back.

This may be wise, especially considering that I may not have fixed everything
(see the end of this email).

> 4) pagure perhaps should gracefully print 'sorry, the repo is not
> available right now due to a repospanner problem' but otherwise work?

+1 for this, I'm not sure of the size of the work in there but it is worth
looking into.

Also: Patrick said that the cert needs to be upgraded in other places (nodes) as
well. I do not know whether running the repospanner playbook took care of that,
so we may still have something broken.

I received emails from pagure yesterday with:
"""
...
PagurePushDenied: Remote hook declined the push: Performing pre-check...
...
ERR Error syncing object out to enough nodes
"""

Which makes me think we are still missing some fix, but I don't know which :(

Thanks,
Pierre
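
On the monitoring idea in point 2, a minimal sketch of what an expiry check
could look like is below. It is not an existing nagios plugin and not anything
deployed on the repospanner nodes. The thresholds, the function names and the
sample timestamp are all hypothetical. The only assumption about input is that
each node can provide the `notAfter=...` line that
`openssl x509 -enddate -noout -in <cert file>` prints for its cert.

```python
import ssl
from datetime import datetime, timezone

# Hypothetical thresholds, in days; tune to whatever the real check should use.
WARN_DAYS = 30
CRIT_DAYS = 7


def parse_enddate(line):
    """Strip the 'notAfter=' prefix from an `openssl x509 -enddate` output line."""
    return line.strip().split("=", 1)[-1]


def days_left(not_after, now):
    """Days between `now` (timezone-aware) and a notAfter string such as
    'Oct 15 12:00:00 2019 GMT'. Negative once the cert has expired."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def nagios_status(days):
    """Map days remaining onto the usual plugin exit codes:
    0 = OK, 1 = WARNING, 2 = CRITICAL."""
    if days < CRIT_DAYS:
        return 2
    if days < WARN_DAYS:
        return 1
    return 0


if __name__ == "__main__":
    # Illustrative value only; the real expiry time of the pagure01 cert
    # is not known here beyond the date.
    sample = "notAfter=Oct 15 12:00:00 2019 GMT"
    remaining = days_left(parse_enddate(sample), datetime.now(timezone.utc))
    print(f"{remaining:.1f} days left, status {nagios_status(remaining)}")
```

Running the same check against every node's cert, rather than only pagure01,
would also show whether any of the other places Patrick mentioned still carry
the old cert.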