Re: [Post Fact FBR] update repoSpanner's SSL cert for pagure01

On Wed, Oct 16, 2019 at 09:41:03AM -0700, Kevin Fenzi wrote:
> On Wed, Oct 16, 2019 at 10:47:00AM +0200, Pierre-Yves Chibon wrote:
> > Good Morning Everyone,
> > 
> > This morning I found out that https://pagure.io/fedora-infrastructure was not
> > available, it was throwing a 500 error on every page/call.
> > 
> > I checked the logs and found:
> > GitError: Error performing curl request: (60): Peer certificate cannot be
> > authenticated with given CA certificates
> > 
> > The combination of "GitError" and an SSL-related error led me to repoSpanner.
> > So with the help of Patrick, we confirmed that the SSL cert for pagure01 was
> > expiring on Oct 15th 2019.
> > We then regenerated that SSL cert.
> > 
> > We thought the repospanner playbook was going to redeploy that cert so I ran it,
> > but it did not change anything (neither in its run nor in the symptoms
> > observed).
> > 
> > We then found out that this piece is actually part of the pagure.yml playbook,
> > so I ran it with `-t repospanner/server` to limit its effect.
> > Then I restarted httpd, stunnel and repospanner@ansible.service on pagure01.
> > The first two were likely not necessary, the last one was to get the new cert in
> > use.
> > 
> > So I would like retro-active approval for my actions since the systems I've
> > touched are frozen.
> 
> So a few things: 
> 
> 1) +1 to the actions... thanks for fixing that!

Thanks for the +1!

> 2) we need nagios monitoring those certs, or we need to just tear
> down that cluster if we aren't going to use it (which we are currently
> not). 
> 
> 3) We could also 'unrepospanner' that repo since we aren't using it
> and put the old one back.

This may be wise, especially considering that I may not have fixed everything
(see the end of this email).
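
On point 2, here is a rough sketch of what a Nagios-style expiry check could
look like. It is only an illustration: the function names and the 30/7 day
thresholds are my own assumptions, not an existing check of ours. It takes the
date string that `openssl x509 -enddate -noout` prints and maps the remaining
days to the usual plugin exit codes.

```python
import ssl
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def days_left(not_after, now=None):
    """Return days until expiry for a date like 'Oct 15 00:00:00 2019 GMT'.

    Also accepts the 'notAfter=' prefix that openssl prints.
    """
    prefix = "notAfter="
    if not_after.startswith(prefix):
        not_after = not_after[len(prefix):]
    expires = ssl.cert_time_to_seconds(not_after.strip())
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def nagios_status(days, warn_days=30, crit_days=7):
    """Map remaining days to an exit code (thresholds are assumptions)."""
    if days <= crit_days:
        return CRITICAL
    if days <= warn_days:
        return WARNING
    return OK
```

A wrapper script would read the end date of each repoSpanner cert, call
nagios_status(days_left(...)) and exit with the result, so an already expired
cert (negative days) shows up as CRITICAL.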

> 4) pagure perhaps should gracefully print 'sorry, the repo is not
> available right now due to a repospanner problem' but otherwise work?

+1 for this. I'm not sure how much work it would take, but it is worth looking
into.
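
To make the idea concrete, the graceful fallback could look roughly like the
sketch below. This is not pagure's actual code: RepoBackendError,
render_repo_page and the two callables are hypothetical stand-ins, used only to
show the error being caught where the repo is loaded instead of bubbling up as
a 500.

```python
class RepoBackendError(Exception):
    """Hypothetical stand-in for the GitError seen when repoSpanner fails."""


UNAVAILABLE_MSG = (
    "Sorry, the repo is not available right now due to a repospanner problem"
)


def render_repo_page(load_repo, render):
    """Return (status, body); degrade to a 503 if the backend is down.

    load_repo and render are hypothetical callables supplied by the caller.
    """
    try:
        repo = load_repo()
    except RepoBackendError:
        # Only the repo-dependent part fails; the rest of the site keeps working.
        return 503, UNAVAILABLE_MSG
    return 200, render(repo)
```

Pages that do not need the git repo (issues, settings) would not go through
this path at all and would keep working.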


Also: Patrick said that the cert needs to be updated in other places (nodes) as
well. I do not know whether running the repospanner playbook took care of that,
so we may still have something broken.
I received emails from pagure yesterday containing:
"""
...
PagurePushDenied: Remote hook declined the push: Performing pre-check...
...
ERR Error syncing object out to enough nodes
"""

Which makes me think we are still missing some fix, but I don't know which :(


Thanks,
Pierre
