Re: Big Freeze Break Request: replicate db-koji01

Stephen John Smoogen <smooge@xxxxxxxxx> · Tue, 22 Oct 2019 16:56:45 -0400

On Tue, 22 Oct 2019 at 16:55, Mohan Boddu <mboddu@xxxxxxxxxx> wrote:
>
> On Mon, Oct 21, 2019 at 2:48 PM Kevin Fenzi <kevin@xxxxxxxxx> wrote:
> >
> > So, I got db-koji02 all set up and doing streaming replication from
> > db-koji01. It has no trouble keeping up.
> >
> > The bad news is, it doesn't solve the backup problem.
> >
> > * If I just do a backup on db-koji02 and don't have hot_standby_feedback
> > on, the backup fails because the primary (01) vacuums or otherwise drops
> > rows that the backup is trying to read.
> >
> > * If I do a backup on db-koji02 with hot_standby_feedback enabled, the
> > backup fails due to the 30 second max_standby_streaming_delay timeout, i.e.:
> > 02: "hey, I am backing up these rows, keep them for me until I'm done"
> > 01: "sure, no problem"
> > 02: ...backs up... 30 seconds pass, still backing up...
> > 02: woah, I am 30s behind in replication now due to this lock. <boom>
> >
> > Longer term the problem is really going to need to be solved in koji.
> > There are two tables that are gigantic: buildroot_listing (203GB) and
> > tasks (93GB). Those tables take a lot of inserts, so
> > backing them up will always slow things down. Hopefully they can use
> > partitioning or something and fix this upstream.
> >
> > Possible solutions until they do:
> >
> > 1. I can try to increase max_standby_streaming_delay from 30s to, say,
> > 90s. This might allow it enough time to copy things with all the locks.
> +1
> >
> > 2. We could try breaking the replication, doing the backup on 02 and
> > reconnecting it. This may be difficult to automate and I worry that it
> > might not be consistent if we just disconnect.
> >
> > 3. We could just say that having a hot spare is ok for now and kick the
> > can down the road and not worry about backups.
> >
> > Personally, I am willing to try 1 (will need more +1's for freeze
> > break). I don't really like 2, and 3 seems a bit scary, but I guess it
> > could be ok until after freeze.
> >
> > Thoughts?
> I prefer #1 compared to #2 and #3.

I think for the next week, while Kevin is on vacation, #1 is what we
should go with.
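
For reference, here is a rough sketch of what option 1 might look like on
db-koji02. It assumes the standby runs a PostgreSQL new enough to have
ALTER SYSTEM (9.4 or later), which I have not checked for this host. The
helper name standby_delay_statements is made up for illustration, and it
only builds the SQL; it does not connect to anything. The same change
could just as well be made by editing postgresql.conf and reloading.

```python
def standby_delay_statements(delay_seconds=90):
    """Build the SQL to change max_standby_streaming_delay on the standby.

    Hypothetical helper for illustration: it returns the statements as
    strings and leaves running them (psql, a DB driver, ...) to the caller.
    """
    # PostgreSQL accepts -1 for this setting, meaning "wait forever"
    # for conflicting standby queries; other negative values are invalid.
    if delay_seconds < -1:
        raise ValueError("delay must be -1 (wait forever) or >= 0 seconds")
    value = "-1" if delay_seconds == -1 else f"{delay_seconds}s"
    return [
        # ALTER SYSTEM cannot run inside a transaction block, so whatever
        # executes this needs autocommit.
        f"ALTER SYSTEM SET max_standby_streaming_delay = '{value}'",
        # The setting is picked up on a config reload; no restart needed.
        "SELECT pg_reload_conf()",
    ]
```

Calling standby_delay_statements(90) gives the two statements for the
90s value Kevin suggested. Whether 90s is actually long enough for the
backup is something only a test run will tell us.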

-- 
Stephen J Smoogen.