Hi sweeties,
I'm a bit confused about this one, and could use some help from you PG overlords,
so I have a RDS 9.5 master 16vcpu/64GB RAM/5000 IOPS which is RDS replicating to a replica with the same specs.
The load on the master is roughly 20/30% cpu and there's only some batch queries running on the replica usually in the morning.
Now this master DB was neglected for a long time, and we had tables with 1.3 billion records. I ran an outbound daemon that was deleting chunks of this table over a period of weeks, and it is now cleaned up. Autovacuum on these amounts of deletes is a beast on its own, and right now is running what I hope is the last long running one.
So I had been trying for a couple of days to get an accurate count of the number of records remaining on that large table that still needed to be deleted,
select count(id) from large_table where id < 99999;
I am running the above on the replica to avoid causing an extra load on the master, that query takes a long time (lets ignore the fact that it badly needs an analyse to finish), roughly an hour or so.
Now what is baffling me is the CPU load on the master goes up steadily all the way to 100% while this select count is running on the slave. Worth mentioning that CPU on the slave increases by about 10% of so.
I have google, read the postgres parameters, dug into AWS forums, stack overflows and I'm none the wiser about this one.
How can a select count like this one on the replica cause CPU to spike on the master?
thanks for your help postgres overlords
your humble pebble
- azul