G'day all,

A quick follow-up on this issue for interest's sake. The stalling we were
seeing turned out to be a Cloud SQL issue and not related to our
listen/notify usage.

Cloud SQL has an automatic storage increase process that resizes the
underlying disk as required to account for cluster growth. As it turns out,
that process occasionally causes I/O to stall for a brief window.

https://cloud.google.com/sql/docs/postgres/instance-settings#automatic-storage-increase-2ndgen

The workaround supplied by Google is to manually provision slack storage in
larger increments, to prevent the more frequent automatic increases, which
happen 25GB at a time on a large cluster (see the PS below for a rough
sketch of the command). We didn't make the connection ourselves because
disk resize events aren't visible in any logs; Google Support found the
issue by correlating the timestamps of our observed outages with their
internal logs.

Hopefully this is useful for someone else. Thanks again for your help, Tom -
your advice on listen/notify locking on commit was very useful, even though
it wasn't the cause in this case.

Cheers
Ben

On Mon, 1 Feb 2021 at 12:33, Ben Hoskings <ben@xxxxxxxxxxxx> wrote:
>
> On Mon, 1 Feb 2021 at 10:33, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> >
> > One thing that just occurred to me is that you might find it
> > interesting to keep tabs on what's in the $PGDATA/pg_notify
> > directory. Do the performance burps correspond to transitory
> > peaks in the amount of data there? Or (grasping at straws here...)
> > wraparound of the file names back to 0000?
>
> We don't have filesystem access on Cloud SQL - the downside of the
> managed route :)
>
> It sounds like it might be time to bump the pg13 upgrade up the TODO list.
>
> Cheers
> Ben
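
PS: A couple of practical notes for anyone who finds this thread later.
These are sketches only - the instance name is made up and the gcloud flag
names are from memory, so check the current docs before relying on them.

To pre-provision slack storage and stay ahead of the automatic 25GB
increments, something along these lines should do it:

    # Bump the allocated disk well past current usage so the automatic
    # increases (and the brief I/O stalls that come with them) fire
    # rarely, if at all. Auto-increase can stay enabled as a safety net.
    gcloud sql instances patch my-instance --storage-size=500GB

And on Tom's suggestion about watching $PGDATA/pg_notify: without
filesystem access, the notification queue can still be checked from plain
SQL, which works fine on a managed instance:

    -- Fraction of the async notification queue currently in use (0 to 1).
    SELECT pg_notification_queue_usage();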