Hi Rene,

On 11/19/18 8:46 PM, Rene Romero Benavides wrote:
Valid point to raise. Partitioning is in the works, but nothing is
partitioned at the moment.

Transaction times on the master max out around two minutes. On
the replica they are much longer -- numerous 1-2 hour
transactions per day, and occasional ones as long as 10-20
hours. Isolation levels are read committed everywhere.

I'll make a note to record the active locks next time. I haven't
seen anything unusual in the logs during these incidents, but have
observed statements getting canceled at other times, which is why
I think the config mostly works.

This brings up a good detail I forgot to mention originally.
During the last incident, IO utilization on the replica was near
100%, and had been for several hours, which I believe was due to
the long queries I canceled. Now that I think about it, I wonder
if the lag may have arisen from IO contention between those queries
and WAL replay, rather than from a query conflict.

Thanks, interesting reading.
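
To tell IO contention apart from a query conflict next time, one option is to sample the replay backlog and the conflict counters on the replica during an incident. The following is only a rough sketch: it assumes PostgreSQL 10 or later (for the `pg_last_wal_receive_lsn()` and `pg_last_wal_replay_lsn()` names), and the helper names `parse_lsn` and `replay_backlog_bytes` are made up for illustration.

```python
# Queries to run on the replica. Function names assume PostgreSQL 10+;
# older releases use the pg_last_xlog_* equivalents.
LSN_QUERY = "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()"
CONFLICT_QUERY = (
    "SELECT datname, confl_snapshot, confl_lock, confl_bufferpin "
    "FROM pg_stat_database_conflicts"
)


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/16B3748' to an absolute byte position.

    An LSN is printed as two hex numbers: the high and low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_backlog_bytes(receive_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL received by the replica but not yet replayed."""
    return parse_lsn(receive_lsn) - parse_lsn(replay_lsn)
```

If the backlog keeps growing while the conflict counters stay flat, that would be one hint that replay is merely slow rather than blocked. It is suggestive at best, since the counters only increase when a query is actually canceled.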