Looking at the Azure portal metrics, we are nowhere close to the advertised maximum IOPS or MB/s throughput (under half of the maximum IOPS and under a quarter of the maximum MB/s). So there must be some other bottleneck in play. The IOPS limit on this VM size is even higher, so that shouldn't be it.

If I were to size a separate volume for just WAL, I would think 64GB, but obviously the Azure storage IOPS are much lower at that size. On this particular DB host, we're currently on a 2.0T P40 disk that is supposed to give 7500 IOPS and 250MB/s [1] (but again, Azure's own usage graphs show us nowhere near those limits). A smaller volume like 64GB would be provisioned at 240 IOPS in this example. Doesn't seem like a lot. Until you get to a terabyte, it seems like a real drop-off as far as Azure storage goes.
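To make the headroom comparison concrete, here is a minimal Python sketch that expresses observed peaks as a fraction of the P40 limits quoted above. The function names are hypothetical, and the observed figures in the example run are placeholders chosen only to match "under half" and "under a quarter"; they are not measurements from this host.

```python
# Provisioned limits quoted in the thread for the current P40 disk.
P40_IOPS = 7500
P40_MBPS = 250


def utilization(observed, limit):
    """Return observed load as a fraction of the provisioned limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return observed / limit


def report(observed_iops, observed_mbps):
    """Summarize how close the observed peaks are to each P40 limit."""
    return {
        "p40_iops_used": utilization(observed_iops, P40_IOPS),
        "p40_mbps_used": utilization(observed_mbps, P40_MBPS),
    }


if __name__ == "__main__":
    # HYPOTHETICAL peaks (not real data): substitute the values read
    # from the Azure portal graphs for this disk.
    for name, fraction in report(observed_iops=3000, observed_mbps=50).items():
        print(f"{name}: {fraction:.0%}")
```

If both fractions stay well below 1.0 during the slow periods, the disk's provisioned limits are unlikely to be the direct cause, which is the reasoning above.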
Based on those metrics, I would start looking at other things. For example, I once (it was years ago) experienced commit delays because the kernel cache on Linux was getting overrun. Do you have any metrics on the system as a whole? Perhaps running sar every few minutes would help you identify a correlation?
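If sar isn't already collecting on the host, a rough stand-in (not a replacement for sar) is to sample the Dirty and Writeback fields from /proc/meminfo and line the timestamps up against the slow commits. The sketch below assumes a Linux host; the helper names are hypothetical.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def dirty_snapshot(text):
    """Return only the page-cache writeback fields (kB), defaulting to 0."""
    info = parse_meminfo(text)
    return {key: info.get(key, 0) for key in ("Dirty", "Writeback")}


def sample(path="/proc/meminfo"):
    """Read the file once and return the Dirty/Writeback snapshot."""
    with open(path) as handle:
        return dirty_snapshot(handle.read())


if __name__ == "__main__":
    # Take one timestamped sample; run from cron or a loop to build a series.
    if os.path.exists("/proc/meminfo"):
        print(time.strftime("%Y-%m-%d %H:%M:%S"), sample())
```

A Dirty value that climbs steadily and then drops sharply around the time commits stall would point toward the cache-flush behavior described above rather than the disk limits.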
JD
I'd be interested to hear what others might have configured on their write-heavy cloud databases.

Don.

--
Don Seiler
www.seiler.us