On Fri, 27 Oct 2023 21:21:18 -0700 Adrian Klaver wrote:

>On 10/27/23 16:46, pf@xxxxxxxxxxx wrote:
>> Peter,
>>
>> Thanks for your patience; I've been feeling pressure to get this
>> resolved, so I have been lax in providing info here. Hope the
>> following helps...
>>
>> Something I hadn't noticed before: SQL Workbench/J (build 129.6)
>> displays an execution timer at the bottom of the window. I see all
>> the queries reporting 0-47 ms, which is plenty fast. It's the
>> results that often take a long time to display.
>
>Which is not surprising as building GUI elements is an expensive
>operation.

True; but not to this extent (minutes).

>If the time to display is your wait issue then this is not
>really a disk issue.

Guessing you didn't look at the attachment "journal2"... it clearly
shows data coming off the disk VERY slowly...

>What happens if you use psql as the client?

I see my reply on the 23rd only went to you (I'm still forgetting to
check which message I'm replying to -- this is unlike the other lists
I participate in, where the headers are set to force all messages
through the list server). That reply contained the psql info; psql
shows long delays too:

Hi Adrian,

Sorry about putting everything inline; I think it saves time vs.
dealing with attachments...

Summary:
* single large partition dedicated to PG
* only PG seems affected
* buffered reads are 6x slower while PG queries are in progress
* psql queries of 30 tables of ~33M rows each show random delays of
  up to 4m51s
* successful psql queries complete in under 500 ms
* an "ls -l" loop on another directory on the same disk had no delays

HTH...

The WB query was in disk-wait, so psql gave:

ncsbe=# select count(*) from ncvhis_2022_10_29;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
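One way to see what a stuck backend is actually waiting on (a sketch, assuming PostgreSQL 9.6 or newer, where pg_stat_activity gained the wait_event columns) is to run this from a second psql session while a query hangs:

```sql
-- Run from a second session while a query appears stuck.
-- wait_event_type = 'IO' would confirm the backend is blocked on disk I/O.
SELECT pid, state, wait_event_type, wait_event,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle';
```

That would distinguish real I/O waits inside PG from client-side display time.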
The above query finished a second or two before the WB results
appeared; then this next query succeeded right after:

ncsbe=# select count(*) from ncvhis_2022_10_29;
  count
----------
 33861176
(1 row)

Another table went into disk-wait long enough that I was able to query
the tmp directory on the same disk just fine. Querying the same table
again; no error, so I canceled it:

ncsbe=# select count(*) from ncvhis_2022_07_23;
^CCancel request sent
ERROR:  canceling statement due to user request

While this table was still in disk-wait, I tried another table, which
returned quickly:

ncsbe=# select count(*) from ncvhis_2023_10_21;
  count
----------
 32983343
(1 row)

Eventually, the WB query did complete...

Next, I made a query list for 30 tables, turned on timing, and pasted
the queries. Results inline:

ncsbe=# select count(*) from ncvhis_2012_12_06;
select count(*) from ncvhis_2016_12_03;
select count(*) from ncvhis_2020_08_01;
select count(*) from ncvhis_2020_10_31;
select count(*) from ncvhis_2020_12_05;
select count(*) from ncvhis_2020_12_26;
select count(*) from ncvhis_2021_03_06;
select count(*) from ncvhis_2021_06_12;
select count(*) from ncvhis_2022_07_23;
select count(*) from ncvhis_2022_10_15;
select count(*) from ncvhis_2022_10_22;
select count(*) from ncvhis_2022_10_29;
select count(*) from ncvhis_2022_11_05;
select count(*) from ncvhis_2022_11_12;
select count(*) from ncvhis_2022_11_19;
select count(*) from ncvhis_2022_11_26;
select count(*) from ncvhis_2022_12_03;
select count(*) from ncvhis_2022_12_10;
select count(*) from ncvhis_2022_12_17;
select count(*) from ncvhis_2022_12_24;
select count(*) from ncvhis_2022_12_31;
select count(*) from ncvhis_2023_01_08;
select count(*) from ncvhis_2023_01_14;
select count(*) from ncvhis_2023_01_21;
select count(*) from ncvhis_2023_01_28;
select count(*) from ncvhis_2023_02_04;
select count(*) from ncvhis_2023_02_11;
select count(*) from ncvhis_2023_02_18;
select count(*) from ncvhis_2023_02_27;
select count(*) from ncvhis_2023_03_04;

  count
----------
 31923950
(1 row)
Time: 72404.786 ms (01:12.405)
  count
----------
 29701168
(1 row)
Time: 301.246 ms
  count
----------
 32172845
(1 row)
Time: 409.974 ms
  count
----------
 32162748
(1 row)
Time: 363.836 ms
  count
----------
 34679651
(1 row)
Time: 351.167 ms
  count
----------
 35006711
(1 row)
Time: 348.378 ms
  count
----------
 35003995
(1 row)
Time: 348.712 ms
  count
----------
 34994502
(1 row)
Time: 351.901 ms
  count
----------
 33686292
(1 row)
Time: 487.837 ms
  count
----------
 33861658
(1 row)
Time: 40987.826 ms (00:40.988)
  count
----------
 33861381
(1 row)
Time: 76964.281 ms (01:16.964)
  count
----------
 33861176
(1 row)
Time: 483.329 ms
  count
----------
 33861071
(1 row)
Time: 18919.267 ms (00:18.919)
  count
----------
 29344354
(1 row)
Time: 50896.978 ms (00:50.897)
  count
----------
 30709227
(1 row)
Time: 25784.000 ms (00:25.784)
  count
----------
 32368001
(1 row)
Time: 45407.599 ms (00:45.408)
  count
----------
 33126826
(1 row)
Time: 75171.609 ms (01:15.172)
  count
----------
 33126808
(1 row)
Time: 88871.004 ms (01:28.871)
  count
----------
 33127317
(1 row)
Time: 128565.127 ms (02:08.565)
  count
----------
 33127192
(1 row)
Time: 210248.222 ms (03:30.248)
  count
----------
 33127117
(1 row)
Time: 246609.561 ms (04:06.610)    4 minutes! ;p
  count
----------
 33124457
(1 row)
Time: 166406.283 ms (02:46.406)
  count
----------
 33123714
(1 row)
Time: 291483.538 ms (04:51.484)    nearly 5 minutes ;p ;p
  count
----------
 33123313
(1 row)
Time: 2269.961 ms (00:02.270)
  count
----------
 33123197
(1 row)
Time: 413.219 ms
  count
----------
 33123091
(1 row)
Time: 433.817 ms
  count
----------
 33122953
(1 row)
Time: 395.844 ms
  count
----------
 33122752
(1 row)
Time: 7251.637 ms (00:07.252)
  count
----------
 33122665
(1 row)
Time: 384.328 ms
  count
----------
 33122573
(1 row)
Time: 384.887 ms
ncsbe=#

I don't see a pattern in the above; it matches the randomness I saw
using WB...

Before the above finished, I issued this command in another konsole...
$ while true; do ls -l > /tmp/ll; date; done
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
Mon Oct 23 05:12:54 PM EDT 2023
...

Lots of high-speed output with no pauses, at 64 loops per second. So
only PG is experiencing disk waits.

This disk is dedicated to PG plus a rarely used semi-permanent tmp
directory, all on a single partition. This is from fdisk -l:

Disk /dev/nvme1n1: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: Seagate FireCuda 530 ZP4000GM30023
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 9B3C5D73-6FFD-475D-9E1E-26F43B41C690

Device         Start        End    Sectors  Size Type
/dev/nvme1n1p1  2048 7814035455 7814033408  3.6T Linux filesystem

$ hdparm -Tt /dev/nvme1n1

/dev/nvme1n1:
 Timing cached reads:   36820 MB in 2.00 seconds = 18434.72 MB/sec
 Timing buffered disk reads: 774 MB in 3.00 seconds = 257.74 MB/sec

The 30 queries still hadn't finished at this point...

Interesting: the hdparm buffered read is very slow while the queries
run; once the queries finished, disk performance came back:

$ hdparm -Tt /dev/nvme1n1

/dev/nvme1n1:
 Timing cached reads:   37382 MB in 2.00 seconds = 18715.85 MB/sec
 Timing buffered disk reads: 5002 MB in 3.17 seconds = 1575.89 MB/sec

Over 6x faster...

So far, the only disk waits I see are PG-related.

Do you have any opinion on Ken's comment about parallel queries?

Thanks,
Pierre
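P.S. Re Ken's parallel-query comment: one quick way to test that
theory (just a sketch; max_parallel_workers_per_gather is the standard
session-settable knob) would be to disable parallelism for a single
session and re-run one of the slow counts:

```sql
-- Session-local test: turn off parallel workers, then retry a slow table.
SET max_parallel_workers_per_gather = 0;
SELECT count(*) FROM ncvhis_2022_10_29;
-- Restore the server default afterwards:
RESET max_parallel_workers_per_gather;
```

If the timings stabilize with parallelism off, that would point at the
parallel workers as the trigger.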