[SOLVED?] Re: Disk wait problem... not hardware...

pf@xxxxxxxxxxx · Sun, 29 Oct 2023 09:21:46 -0400

On Sat, 28 Oct 2023 18:34:50 -0400 Jim Mlodgenski wrote:

Looking like a GOLD star for Jim...

>On Fri, Oct 27, 2023 at 7:46 PM <pf@xxxxxxxxxxx> wrote:
>
>> Memory: 125.5 GiB of RAM
>>  
>It looks like you have a large amount of memory allocated to the server
>
>But your plans are doing reads instead of pulling things from shared
>buffers

Assuming you are referring to "Buffers: shared hit=248 read=25022" ?

>>explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
>> Finalize Aggregate  (cost=404669.65..404669.66 rows=1 width=8) (actual
>> time=844.158..847.309 rows=1 loops=1) Buffers: shared hit=248 read=25022  
>>   ->  Gather  (cost=404669.43..404669.65 rows=2 width=8) (actual  
>> time=844.133..847.301 rows=3 loops=1) Workers Planned: 2
>>         Workers Launched: 2
>>         Buffers: shared hit=248 read=25022  
>>         ->  Partial Aggregate  (cost=403669.43..403669.45 rows=1
>> width=8) (actual time=838.772..838.772 rows=1 loops=3) Buffers: shared
>> hit=248 read=25022 ->  Parallel Index Only Scan using
>> ncvhis_2016_12_03_voted_party_cd_idx on ncvhis_2016_12_03
>> (cost=0.44..372735.05 rows=12373755 width=0) (actual
>> time=18.277..592.473 rows=9900389 loops=3) Heap Fetches: 0 Buffers:
>> shared hit=248 read=25022 Planning Time: 0.069 ms JIT:
>>   Functions: 8
>>   Options: Inlining false, Optimization false, Expressions true,
>> Deforming true Timing: Generation 0.284 ms, Inlining 0.000 ms,
>> Optimization 0.268 ms, Emission 3.590 ms, Total 4.143 ms Execution
>> Time: 847.498 ms
>>
>>  
>
>
>data/postgresql.conf:
>> max_connections = 100
>> shared_buffers = 128MB

I semi-misquoted; it was running on 128MB when first reported; then 512MB
when I sent this message.

>It looks like you are running with the stock config for shared_buffers.
>With only 128MB dedicated for shared_buffers and such a big database,
>you'll be thrashing the cache pretty hard. With 125GB on the server, try
>upping shared_buffers to something more like 32GB.

WOW!  Some tables now have a barely noticeable delay; on the whole,
looking MUCH better. Testing with a just created copy[1] on the NVMe,
it's better; but it's slowly going back to longer delays. Looks like
count(*) may have easily-missed side-effects (more below). My team
members were suggesting a possible motherboard issue since the problem
happened on both the NVMe M.2 SSD, and on a backup SATA-3 platter drive.
Your suggestion arrived just in time as I was about to wipe out the SSD
and re-import everything fresh.  

Another thing that bothers me is that when I copied the DB, the:
   cp -a ./var ./var2   # same drive, same partition
replicating the DB onto free space on the same SSD, it was experiencing
slow reads 18-20KB/sec; the writes were quick, large blocks; so I was
suspecting something with the SSD...  System buffers..?

How autoload row count (or count(*)) can hurt:

The queries which triggered this issue are all the same (WB built-in):
   SELECT count(*) FROM table;    # WB = SQL-workbench/J

These are all static tables. Does PG maintain a table row count so as to
avoid having to count each time?  WB is setup to:
* autoload table row count
* autoload table data (restricted with LIMIT)

Walking through these 8M & 33M row tables (3.8GB & 5.2GB respectively)
must be filling the shared_buffers and thrashing even 32GB...

Ah-ha! Turning off "autoload table row count" has "resolved" this issue. I
can now scroll through all the tables at high speed (hold down the arrow
key to walk through all the tables. Not very useful; but walking through
the tables with only 128MB shared-buffers would have been the ultimate
buffer thrashing... ;p  Even walking through the tables to collect their
row counts (whether manually or programmatically. ).

That said, the 32GB shared-buffers may now be masking NVMe drive issues...

Thanks!!!,
Pierre

Time and complexity to find the cause of an issue is inversely
proportional to the simplicity of the issue...