RE: How to solve my slow disk i/o throughput during index scan

Also, it might not be related, but I see suspiciously similar slow reads when I am inserting into the database. Could it be related?

See e.g. : https://explain.dalibo.com/plan/43d37de5870e1651

The table I am inserting into looks like this:

-- public."F_TDOJ_SC_HIST_2" definition

-- DROP TABLE public."F_TDOJ_SC_HIST_2";

CREATE TABLE public."F_TDOJ_SC_HIST_2" (
    "ID_TRAIN" int4 NOT NULL,
    "ID_JOUR" int4 NOT NULL,
    "ID_OD" int4 NOT NULL,
    "JX" int4 NOT NULL,
    "RES" int4 NULL,
    "REV" float4 NULL,
    "RES_SC1" int4 NULL,
    "RES_SC2" int4 NULL,
    "RES_SC3" int4 NULL,
    "RES_SC4" int4 NULL,
    "RES_SC5" int4 NULL,
    "RES_SC6" int4 NULL,
    "RES_SC7" int4 NULL,
    "RES_SC8" int4 NULL,
    "RES_SC9" int4 NULL,
    "RES_SC10" int4 NULL,
    "RES_SC11" int4 NULL,
    "RES_SC12" int4 NULL,
    "RES_SC13" int4 NULL,
    "RES_SC14" int4 NULL,
    "RES_SC15" int4 NULL,
    "RES_SC16" int4 NULL,
    "RES_SC17" int4 NULL,
    "RES_SC18" int4 NULL,
    "AUT_SC1" int4 NULL,
    "AUT_SC2" int4 NULL,
    "AUT_SC3" int4 NULL,
    "AUT_SC4" int4 NULL,
    "AUT_SC5" int4 NULL,
    "AUT_SC6" int4 NULL,
    "AUT_SC7" int4 NULL,
    "AUT_SC8" int4 NULL,
    "AUT_SC9" int4 NULL,
    "AUT_SC10" int4 NULL,
    "AUT_SC11" int4 NULL,
    "AUT_SC12" int4 NULL,
    "AUT_SC13" int4 NULL,
    "AUT_SC14" int4 NULL,
    "AUT_SC15" int4 NULL,
    "AUT_SC16" int4 NULL,
    "AUT_SC17" int4 NULL,
    "AUT_SC18" int4 NULL,
    "DSP_SC1" int4 NULL,
    "DSP_SC2" int4 NULL,
    "DSP_SC3" int4 NULL,
    "DSP_SC4" int4 NULL,
    "DSP_SC5" int4 NULL,
    "DSP_SC6" int4 NULL,
    "DSP_SC7" int4 NULL,
    "DSP_SC8" int4 NULL,
    "DSP_SC9" int4 NULL,
    "DSP_SC10" int4 NULL,
    "DSP_SC11" int4 NULL,
    "DSP_SC12" int4 NULL,
    "DSP_SC13" int4 NULL,
    "DSP_SC14" int4 NULL,
    "DSP_SC15" int4 NULL,
    "DSP_SC16" int4 NULL,
    "DSP_SC17" int4 NULL,
    "DSP_SC18" int4 NULL,
    "REV_SC1" float4 NULL,
    "REV_SC2" float4 NULL,
    "REV_SC3" float4 NULL,
    "REV_SC4" float4 NULL,
    "REV_SC5" float4 NULL,
    "REV_SC6" float4 NULL,
    "REV_SC7" float4 NULL,
    "REV_SC8" float4 NULL,
    "REV_SC9" float4 NULL,
    "REV_SC10" float4 NULL,
    "REV_SC11" float4 NULL,
    "REV_SC12" float4 NULL,
    "REV_SC13" float4 NULL,
    "REV_SC14" float4 NULL,
    "REV_SC15" float4 NULL,
    "REV_SC16" float4 NULL,
    "REV_SC17" float4 NULL,
    "REV_SC18" float4 NULL,
    CONSTRAINT "F_TDOJ_SC_HIST_2_pkey" PRIMARY KEY ("ID_TRAIN","ID_JOUR","ID_OD","JX")
)
PARTITION BY RANGE ("ID_JOUR");

-- public."F_TDOJ_SC_HIST_2" foreign keys

ALTER TABLE public."F_TDOJ_SC_HIST_2" ADD CONSTRAINT "F_TDOJ_SC_HIST_2_ID_JOUR_fkey" FOREIGN KEY ("ID_JOUR") REFERENCES public."D_JOUR"("ID_JOUR");
ALTER TABLE public."F_TDOJ_SC_HIST_2" ADD CONSTRAINT "F_TDOJ_SC_HIST_2_ID_OD_fkey" FOREIGN KEY ("ID_OD") REFERENCES public."D_OD"("ID_OD");
ALTER TABLE public."F_TDOJ_SC_HIST_2" ADD CONSTRAINT "F_TDOJ_SC_HIST_2_ID_TRAIN_fkey" FOREIGN KEY ("ID_TRAIN") REFERENCES public."D_TRAIN"("ID_TRAIN");
ALTER TABLE public."F_TDOJ_SC_HIST_2" ADD CONSTRAINT "F_TDOJ_SC_HIST_2_JX_fkey" FOREIGN KEY ("JX") REFERENCES public."D_JX"("JX");

I'm using a three-step process to insert rows into the table:

  • COPY into a temporary table
  • DELETE FROM on the perimeter I will be inserting into
  • INSERT INTO mytable SELECT … FROM temporarytable ON CONFLICT DO NOTHING
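As a sketch, the three steps above might look like the following. The staging-table name, the CSV path, and the "ID_JOUR" bounds defining the perimeter are placeholders, not the actual job:

```sql
BEGIN;

-- 1) COPY: bulk-load the incoming rows into a temporary staging table.
--    LIKE copies the column list of the partitioned table.
CREATE TEMPORARY TABLE "TMP_F_TDOJ_SC_HIST_2"
    (LIKE public."F_TDOJ_SC_HIST_2") ON COMMIT DROP;
COPY "TMP_F_TDOJ_SC_HIST_2" FROM '/path/to/batch.csv' WITH (FORMAT csv);

-- 2) DELETE: clear the perimeter about to be reloaded.
DELETE FROM public."F_TDOJ_SC_HIST_2"
WHERE "ID_JOUR" BETWEEN 7300 AND 7330;

-- 3) INSERT: re-insert from the staging table, skipping duplicate keys.
INSERT INTO public."F_TDOJ_SC_HIST_2"
SELECT * FROM "TMP_F_TDOJ_SC_HIST_2"
ON CONFLICT DO NOTHING;

COMMIT;
```

Note that both the DELETE and the ON CONFLICT check have to locate existing rows through the primary-key index, so the same index-scan read pattern shows up here too.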

 

Is it possible to parallelize the scans during the modify step?

 

Regards

Simon FREYBURGER

 


Internal

From: FREYBURGER Simon (SNCF VOYAGEURS / DIRECTION GENERALE TGV / DM RMP YIELD MANAGEMENT) <simon.freyburger@xxxxxxx>
Sent: Thursday, 11 July 2024 16:59
To: Andrei Lepikhov <lepihov@xxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxxxxxxxx; Peter Geoghegan <pg@xxxxxxx>
Subject: RE: How to solve my slow disk i/o throughput during index scan

 


Hello, and thank you again for your example!

Sorry for my late answer; I was working on a patch for our queries. I still do not completely understand what is happening, though. Here is the plan of a query where I split the calls with OR, as you suggested, which seems to have enabled parallel scans:

https://explain.dalibo.com/plan/gfa1cf9fffd01bcg#plan/node/1

But I still wonder: why was my query that slow? My current understanding of what happened is:

 

  • When PostgreSQL does an Index Scan, it goes through a (non-parallel) loop of requesting a chunk of data and then processing it. It waits until it has processed that data before requesting the next chunk, instead of loading the whole index into RAM (which, I suppose, would be much faster, but also not feasible if the index is too big and the RAM too small, so PostgreSQL does not do it). Hence the 2 MB/s.
  • When it does a Bitmap Index Scan, it can parallelize the disk interactions and does not use the processor to discard rows, hence a much faster index load and processing.
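One way to observe the difference on a concrete query is to compare the two plans side by side: a plain index scan fetches heap pages one synchronous read at a time, while a bitmap heap scan sorts the matching page numbers and prefetches them, with the queue depth controlled by effective_io_concurrency. The predicate and setting values below are illustrative, not the actual workload:

```sql
-- Force the plain index scan and note the I/O timing in the plan.
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public."F_TDOJ_SC_HIST_2"
WHERE "ID_JOUR" BETWEEN 7300 AND 7330;

-- Allow the bitmap scan and raise the prefetch depth; on SSD storage a
-- higher effective_io_concurrency keeps many read requests in flight.
SET enable_bitmapscan = on;
SET effective_io_concurrency = 200;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public."F_TDOJ_SC_HIST_2"
WHERE "ID_JOUR" BETWEEN 7300 AND 7330;
```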

 

I might be completely wrong, and would really like to understand the details, in order to explain them to my team and to others who might encounter the same problem.

Again, thank you very much for your help; we were really struggling with those slow queries!

Simon FREYBURGER

 

 

From: Andrei Lepikhov <lepihov@xxxxxxxxx>
Sent: Friday, 5 July 2024 04:05
To: FREYBURGER Simon (SNCF VOYAGEURS / DIRECTION GENERALE TGV / DM RMP YIELD MANAGEMENT) <simon.freyburger@xxxxxxx>; pgsql-performance@xxxxxxxxxxxxxxxxxxxx; Peter Geoghegan <pg@xxxxxxx>
Subject: Re: How to solve my slow disk i/o throughput during index scan

 

On 7/4/24 22:23, FREYBURGER Simon (SNCF VOYAGEURS / DIRECTION GENERALE 
TGV / DM RMP YIELD MANAGEMENT) wrote:
> Hello,
> 
> Thank you, splitting the query with “OR” definitely enables bitmap heap scans,
> and thus parallelized reads from disk 😃! I did not understand your
> second point, though: what is Parallel Append, and how do I enable it?
Just for example:
 
DROP TABLE IF EXISTS t CASCADE;
CREATE TABLE t (id int not null, payload text) PARTITION BY RANGE (id);
CREATE TABLE p1 PARTITION OF t FOR VALUES FROM (0) TO (1000);
CREATE TABLE p2 PARTITION OF t FOR VALUES FROM (1000) TO (2000);
CREATE TABLE p3 PARTITION OF t FOR VALUES FROM (2000) TO (3000);
CREATE TABLE p4 PARTITION OF t FOR VALUES FROM (3000) TO (4000);
INSERT INTO t SELECT x % 4000, repeat('a',128) || x
FROM generate_series(1,1E5) AS x;
ANALYZE t;
 
SET enable_parallel_append = on;
SET parallel_setup_cost = 0.00001;
SET parallel_tuple_cost = 0.00001;
SET max_parallel_workers_per_gather = 8;
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;
 
EXPLAIN (COSTS OFF)
SELECT t.id, t.payload FROM t WHERE t.id % 2 = 0
GROUP BY t.id, t.payload;
 
  Group
    Group Key: t.id, t.payload
    ->  Gather Merge
          Workers Planned: 6
          ->  Sort
                Sort Key: t.id, t.payload
                ->  Parallel Append
                      ->  Parallel Seq Scan on p1 t_1
                            Filter: ((id % 2) = 0)
                      ->  Parallel Seq Scan on p2 t_2
                            Filter: ((id % 2) = 0)
                      ->  Parallel Seq Scan on p3 t_3
                            Filter: ((id % 2) = 0)
                      ->  Parallel Seq Scan on p4 t_4
                            Filter: ((id % 2) = 0)
 
Here the table is scanned in parallel. It also works with IndexScan.
 
-- 
regards, Andrei Lepikhov
 

-------
This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it.
-------

