How to solve my slow disk i/o throughput during index scan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello everyone,

I have a database of sufficient size that it does not fit entirely in RAM, including indexes that also exceed RAM capacity. Most of my big fact tables are following the same scheme :

  • 3-5 id columns.
  • Partitioned along one ID (ID_JOUR).
  • 10 – 100 Go partitions.
  • 1-5 Go primary key indexes, for each partition
  • Contains 10 – 120 columns, some of them with a lot of NaNs. All of them are int or float8. Some columns are containing a lot of NaNs, because they were not all created and added to daily processing at the same date
  • Requesting a small subset (100k-1M) lines from this database, always filtering on the primary key (I have a WHERE … filtering on each ID column. Some are done through dimension tables + join, but I tried doing those directly, it did not solve my problem).
  • See in the annex a DDL code for one of those tables.
  • See the annex for the size of my different tables.
  • Stable data (I am focusing on past data, rare to no insertions, VACUUM + CLUSTER done after those rare modifications).
  • Played REINDEX, VACUUM, CLUSTER, ANALYZE on those tables.

 

When performing queries, I observe significant differences in processing time depending on whether the index needs to be read from disk or is already loaded in RAM. I think I have confirmed using EXPLAIN ANALYZE that the issue stems from index scans. See for example :

 

 

I measured the speed of loading my index into RAM during a query, which is approximately 2 to 3 MB/s. However, my infrastructure theoretically supports I/O speed of around 900 MB/s.


On some older partitions, I was able to sometimes get better throughputs (see e.g. https://explain.dalibo.com/plan/4db409d1d6d95d4b)


I do not understand why reading my index from disk is so slow. I suspect that the index is not sequentially read, but I do not know how postgresql internals really behave, so this is just a supposition.

 

My question is : what can I change to get a better index reading speed ?

 

What I already tried :

 

  • Setting random_page_cost to prohibitive value (10000000) to force a bitmap heap scan, because those can be made in parallel. This has not worked, the optimizer is still doing an index scan on my fact table.
  • Change effective_io_concurrency, max_parallel_workers_per_gather, work_mem to much higher values.


Thank you in advance for your help, any idea/advice greatly appreciated !

Simon F.

ANNEX :

Some more details about my environment :

 

 

==> /etc/lsb-release <==

DISTRIB_ID=Ubuntu

DISTRIB_RELEASE=20.04

DISTRIB_CODENAME=focal

DISTRIB_DESCRIPTION="Ubuntu 20.04 LTS"

 

==> /etc/os-release <==

ID=ubuntu

ID_LIKE=debian

PRETTY_NAME="Ubuntu 20.04 LTS"

VERSION_ID="20.04"

HOME_URL=https://www.ubuntu.com/

SUPPORT_URL=https://help.ubuntu.com/

BUG_REPORT_URL=https://bugs.launchpad.net/ubuntu/

PRIVACY_POLICY_URL=https://www.ubuntu.com/legal/terms-and-policies/privacy-policy

VERSION_CODENAME=focal

UBUNTU_CODENAME=focal

 

==> /etc/SNCF_release <==

HOSTNAME       : uzcordbr05

OS_NAME        : UBUNTU 20.04

OS_DESCRIPTION : Ubuntu 20.04 LTS

OS_RELEASE     : 20.04

OS_CODENAME    : focal

CMDB_ENV       : Recette

AENV           : hprod

REPO_IT        : https://repos.it.sncf.fr/repos/os/ubuntu/bugfix/dists/focal-bugfix/Release

 

  • Default requests settings :
    • effective_cache_size = '96GB',
    • effective_io_concurrency = '200',
    • max_parallel_workers_per_gather = '4',
    • random_page_cost = '1.1',
    • search_path = 'public',
    • work_mem = '64MB' à I tried to change work_mem to 4GB, did not change anything.

 

  • Postgres custom configuration settings

 

name                            |current_setting                                                                                                                                          |source              |

--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+

application_name                |DBeaver 21.2.5 - SQLEditor <Script-19.sql>                                                                                                               |session             |

archive_command                 |(disabled)                                                                                                                                               |configuration file  |

archive_mode                    |off                                                                                                                                                      |configuration file  |

archive_timeout                 |2h                                                                                                                                                       |configuration file  |

autovacuum_analyze_scale_factor |0.05                                                                                                                                                     |configuration file  |

autovacuum_analyze_threshold    |50                                                                                                                                                       |configuration file  |

autovacuum_max_workers          |6                                                                                                                                                        |configuration file  |

autovacuum_naptime              |15s                                                                                                                                                      |configuration file  |

autovacuum_vacuum_cost_delay    |10ms                                                                                                                                                     |configuration file  |

autovacuum_vacuum_cost_limit    |-1                                                                                                                                                       |configuration file  |

autovacuum_vacuum_scale_factor  |0.01                                                                                                                                                     |configuration file  |

autovacuum_vacuum_threshold     |50                                                                                                                                                       |configuration file  |

autovacuum_work_mem             |512MB                                                                                                                                                    |configuration file  |

checkpoint_completion_target    |0.9                                                                                                                                                      |configuration file  |

client_encoding                 |UTF8                                                                                                                                                     |client              |

cluster_name                    |irdbr010                                                                                                                                                 |configuration file  |

DateStyle                       |ISO, DMY                                                                                                                                                 |client              |

default_text_search_config      |pg_catalog.french                                                                                                                                        |configuration file  |

effective_cache_size            |96GB                                                                                                                                                     |configuration file  |

effective_io_concurrency        |200                                                                                                                                                      |database            |

extra_float_digits              |3                                                                                                                                                        |session             |

lc_messages                     |C                                                                                                                                                        |configuration file  |

lc_monetary                     |fr_FR.UTF8                                                                                                                                               |configuration file  |

lc_numeric                      |fr_FR.UTF8                                                                                                                                               |configuration file  |

lc_time                         |fr_FR.UTF8                                                                                                                                               |configuration file  |

listen_addresses                |*                                                                                                                                                        |configuration file  |

log_autovacuum_min_duration     |0                                                                                                                                                        |configuration file  |

log_checkpoints                 |on                                                                                                                                                       |configuration file  |

log_connections                 |on                                                                                                                                                       |configuration file  |

log_disconnections              |off                                                                                                                                                      |configuration file  |

log_file_mode                   |0640                                                                                                                                                     |configuration file  |

log_line_prefix                 |%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h                                                                                                           |configuration file  |

log_lock_waits                  |on                                                                                                                                                       |configuration file  |

log_min_duration_statement      |5s                                                                                                                                                       |configuration file  |

log_min_error_statement         |warning                                                                                                                                                  |configuration file  |

log_statement                   |ddl                                                                                                                                                      |configuration file  |

log_temp_files                  |0                                                                                                                                                        |configuration file  |

log_timezone                    |Europe/Paris                                                                                                                                             |configuration file  |

logging_collector               |on                                                                                                                                                       |configuration file  |

maintenance_work_mem            |4GB                                                                                                                                                      |configuration file  |

max_connections                 |500                                                                                                                                                      |configuration file  |

max_locks_per_transaction       |1024                                                                                                                                                     |configuration file  |

max_parallel_workers_per_gather |4                                                                                                                                                        |configuration file  |

max_stack_depth                 |2MB                                                                                                                                                      |environment variable|

max_wal_size                    |4GB                                                                                                                                                      |configuration file  |

min_wal_size                    |128MB                                                                                                                                                    |configuration file  |

password_encryption             |scram-sha-256                                                                                                                                            |configuration file  |

pg_stat_statements.max          |15000                                                                                                                                                    |configuration file  |

pg_stat_statements.save         |off                                                                                                                                                      |configuration file  |

pg_stat_statements.track        |all                                                                                                                                                      |configuration file  |

pg_stat_statements.track_utility|off                                                                                                                                                      |configuration file  |

port                            |5433                                                                                                                                                     |configuration file  |

random_page_cost                |1.1                                                                                                                                                      |database            |

restore_command                 |/home/postgres/admin/bin/pgbackrest --config=/etc/pgbackrest.conf --pg1-path=/home/postgres/data/irdbr010/systeme --stanza=rdb_backup archive-get %f "%p"|configuration file  |

search_path                     |public, public, temporaire, dtm_2019                                                                                                                     |session             |

shared_buffers                  |32GB                                                                                                                                                     |configuration file  |

ssl                             |on                                                                                                                                                       |configuration file  |

statement_timeout               |0                                                                                                                                                        |user                |

tcp_keepalives_count            |10                                                                                                                                                       |configuration file  |

tcp_keepalives_idle             |900                                                                                                                                                      |configuration file  |

tcp_keepalives_interval         |75                                                                                                                                                       |configuration file  |

TimeZone                        |Europe/Paris                                                                                                                                             |client              |

unix_socket_group               |postgres                                                                                                                                                 |configuration file  |

unix_socket_permissions         |0700                                                                                                                                                     |configuration file  |

wal_buffers                     |16MB                                                                                                                                                     |configuration file  |

work_mem                        |64MB                                                                                                                                                     |configuration file  |



DDL for my one of my table :


-- public."F_TDOJ_HIST_1" definition

 

-- Drop table

 

-- DROP TABLE public."F_TDOJ_HIST_1";

 

CREATE TABLE public."F_TDOJ_HIST_1" (

       "ID_TRAIN" int4 NOT NULL,

       "ID_JOUR" int4 NOT NULL,

       "ID_OD" int4 NOT NULL,

       "JX" int4 NOT NULL,

       "RES" int4 NULL,

       "REV" float8 NULL,

       "OFFRE" int4 NULL,

       "CC_OUV" int4 NULL,

       "GENV_NUM" int8 NULL,

       "GENV_DEN" int8 NULL,

       "GENR_NUM" int8 NULL,

       "GENR_DEN" int8 NULL,

       "GENH_NUM" int8 NULL,

       "GENH_DEN" int8 NULL,

       "RES_CC0" int4 NULL,

       "RES_CC1" int4 NULL,

       "RES_CC2" int4 NULL,

       "RES_CC3" int4 NULL,

       "RES_CC4" int4 NULL,

       "RES_CC5" int4 NULL,

       "RES_CC6" int4 NULL,

       "RES_CC7" int4 NULL,

       "RES_CC8" int4 NULL,

       "RES_CC9" int4 NULL,

       "RES_CC10" int4 NULL,

       "RES_CC11" int4 NULL,

       "RES_CC12" int4 NULL,

       "RES_CC13" int4 NULL,

       "RES_CC14" int4 NULL,

       "RES_CC15" int4 NULL,

       "RES_CC16" int4 NULL,

       "RES_CC17" int4 NULL,

       "RES_CC18" int4 NULL,

       "RES_CC19" int4 NULL,

       "RES_CC20" int4 NULL,

       "AUT_CC0" int4 NULL,

       "AUT_CC1" int4 NULL,

       "AUT_CC2" int4 NULL,

       "AUT_CC3" int4 NULL,

       "AUT_CC4" int4 NULL,

       "AUT_CC5" int4 NULL,

       "AUT_CC6" int4 NULL,

       "AUT_CC7" int4 NULL,

       "AUT_CC8" int4 NULL,

       "AUT_CC9" int4 NULL,

       "AUT_CC10" int4 NULL,

       "AUT_CC11" int4 NULL,

       "AUT_CC12" int4 NULL,

       "AUT_CC13" int4 NULL,

       "AUT_CC14" int4 NULL,

       "AUT_CC15" int4 NULL,

       "AUT_CC16" int4 NULL,

       "AUT_CC17" int4 NULL,

       "AUT_CC18" int4 NULL,

       "AUT_CC19" int4 NULL,

       "AUT_CC20" int4 NULL,

       "DSP_CC0" int4 NULL,

       "DSP_CC1" int4 NULL,

       "DSP_CC2" int4 NULL,

       "DSP_CC3" int4 NULL,

       "DSP_CC4" int4 NULL,

       "DSP_CC5" int4 NULL,

       "DSP_CC6" int4 NULL,

       "DSP_CC7" int4 NULL,

       "DSP_CC8" int4 NULL,

       "DSP_CC9" int4 NULL,

       "DSP_CC10" int4 NULL,

       "DSP_CC11" int4 NULL,

       "DSP_CC12" int4 NULL,

       "DSP_CC13" int4 NULL,

       "DSP_CC14" int4 NULL,

       "DSP_CC15" int4 NULL,

       "DSP_CC16" int4 NULL,

       "DSP_CC17" int4 NULL,

       "DSP_CC18" int4 NULL,

       "DSP_CC19" int4 NULL,

       "DSP_CC20" int4 NULL,

       "REV_CC0" float8 NULL,

       "REV_CC1" float8 NULL,

       "REV_CC2" float8 NULL,

       "REV_CC3" float8 NULL,

       "REV_CC4" float8 NULL,

       "REV_CC5" float8 NULL,

       "REV_CC6" float8 NULL,

       "REV_CC7" float8 NULL,

       "REV_CC8" float8 NULL,

       "REV_CC9" float8 NULL,

       "REV_CC10" float8 NULL,

       "REV_CC11" float8 NULL,

       "REV_CC12" float8 NULL,

       "REV_CC13" float8 NULL,

       "REV_CC14" float8 NULL,

       "REV_CC15" float8 NULL,

       "REV_CC16" float8 NULL,

       "REV_CC17" float8 NULL,

       "REV_CC18" float8 NULL,

       "REV_CC19" float8 NULL,

       "REV_CC20" float8 NULL,

       "RES_CHD" int4 NULL,

       "REV_CHD" float8 NULL,

       "RES_PRO" int4 NULL,

       "REV_PRO" float8 NULL,

       "RES_SOC" int4 NULL,

       "REV_SOC" float8 NULL,

       "RES_TMX" int4 NULL,

       "REV_TMX" float8 NULL,

       "RES_MAX" int4 NULL,

       "RES_GRP" int4 NULL,

       "REV_GRP" float8 NULL,

       "PREV_RES" int4 NULL,

       "PREV_REV" float8 NULL,

       "OPTIM_RES" int4 NULL,

       "OPTIM_REV" float8 NULL,

       "RES_FFX" int4 NULL,

       "REV_FFX" float8 NULL,

       "RES_SFE" int4 NULL,

       "REV_SFE" float8 NULL,

       "RES_SFN" int4 NULL,

       "REV_SFN" float8 NULL,

       "RES_NFE" int4 NULL,

       "REV_NFE" float8 NULL,

       "RES_NFN" int4 NULL,

       "REV_NFN" float8 NULL,

       "RES_ABO" int4 NULL,

       "REV_ABO" float8 NULL,

       "RES_AGN" int4 NULL,

       "REV_AGN" float8 NULL,

       "RES_BPR" int4 NULL,

       "REV_BPR" float8 NULL,

       "RES_LIB" int4 NULL,

       "REV_LIB" float8 NULL,

       "RES_FFN" int4 NULL,

       "REV_FFN" float8 NULL,

       "RES_PRI" int4 NULL,

       "REV_PRI" float8 NULL,

       CONSTRAINT "F_TDOJ_HIST_1_OLDP_pkey" PRIMARY KEY ("ID_TRAIN", "ID_JOUR", "ID_OD", "JX")

)

PARTITION BY RANGE ("ID_JOUR");

CREATE INDEX "F_TDOJ_HIST_1_OLDP_ID_JOUR_JX_idx" ON ONLY public."F_TDOJ_HIST_1" USING btree ("ID_JOUR", "JX");

CREATE INDEX "F_TDOJ_HIST_1_OLDP_ID_JOUR_idx" ON ONLY public."F_TDOJ_HIST_1" USING btree ("ID_JOUR");

CREATE INDEX "F_TDOJ_HIST_1_OLDP_ID_OD_idx" ON ONLY public."F_TDOJ_HIST_1" USING btree ("ID_OD");

CREATE INDEX "F_TDOJ_HIST_1_OLDP_ID_TRAIN_idx" ON ONLY public."F_TDOJ_HIST_1" USING btree ("ID_TRAIN");

CREATE INDEX "F_TDOJ_HIST_1_OLDP_JX_idx" ON ONLY public."F_TDOJ_HIST_1" USING btree ("JX");

 

 

-- public."F_TDOJ_HIST_1" foreign keys

 

ALTER TABLE public."F_TDOJ_HIST_1" ADD CONSTRAINT "F_TDOJ_HIST_1_OLDP_ID_JOUR_fkey" FOREIGN KEY ("ID_JOUR") REFERENCES public."D_JOUR"("ID_JOUR");

ALTER TABLE public."F_TDOJ_HIST_1" ADD CONSTRAINT "F_TDOJ_HIST_1_OLDP_ID_OD_fkey" FOREIGN KEY ("ID_OD") REFERENCES public."D_OD"("ID_OD");

ALTER TABLE public."F_TDOJ_HIST_1" ADD CONSTRAINT "F_TDOJ_HIST_1_OLDP_ID_TRAIN_fkey" FOREIGN KEY ("ID_TRAIN") REFERENCES public."D_TRAIN"("ID_TRAIN");

ALTER TABLE public."F_TDOJ_HIST_1" ADD CONSTRAINT "F_TDOJ_HIST_1_OLDP_JX_pkey" FOREIGN KEY ("JX") REFERENCES public."D_JX"("JX");

 

 

 

Result of :

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname in (

    'F_TDLJ_HIST_1', 'F_TDLJ_HIST_2', 'F_TDLJ_HIST', 'F_TDOJ_HIST_1', 'F_TDOJ_HIST_2', 'F_TDOJ_HIST'

);



relpages|reltuples|relname      |relallvisible|relkind|relnatts|relhassubclass|reloptions|pg_table_size|

--------+---------+-------------+-------------+-------+--------+--------------+----------+-------------+

       0|      0.0|F_TDLJ_HIST  |            0|p      |      47|true          |NULL      |            0|

   95806| 442969.0|F_TDLJ_HIST  |        95806|r      |      47|false         |NULL      |    785080320|

       0|      0.0|F_TDLJ_HIST_1|            0|p      |     129|true          |NULL      |            0|

  197458| 730226.0|F_TDLJ_HIST_1|       157954|r      |     129|false         |NULL      |   1618059264|

       0|      0.0|F_TDLJ_HIST_2|            0|p      |     159|true          |NULL      |            0|

  278359| 441524.0|F_TDLJ_HIST_2|       278359|r      |     159|false         |NULL      |   2280972288|

       0|      0.0|F_TDOJ_HIST  |            0|p      |     104|true          |NULL      |            0|

  311913|1424975.0|F_TDOJ_HIST  |       311913|r      |      56|false         |NULL      |   2555928576|

       0|      0.0|F_TDOJ_HIST_1|            0|p      |     135|true          |NULL      |            0|

  682522|1241940.0|F_TDOJ_HIST_1|       682522|r      |     135|false         |NULL      |   5592793088|

       0|      0.0|F_TDOJ_HIST_2|            0|p      |     163|true          |NULL      |            0|

  661324|1397598.0|F_TDOJ_HIST_2|       661324|r      |     163|false         |NULL      |   5419098112|


Interne

-------
Ce message et toutes les pièces jointes sont établis à l'intention exclusive de ses destinataires et sont confidentiels. L'intégrité de ce message n'étant pas assurée sur Internet, la SNCF ne peut être tenue responsable des altérations qui pourraient se produire sur son contenu. Toute publication, utilisation, reproduction, ou diffusion, même partielle, non autorisée préalablement par la SNCF, est strictement interdite. Si vous n'êtes pas le destinataire de ce message, merci d'en avertir immédiatement l'expéditeur et de le détruire.
-------
This message and any attachments are intended solely for the addressees and are confidential. SNCF may not be held responsible for their contents whose accuracy and completeness cannot be guaranteed over the Internet. Unauthorized use, disclosure, distribution, copying, or any part thereof is strictly prohibited. If you are not the intended recipient of this message, please notify the sender immediately and delete it.

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux