Hi,
We're running PostgreSQL as essentially a data warehouse, and we have a few thousand roles, which are used to grant permissions on a table-by-table basis to a few thousand users, so a user would typically have say between 1 and 2 thousand roles. There is also quite a lot of "churn" in terms of tables being created/removed, and permissions changed.
The issue is that we're hitting a strange performance problem on connection. Sometimes it can take ~25 to 40 seconds just to connect, although it's often way quicker. There seems to be no middle ground - never have I seen a connection take between 0.5 and 25 seconds for example. We suspect it's related to the number of roles the connecting user has (including via other roles), because if we remove all roles but one from the connecting user (the one that grants connection permissions), connecting is always virtually instantaneous.
The closest issue that I can find that's similar is https://www.postgresql.org/message-id/flat/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com, which reports that GRANT role is slow with a high number of roles - but in our case, it's connecting that's the problem, before (as far as we can tell) even one query is run. The database is busy, say up to 60-80% on a 16 VCPU machine - even if it's a "good amount" below 100%, the issue occurs.
Is there anything we can do to investigate (or hopefully fix!) the issue?
Thanks,
Michal
------
We would like to connect to the database - expect it to connect in less than 1 second, but sometimes 25 - 40s.
PostgreSQL version number you are running:
PostgreSQL version number you are running:
PostgreSQL 14.10 on aarch64-unknown-linux-gnu, compiled by aarch64-unknown-linux-gnu-gcc (GCC) 9.5.0, 64-bit
How you installed PostgreSQL:
How you installed PostgreSQL:
Via AWS/Amazon Aurora
Changes made to the settings in the postgresql.conf file
Changes made to the settings in the postgresql.conf file
In attached CSV file
Operating system and version:
Operating system and version:
Unknown
What program you're using to connect to PostgreSQL:
What program you're using to connect to PostgreSQL:
Python + SQLAlchemy, psql, or also via Amazon Quicksight (Unsure which client they use under the hood, but it surfaces connection timeout errors, which we suspect is due to the issue described above)
Is there anything relevant or unusual in the PostgreSQL server logs?:
Is there anything relevant or unusual in the PostgreSQL server logs?:
No
For questions about any kind of error:
N/A
For questions about any kind of error:
N/A
name,current_setting,source apg_write_forward.connect_timeout,30,configuration file apg_write_forward.consistency_mode,session,configuration file apg_write_forward.idle_in_transaction_session_timeout,86400000,configuration file apg_write_forward.idle_session_timeout,300000,configuration file apg_write_forward.max_forwarding_connections_percent,25,configuration file archive_command,(disabled),configuration file archive_mode,off,configuration file archive_timeout,5min,configuration file aurora_compute_plan_id,on,configuration file aurora_stat_plans.calls_until_recapture,0,configuration file aurora_stat_plans.minutes_until_recapture,0,configuration file aurora_stat_plans.with_analyze,off,configuration file aurora_stat_plans.with_buffers,off,configuration file aurora_stat_plans.with_costs,on,configuration file aurora_stat_plans.with_timing,on,configuration file aurora_stat_plans.with_triggers,off,configuration file aurora_stat_plans.with_wal,off,configuration file autovacuum_analyze_scale_factor,0.05,configuration file autovacuum_max_workers,3,configuration file autovacuum_naptime,5s,configuration file autovacuum_vacuum_cost_delay,5ms,configuration file autovacuum_vacuum_cost_limit,1800,configuration file autovacuum_vacuum_insert_scale_factor,0.2,configuration file autovacuum_vacuum_scale_factor,0.1,configuration file autovacuum_work_mem,4074385kB,configuration file buffer_cache_mode,dynamic,configuration file buffer_table_mode,fixed,configuration file checkpoint_timeout,1min,configuration file client_encoding,UTF8,configuration file compute_query_id,auto,configuration file database_instance_type,provisioned,configuration file effective_cache_size,88325320kB,configuration file hot_standby,off,configuration file hot_standby_feedback,on,configuration file huge_pages,on,configuration file idle_in_transaction_session_timeout,1h,user krb_caseins_users,off,configuration file listen_addresses,*,command line lo_compat_privileges,off,configuration file log_autovacuum_min_duration,10s,configuration file log_destination,"stderr,csvlog",configuration file log_file_mode,0644,configuration file log_hostname,off,configuration file log_line_prefix,%t:%r:%u@%d:[%p]:,configuration file log_rotation_age,1h,configuration file log_rotation_size,100000kB,configuration file log_timezone,UTC,configuration file log_truncate_on_rotation,off,configuration file logging_collector,on,configuration file maintenance_io_concurrency,1,configuration file maintenance_work_mem,2087MB,configuration file max_connections,5000,configuration file max_locks_per_transaction,64,configuration file max_parallel_workers,8,configuration file max_prepared_transactions,0,configuration file max_replication_slots,20,configuration file max_stack_depth,6MB,configuration file max_standby_streaming_delay,14s,configuration file max_sync_workers_per_subscription,2,configuration file max_wal_senders,20,configuration file max_worker_processes,32,configuration file min_wal_size,512MB,configuration file password_encryption,md5,configuration file pgaudit.log,"ALL, -MISC",user pgaudit.log_catalog,off,user pgaudit.log_statement,on,configuration file port,5432,configuration file rds.delegated_extension_allow_drop_cascade,off,configuration file rds.extensions,"address_standardizer, address_standardizer_data_us, amcheck, apg_plan_mgmt, aurora_stat_utils, autoinc, aws_commons, aws_lambda, aws_ml, aws_s3, babelfishpg_common, babelfishpg_money, babelfishpg_tds, babelfishpg_telemetry, babelfishpg_tsql, bloom, bool_plperl, btree_gin, btree_gist, citext, cube, dblink, dict_int, dict_xsyn, earthdistance, fuzzystrmatch, h3, h3_postgis, hll, hstore, hstore_plperl, hypopg, insert_username, intagg, intarray, ip4r, isn, jsonb_plperl, lo, log_fdw, ltree, moddatetime, mysql_fdw, old_snapshot, oracle_fdw, orafce, pg_ad_mapping, pg_bigm, pg_buffercache, pg_cron, pg_freespacemap, pg_hint_plan, pg_partman, pg_prewarm, pg_proctab, pg_repack, pg_similarity, pg_stat_statements, pg_tle, pg_trgm, pg_visibility, pgaudit, pgcrypto, pglogical, pglogical_origin, pgrouting, pgrowlocks, pgstattuple, pgtap, plcoffee, plls, plperl, plpgsql, plprofiler, pltcl, plv8, postgis, postgis_raster, postgis_tiger_geocoder, postgis_topology, postgres_fdw, prefix, rdkit, rds_activity_stream, rds_tools, refint, seg, sslinfo, tablefunc, tcn, tds_fdw, tsm_system_rows, tsm_system_time, unaccent, uuid-ossp, vector",configuration file rds.force_autovacuum_logging_level,warning,configuration file rds.internal_databases,rdsadmin,configuration file rds.local_volume_spill_enabled,on,configuration file rds.logical_replication,off,configuration file rds.rds_superuser_reserved_connections,2,configuration file rds.rdsadmin_max_connections,100,configuration file rds.run_logical_replication_as_subscription_owner,off,configuration file rds.superuser_variables,session_replication_role,configuration file rds.survivable_readers,on,configuration file recovery_init_sync_method,syncfs,configuration file remove_temp_files_after_crash,off,configuration file shared_buffers,11040665,configuration file ssl,on,configuration file ssl_ca_file,/rdsdbdata/rds-metadata/ca-cert.pem,configuration file ssl_cert_file,/rdsdbdata/rds-metadata/server-cert.pem,configuration file ssl_crl_dir,/rdsdbdata/rds-metadata/ssl_crl_dir/,configuration file ssl_key_file,/rdsdbdata/rds-metadata/server-key.pem,configuration file statement_timeout,15min,session superuser_reserved_connections,3,configuration file synchronous_commit,on,configuration file temp_file_limit,-1,configuration file TimeZone,UTC,configuration file track_activity_query_size,4kB,configuration file track_functions,pl,configuration file track_io_timing,on,configuration file unix_socket_group,rdsdb,configuration file unix_socket_permissions,0706,command line vacuum_cost_page_miss,0,configuration file vacuum_failsafe_age,1200000000,configuration file vacuum_multixact_failsafe_age,1200000000,configuration file wal_level,replica,configuration file wal_receiver_create_temp_slot,off,configuration file wal_receiver_timeout,30s,configuration file