On Tue, Apr 25, 2017 at 1:17 AM, Andres Freund <andres@xxxxxxxxxxx> wrote: > Questions (answer as many you can comfortably answer): > - How many MB/s, segments/s do you see on busier servers? ~20MB/s with FPW compression, with peaks of ~35MB/s. Writes become the bottleneck without compression and it tops at about 40-50MB/s, WAL archiving cannot keep up beyond that point. > - What generates the bulk of WAL on your servers (9.5+ can use > pg_xlogdump --stats to compute that)? Type N (%) Record size (%) FPI size (%) Combined size (%) ---- - --- ----------- --- -------- --- ------------- --- XLOG 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) Transaction 30 ( 0.00) 960 ( 0.00) 0 ( 0.00) 960 ( 0.00) Storage 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) CLOG 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) Database 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) Tablespace 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) MultiXact 110 ( 0.01) 7456 ( 0.02) 0 ( 0.00) 7456 ( 0.00) RelMap 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) Standby 2 ( 0.00) 368 ( 0.00) 0 ( 0.00) 368 ( 0.00) Heap2 2521 ( 0.22) 78752 ( 0.24) 4656133 ( 2.82) 4734885 ( 2.39) Heap 539419 ( 46.52) 15646903 ( 47.14) 98720258 ( 59.87) 114367161 ( 57.73) Btree 606573 ( 52.31) 15872182 ( 47.82) 57514798 ( 34.88) 73386980 ( 37.05) Hash 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) Gin 2866 ( 0.25) 134330 ( 0.40) 4012251 ( 2.43) 4146581 ( 2.09) Gist 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) Sequence 7970 ( 0.69) 1450540 ( 4.37) 0 ( 0.00) 1450540 ( 0.73) SPGist 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) BRIN 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) CommitTs 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) ReplicationOrigin 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) 0 ( 0.00) -------- -------- -------- -------- Total 1159491 33191491 [16.76%] 164903440 [83.24%] 198094931 [100%] > - Are you seeing WAL writes being a bottleneck?OA Sometimes, more so without FPW compression > - What kind of backup methods are you using and is the WAL volume a > problem? > - What kind of replication are you using and is the WAL volume a > problem? Streaming to hot standby + WAL archiving, delayed standby as backup and PITR. Backups are regular filesystem-level snapshots of the delayed standby (with postgres down to get consistent snapshots). WAL volume getting full during periods where the hot standby lags behind (or when we have to stop it to create consistent snapshots) are an issue indeed, and we've had to provision significant storage to be able to absorb those peaks (1TB of WAL) We bundle WAL segments into groups of 256 segments for archiving and recovery to minimize the impact of TCP slow start. We further gzip segments before transfer with pigz, and we use mostly rsync (with a wrapper script that takes care of durability and error handling) to move segments around. Getting the archive/recovery scripts to handle the load hasn't been trivial. > - What are your settings for wal_compression, max_wal_size (9.5+) / > checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers? wal_compression = on max_wal_size = 12GB min_wal_size = 2GB checkpoint_timeout = 30min wal_buffers = -1 (16MB effective) > - Could you quickly describe your workload? Steady stream of (preaggregated) input events plus upserts into ~12 partitioned aggregate "matviews" (within quotes since they're manually maintained up to date). Input rate is approximately 9000 rows/s without counting the upserts onto the aggregate matviews. Old information is regularly compressed and archived into less detailed partitions for a steady size of about 5TB. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance