We have a cloud service that temporarily stores binary content in PostgreSQL, using PostgreSQL's Large Objects facility. Each large object lives anywhere from a few hundred milliseconds to 5-10 minutes, after which it is deleted.
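In server-side SQL terms, the lifecycle of each object boils down to something like the following sketch (illustrative only; the OID shown is made up, and our service actually drives this through the JDBC driver rather than SQL):

    -- Create a large object and write the binary content in one step;
    -- an OID argument of 0 tells the server to assign an unused OID.
    SELECT lo_from_bytea(0, '\x0123456789abcdef');  -- returns an OID, e.g. 16409

    -- The content is read back while it is needed...
    SELECT lo_get(16409);

    -- ...and the object is deleted a few hundred milliseconds to
    -- 5-10 minutes later.
    SELECT lo_unlink(16409);

Each lo_unlink leaves dead tuples behind in pg_largeobject, so we rely on autovacuum to reclaim the space.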
Normally, this works just fine and we have no issues. However, roughly every 3 months or so, disk usage on our PostgreSQL database starts to increase at an alarming rate. More specifically, the pg_largeobject table's disk usage steadily increases even though we've deleted most of the large objects. This continues until the database server runs out of disk space (and then all sorts of bad things happen, of course).

It looks to us like autovacuum against the pg_largeobject table stops working: the pg_stat_all_tables.last_autovacuum column stops updating, or updates only infrequently, while it continues to update for other tables:

     schemaname |         relname         | n_live_tup | n_dead_tup |        last_autovacuum
    ------------+-------------------------+------------+------------+-------------------------------
     pg_catalog | pg_largeobject          |      37205 |   92614852 | 2020-06-15 01:55:09.037101+00
     pg_toast   | pg_toast_973434724      |        281 |       3701 | 2020-06-15 17:22:54.746452+00
     pg_catalog | pg_largeobject_metadata |        320 |       1236 | 2020-06-15 17:23:49.765878+00
     pg_catalog | pg_shdepend             |        344 |        933 | 2020-06-15 17:23:54.448855+00
     pg_toast   | pg_toast_2619           |         52 |         85 | 2020-06-15 17:22:01.387933+00
     pg_catalog | pg_statistic            |        413 |        100 | 2020-06-15 17:15:52.656791+00
     pg_catalog | pg_class                |        349 |          2 | 2020-06-12 17:58:13.147931+00
     pg_catalog | pg_attribute            |       2633 |          1 | 2020-06-12 17:58:13.000114+00

Looking at our PostgreSQL logs, each autovacuum run against the pg_largeobject table takes increasingly more time:

    2020-06-12T19:41:58.335931494Z stderr F system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 1.77 s
    2020-06-12T19:42:59.704884752Z stderr F system usage: CPU: user: 0.02 s, system: 0.02 s, elapsed: 3.06 s
    2020-06-12T19:44:01.928300512Z stderr F system usage: CPU: user: 0.06 s, system: 0.01 s, elapsed: 5.44 s
    2020-06-12T19:45:14.124209167Z stderr F system usage: CPU: user: 0.11 s, system: 0.05 s, elapsed: 17.13 s
    2020-06-12T19:46:16.28758936Z stderr F system usage: CPU: user: 0.13 s, system: 0.08 s, elapsed: 19.04 s
    2020-06-12T19:47:34.264882626Z stderr F system usage: CPU: user: 0.20 s, system: 0.19 s, elapsed: 36.22 s
    2020-06-12T19:49:15.383436343Z stderr F system usage: CPU: user: 0.28 s, system: 0.38 s, elapsed: 74.06 s
    2020-06-12T19:53:47.229361981Z stderr F system usage: CPU: user: 0.66 s, system: 1.06 s, elapsed: 214.12 s
    2020-06-12T20:19:39.619748109Z stderr F system usage: CPU: user: 4.13 s, system: 5.30 s, elapsed: 1461.16 s
    2020-06-12T21:30:31.634637945Z stderr F system usage: CPU: user: 13.73 s, system: 19.59 s, elapsed: 4225.61 s
    2020-06-12T23:54:32.511015886Z stderr F system usage: CPU: user: 33.57 s, system: 41.90 s, elapsed: 8514.23 s
    2020-06-13T04:23:32.230960572Z stderr F system usage: CPU: user: 66.09 s, system: 82.95 s, elapsed: 16011.25 s
    2020-06-13T12:00:37.43434175Z stderr F system usage: CPU: user: 99.42 s, system: 130.48 s, elapsed: 27296.71 s
    2020-06-14T02:40:25.19122979Z stderr F system usage: CPU: user: 202.96 s, system: 263.66 s, elapsed: 52653.66 s
    2020-06-15T01:55:09.03766272Z stderr F system usage: CPU: user: 317.54 s, system: 544.48 s, elapsed: 83550.21 s

We have several instances of our cloud service, and each instance has its own database. On the other instances, the autovacuum elapsed time is consistently less than 25 seconds on every run. Other than the increasing autovacuum elapsed times, we don't see any other indication of a problem in the logs (no error messages, etc.).

We're currently using PostgreSQL version 10.10. Our service is JVM-based, and we're using the PostgreSQL JDBC driver version 42.2.5.

Have we stumbled upon a potential bug here, or do we need to tweak some autovacuum settings? What should we look at or try next to troubleshoot this further?

Regards,
Jim Hurne