On Sat, 2024-05-25 at 12:51 +0200, Peter wrote:
> I just found Autovacuum run for 6 hours on a 8 GB table, VACUUM query
> doesnt cancel, cluster doesn't stop, autovacuum worker is not
> killable, truss shows no activity, after kill -6 this backtrace:
>
> * thread #1, name = 'postgres', stop reason = signal SIGABRT
>   * frame #0: 0x0000000000548063 postgres`HeapTupleSatisfiesVacuumHorizon + 531
>     frame #1: 0x000000000054aed9 postgres`heap_page_prune + 537
>     frame #2: 0x000000000054e38a postgres`heap_vacuum_rel + 3626
>     frame #3: 0x00000000006af382 postgres`vacuum_rel + 626
>     frame #4: 0x00000000006aeeeb postgres`vacuum + 1611
>     frame #5: 0x00000000007b4664 postgres`do_autovacuum + 4292
>     frame #6: 0x00000000007b2342 postgres`AutoVacWorkerMain + 866
>     frame #7: 0x00000000007b1f97 postgres`StartAutoVacWorker + 39
>     frame #8: 0x00000000007ba0df postgres`sigusr1_handler + 783
>     frame #9: 0x00000008220da627 libthr.so.3`___lldb_unnamed_symbol683 + 215
>     frame #10: 0x00000008220d9b1a libthr.so.3`___lldb_unnamed_symbol664 + 314
>     frame #11: 0x00007ffffffff913
>     frame #12: 0x00000000007bba25 postgres`ServerLoop + 1541
>     frame #13: 0x00000000007b9467 postgres`PostmasterMain + 3207
>     frame #14: 0x000000000071a566 postgres`main + 758
>     frame #15: 0x00000000004f9995 postgres`_start + 261
>
> After restart, no problems reported yet.
>
> Storyline:
> this is the file-list table of my backup/archive system, contains ~50
> mio records. Recently I found a flaw in the backup system, so that some
> old records weren't removed. I wrote a script to do this, that script
> did run first at 04:15 and reported it had now removed a lot of old
> data. I looked into pgadmin4 and it reported 9 mio dead tuples.

This smells of index corruption.

I have seen cases where a corrupted index sends VACUUM into an endless
loop, so that it does not react to query cancellation.

Check the index with the "bt_index_check()" function from the "amcheck"
extension.  If that reports a problem, rebuild the index.
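A minimal sketch of that check, assuming the table is called "file_list"
and the suspect index "file_list_path_idx" (both names are made up for
illustration), that you are connected as a superuser (the amcheck
functions are not executable by ordinary users by default), and that you
run PostgreSQL 12 or later, which is needed for REINDEX ... CONCURRENTLY:

```sql
-- Sketch only: "file_list" and "file_list_path_idx" are hypothetical names.
-- amcheck ships with PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Check every valid B-tree index on the table.  bt_index_check()
-- returns nothing on success and raises an error at the first
-- inconsistency it finds, which aborts the query.
SELECT c.relname AS index_name,
       bt_index_check(c.oid::regclass)
FROM pg_index AS i
   JOIN pg_class AS c ON c.oid = i.indexrelid
   JOIN pg_am AS am ON am.oid = c.relam
WHERE i.indrelid = 'file_list'::regclass
  AND am.amname = 'btree'
  AND i.indisready
  AND i.indisvalid;

-- If an index is reported as corrupt, rebuild it.  CONCURRENTLY
-- avoids blocking writes on the table while the index is rebuilt.
REINDEX INDEX CONCURRENTLY file_list_path_idx;
```

Since the check stops at the first error, rerun it after each rebuild
until all indexes on the table pass.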
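Of course, as always, try to figure out how that could happen.
Apart from hardware problems, one frequent cause is upgrading glibc
(if the index is on a string column or expression): a new glibc
version can change the collation sort order, so that existing indexes
on strings are no longer correctly ordered.

Yours,
Laurenz Albe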