On 2024-Sep-10, James Pang wrote: > Hi experts, > we have a Postgresql v14.8 database, almost thousands of backends hang > on MultiXactOffsetSLRU at the same time, all of these sessions running same > query "SELECT ....", from OS and postgresql slow log, we found all of these > query on "BIND" stage. > LOG: duration: 36631.688 ms bind S_813: SELECT > LOG: duration: 36859.786 ms bind S_1111: SELECT > LOG: duration: 35868.148 ms bind <unnamed>: SELECT > LOG: duration: 36906.471 ms bind <unnamed>: SELECT > LOG: duration: 35955.489 ms bind <unnamed>: SELECT > LOG: duration: 36833.510 ms bind <unnamed>: SELECT > LOG: duration: 36839.535 ms bind S_1219: SELECT > ... > > this database hang on MultiXactOffsetSLRU and MultiXactOffsetBuffer long > time. > > could you direct me why they are hanging on 'BIND‘ stage with > MultiXactOffsetSLRU ? Very likely, it's related to this problem [1] https://thebuild.com/blog/2023/01/18/a-foreign-key-pathology-to-avoid/ This is caused by a suboptimal implementation of what we call SLRU, which multixact uses underneath. For years, many people dodged this problem by recompiling with a changed value for NUM_MULTIXACTOFFSET_BUFFERS in src/include/access/multixact.h (it was originally 8 buffers, which is very small); you'll need to do that in all releases up to pg16. In pg17 this was improved[2] and you'll be able to change the value in postgresql.conf, though the default already being larger than the original (16 instead of 8), you may not need to. [2] https://pgconf.in/files/presentations/2023/Dilip_Kumar_RareExtremelyChallengingPostgresPerformanceProblems.pdf [3] https://www.pgevents.ca/events/pgconfdev2024/schedule/session/53-problem-in-postgresql-slru-and-how-we-are-optimizing-it/ -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La victoria es para quien se atreve a estar solo"