Hello, we run multiple versions of PostgreSQL instances on production. Some time ago we add new physical servers and decided to go with latest GA from pgdg APT repository, that is PostgreSQL 16. We encounter slow `GRANT ROLES` only on PostgreSQL 16 instances up to 42 seconds in production, the client process at PostgresSQL would use 100% of the CPU. Which is a surprise compared to other instances running older PostgreSQL releases. On production we have a *LOT* of ROLEs, which unfortunately a case that we did not test before switching the new servers into production mode. The Application & ROLEs ----------------------- Our application make use of ROLEs. We create group ROLEs for each tenant of our application, these ROLEs are named with `d_` and `a_` prefix. A special ROLE, called `acc`, it will be a member to each of these `d_` and `a_` ROLEs. The application have a concept of "session", which it would mantain and I think outside the scope of this e-mail. In relation to PostgreSQL, the application would create a PostgreSQL ROLE that corresponds to its own (application) session. It would name these ROLEs with `s_` prefix, which CREATEd and GRANTed its permission on every application's "session". When an application "session" started, user with `acc` ROLE would grant membersip of `d_` ROLE to `s_` ROLE (ie. GRANT ROLE `d_xxxx` TO `s_xxxx`;) To make this clear, for example, we have (say) role `d_202402` already existing and application would create a new ROLE `s_0000001` which corresponds to application's "session". Application that connects with special ROLE `acc` would GRANT ROLE `d_202402` to the ROLE `s_0000001`, like so: GRANT d_202402 TO s_0000001; In production we have up to 13 thousands of these ROLEs, each: $ sudo -u postgres psql -p 5531 psql (16.2 (Debian 16.2-1.pgdg120+2)) Type "help" for help. postgres=# select count(*) s_roles_count from pg_catalog.pg_authid where rolname like 's_%'; s_roles_count --------------- 13299 (1 row) postgres=# select count(*) a_roles_count from pg_catalog.pg_authid where rolname like 'a_%'; a_roles_count --------------- 12776 (1 row) postgres=# select count(*) d_roles_count from pg_catalog.pg_authid where rolname like 'd_%'; d_roles_count --------------- 13984 (1 row) The Setup --------- Investigating this slow `GRANT ROLE` we start a VM running Debian 11, and create a lot of roles. create special `acc` role and write to some file: $ echo -e "CREATE ROLE acc WITH LOGIN NOSUPERUSER INHERIT CREATEDB CREATEROLE NOREPLICATION;\n\n" > create_acc.sql create a lot of `a_` roles and make sure `acc` is member of each one of them: $ for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for idx3 in $(seq -w 1 10); do echo "CREATE ROLE a_${idx1}${idx2}${idx3} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"; echo "GRANT a_${idx1}${idx2}${idx3} TO acc WITH ADMIN OPTION;"; done; done; done > create_a.sql create a lot of `d_` roles and make sure `acc` is member of each one of them: $ for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for idx3 in $(seq -w 1 10); do echo "CREATE ROLE d_${idx1}${idx2}${idx3} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"; echo "GRANT d_${idx1}${idx2}${idx3} TO acc WITH ADMIN OPTION;"; done; done; done > create_d.sql create a lot of `s_` roles: $ for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for idx3 in $(seq -w 1 10); do echo "CREATE ROLE s_${idx1}${idx2}${idx3} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"; done; done; done > create_s.sql merge ROLE creation into one file: $ cat create_acc.sql create_a.sql create_d.sql create_s.sql > /tmp/create-roles.sql PostgreSQL 16 ------------- Install PostgreSQL 16: -- $ sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' $ sudo apt install gnupg2 $ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - $ sudo apt-get update $ sudo apt-get -y install postgresql-16 postgresql-client-16 Create PostgreSQL 16 instance: -- $ sudo pg_dropcluster --stop 16 main # drop default Debian cluster $ sudo pg_createcluster 16 pg16 $ echo "local all acc trust" | sudo tee /etc/postgresql/16/pg16/pg_hba.conf $ echo "local all postgres peer" | sudo tee -a /etc/postgresql/16/pg16/pg_hba.conf $ sudo systemctl start postgresql@16-pg16.service Import lots of roles: -- $ sudo -u postgres /usr/lib/postgresql/16/bin/psql -f /tmp/create-roles.sql -p 5432 -d postgres Using ROLE `acc`, grant `d_` ROLE to a session ROLE: -- $ time sudo -u postgres /usr/lib/postgresql/16/bin/psql -U acc postgres -c 'GRANT d_0010109 TO s_0010109;' GRANT ROLE real 0m7.579s user 0m0.054s sys 0m0.020s This is the surprising behavior for PostgreSQL 16. It seems there's a new logic in PostgreSQL that checks against each role, and it took 100% of CPU. At this point we know `acc` is just another ROLE that happens to have ADMIN privilege that is a member of `d_0010109` group ROLE. But what happens when `acc` is a SUPERUSER? Alter role `acc` as SUPERUSER: -- $ sudo -u postgres /usr/lib/postgresql/16/bin/psql -c 'ALTER ROLE acc WITH SUPERUSER' ALTER ROLE This is a workaround to make GRANT ROLE bearable. Using ROLE `acc`, grant `d_` ROLE to a session ROLE, again: -- $ time sudo -u postgres /usr/lib/postgresql/16/bin/psql -U acc postgres -c 'GRANT d_0010108 TO s_0010108;' GRANT ROLE real 0m0.079s user 0m0.054s sys 0m0.019s OK this is fast. But what hapens when `acc` is back being not a SUPERUSER? Alter role `acc` to stop being SUPERUSER: -- $ sudo -u postgres psql -c 'ALTER ROLE acc WITH NOSUPERUSER' ALTER ROLE Using ROLE `acc`, grant `d_` ROLE to a session ROLE with `acc` not a SUPERUSER: -- $ time sudo -u postgres /usr/lib/postgresql/16/bin/psql -U acc postgres -c 'GRANT d_0010107 TO s_0010107;' GRANT ROLE real 0m7.741s user 0m0.055s sys 0m0.021s As expected, slow `GRANT ROLE` again. At this point, we try with PostgreSQL 15 just to make sure that this is new to PostgreSQL 16. $ sudo systemctl stop postgresql@16-pg16 PostgreSQL 15 ------------- Install PostgreSQL 15: -- $ sudo apt-get update $ sudo apt-get -y install postgresql-15 postgresql-client-15 $ sudo pg_dropcluster --stop 15 main # drop default Debian cluster $ sudo pg_createcluster 15 pg15 $ echo "local all acc trust" | sudo tee /etc/postgresql/15/pg15/pg_hba.conf $ echo "local all postgres peer" | sudo tee -a /etc/postgresql/15/pg15/pg_hba.conf $ sudo systemctl start postgresql@15-pg15.service Import lots of roles: -- $ sudo -u postgres /usr/lib/postgresql/15/bin/psql -f /tmp/create-roles.sql -p 5433 -d postgres Using ROLE `acc`, grant `d_` ROLE to a session ROLE: -- $ time sudo -u postgres /usr/lib/postgresql/15/bin/psql -U acc -p 5433 postgres -c 'GRANT d_0010109 TO s_0010109;' GRANT ROLE real 0m0.077s user 0m0.054s sys 0m0.017s Seems OK with the same amount of ROLEs. The `acc` ROLE is not a SUPERUSER here. Alter role `acc` as SUPERUSER: -- $ sudo -u postgres /usr/lib/postgresql/15/bin/psql -p 5433 -c 'ALTER ROLE acc WITH SUPERUSER' ALTER ROLE Using ROLE `acc`, grant `d_` ROLE to a session ROLE, again: -- $ time sudo -u postgres /usr/lib/postgresql/15/bin/psql -p 5433 -U acc postgres -c 'GRANT d_0010108 TO s_0010108;' GRANT ROLE real 0m0.084s user 0m0.057s sys 0m0.021s Doesn't matter, GRANT ROLE works still as fast. Alter role `acc` to stop being a SUPERUSER: -- $ sudo -u postgres /usr/lib/postgresql/15/bin/psql -p 5433 -c 'ALTER ROLE acc WITH NOSUPERUSER' ALTER ROLE Using ROLE `acc`, grant `d_` ROLE to a session ROLE with `acc` not a SUPERUSER: -- $ time sudo -u postgres /usr/lib/postgresql/15/bin/psql -p 5433 -U acc postgres -c 'GRANT d_0010107 TO s_0010107;' GRANT ROLE real 0m0.077s user 0m0.054s sys 0m0.017s Again, doesn't matter, GRANT ROLE works still as fast. Looking At The Source Code -------------------------- Looking at git diff of `REL_15_6` against `REL_16_0`, it seems the `roles_is_member_of` function called by the new in PostgreSQL 16 `check_role_membership_authorization` is expensive for our use case. REL_16_0: src/backend/commands/user.c:1562 ---8<------ (errcode(ERRCODE_INVALID_GRANT_OPERATION), errmsg("column names cannot be included in GRANT/REVOKE ROLE"))); roleid = get_role_oid(rolename, false); check_role_membership_authorization(currentUserId, roleid, stmt->is_grant); if (stmt->is_grant) --->8------ While I can see the value in improvements on how ROLEs are being handled PostgreSQL 16 onward, I'm curious what would help for setups that has thousands of ROLEs like us outside of patching the source code?