Hi,
I created an index on an 11devel database while sampling pg_stat_activity with a small tool. The tool records a row whenever state = 'active'. The collected rows are then aggregated and sorted by activity percentage.
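To make the sampling and aggregation concrete, here is a minimal sketch in Python. It is not the tool's actual code: the function name aggregate, the dict layout of a row and the sample values are all assumptions made for illustration. It only shows the idea of keeping rows with state = 'active' and ranking them by the share of sampling ticks in which they were seen.

```python
from collections import Counter

def aggregate(ticks):
    """ticks: one list of pg_stat_activity rows (dicts) per sampling tick."""
    hits = Counter()
    for rows in ticks:
        for row in rows:
            # Keep a row only when the backend reports itself as active.
            if row["state"] == "active":
                hits[(row["pid"], row["wait_event"])] += 1
    total = len(ticks)
    # Share of ticks in which each (pid, wait_event) pair was seen, busiest first.
    report = [(100.0 * n / total, pid, ev) for (pid, ev), n in hits.items()]
    return sorted(report, key=lambda r: r[0], reverse=True)

# Hypothetical samples: 4 ticks, one backend.
ticks = [
    [{"pid": 8262, "state": "active", "wait_event": "DataFileRead"}],
    [{"pid": 8262, "state": "active", "wait_event": "DataFileRead"}],
    [{"pid": 8262, "state": "active", "wait_event": None}],
    [{"pid": 8262, "state": "idle", "wait_event": "ClientRead"}],
]
report = aggregate(ticks)
```

A row with an empty wait_event simply means the backend was active and not waiting at that tick.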
Test environment:
select version();
version
----------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11devel (Debian 11~~devel~20180227.2330-1~420.git51057fe.pgdg+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 7.3.0-5) 7.3.0, 64-bit
(1 row)
Time: 0.762 ms
create table t1(c1 bigint, c2 double precision, c3 text);
CREATE TABLE
insert into t1 select generate_series(1,100000000,1), random(), md5(random()::text) ;
INSERT 0 100000000
With a select (select max(c1) from t1 group by c2;) I get this kind of output:
./t -d 20 -o "pid, backend_type, query, wait_event_type, wait_event"
traqueur 2.04.00 - performance tool for PostgreSQL 9.3 => 11
INFORMATION, no connection parameters provided, connecting to traqueur database ...
INFORMATION, connected to traqueur database
INFORMATION, PostgreSQL version : 110000
INFORMATION, sql preparation ...
INFORMATION, sql execution ...
busy_pc | distinct_exe | pid | backend_type | query | wait_event_type | wait_event
---------+--------------+-------+-----------------+-------------------------------------+-----------------+--------------
76 | 1 / 151 | 10065 | parallel worker | select max(c1) from t1 group by c2; | IO | DataFileRead
73 | 1 / 146 | 8262 | client backend | select max(c1) from t1 group by c2; | IO | DataFileRead
72 | 1 / 144 | 10066 | parallel worker | select max(c1) from t1 group by c2; | IO | DataFileRead
26 | 1 / 53 | 10066 | parallel worker | select max(c1) from t1 group by c2; | |
26 | 1 / 51 | 8262 | client backend | select max(c1) from t1 group by c2; | |
24 | 1 / 47 | 10065 | parallel worker | select max(c1) from t1 group by c2; | |
2 | 1 / 3 | 10066 | parallel worker | select max(c1) from t1 group by c2; | IO | BufFileWrite
2 | 1 / 3 | 8262 | client backend | select max(c1) from t1 group by c2; | IO | BufFileWrite
1 | 1 / 2 | 10065 | parallel worker | select max(c1) from t1 group by c2; | IO | BufFileWrite
With an index creation (create index t1_i1 on t1(c1, c2);) I get this kind of output:
./t -d 20 -o "pid, backend_type, query, wait_event_type, wait_event"
traqueur 2.04.00 - performance tool for PostgreSQL 9.3 => 11
INFORMATION, no connection parameters provided, connecting to traqueur database ...
INFORMATION, connected to traqueur database
INFORMATION, PostgreSQL version : 110000
INFORMATION, sql preparation ...
INFORMATION, sql execution ...
busy_pc | distinct_exe | pid | backend_type | query | wait_event_type | wait_event
---------+--------------+------+----------------+-----------------------------------+-----------------+--------------
68 | 1 / 136 | 8262 | client backend | create index t1_i1 on t1(c1, c2); | IO | DataFileRead
26 | 1 / 53 | 8262 | client backend | create index t1_i1 on t1(c1, c2); | |
6 | 1 / 11 | 8262 | client backend | create index t1_i1 on t1(c1, c2); | IO | BufFileWrite
(3 rows)
No parallel worker shows up. At least one parallel worker was active, though: I could see it with a direct query on pg_stat_activity or with ps -ef:
...
postgres 8262 8230 7 08:54 ? 00:22:46 postgres: 11/main: postgres postgres [local] CREATE INDEX
...
postgres 9833 8230 23 14:17 ? 00:00:33 postgres: 11/main: parallel worker for PID 8262
...
The tool only catches the activity of the client backend because the state column of pg_stat_activity is null for the parallel workers in this case. I added an option to filter on "(state = 'active' or wait_event is not null)". It is not 100% accurate though: I miss the activity of parallel workers that are not waiting, and it is more difficult to know who helps whom since query is also null.
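The effect of the two filters can be sketched on a few in-memory rows. This is only an illustration in Python, not the tool's code: the function names strict and widened and the sample rows are hypothetical, with the pids borrowed from the ps output above.

```python
def strict(row):
    # Original filter: state = 'active'
    return row["state"] == "active"

def widened(row):
    # Optional filter: state = 'active' or wait_event is not null
    return row["state"] == "active" or row["wait_event"] is not None

# Hypothetical rows as seen during the index creation.
rows = [
    {"pid": 8262, "backend_type": "client backend",
     "state": "active", "wait_event": "DataFileRead"},
    {"pid": 9833, "backend_type": "parallel worker",
     "state": None, "wait_event": "DataFileRead"},  # worker waiting on I/O
    {"pid": 9833, "backend_type": "parallel worker",
     "state": None, "wait_event": None},            # worker busy, not waiting
]

strict_hits = [(r["pid"], r["wait_event"]) for r in rows if strict(r)]
widened_hits = [(r["pid"], r["wait_event"]) for r in rows if widened(r)]
```

With these rows the strict filter keeps only the client backend, the widened filter also keeps the waiting worker, and the worker sample with no wait event is dropped by both.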
I can imagine various workarounds, but 11 is still in development, so maybe the state and query columns of pg_stat_activity will be filled in for the parallel workers, even for an index creation?
Best regards
Phil