Hi,
I am running into a strange issue with Postgres 10: when using pg_dump with the directory format and the --jobs parameter set, it intermittently hangs. It seems to occur less frequently the lower I set the jobs parameter, but it does happen eventually even at --jobs=2 (I could not reproduce it with --jobs=1). I've tested with Postgres 11 and it never hangs, no matter how high I set the jobs parameter (tested all the way up to --jobs=8). With Postgres 10 and other formats I don't run into the issue either. Here are my server specs and software versions (OS and Postgres):
Windows Server 2016 DataCenter 10.0.14393
2 cores 4 logical processors
26GB memory
Postgres 11.4 (port 5433)
Postgres 10.9 (port 5432)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
And here are the psql statements and PowerShell script to recreate the issue (a variant of the loop with a per-run timeout is sketched after the script).
CREATE DATABASE pg_dump_hang;
\c pg_dump_hang
CREATE TABLE series1 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series2 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series3 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series4 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series5 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series6 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series7 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series8 AS SELECT i FROM generate_series(1,1000000) i;
# Postgres 11 (port 5433) -- never hangs
$env:PGPASSWORD = '#######'
$env:PGUSER = 'postgres'
$env:PGPORT = 5433
Set-Location 'C:\Program Files\PostgreSQL\11\bin'
$a = 1
Do
{
    # Ignore the error on the first pass, when the directory does not exist yet
    Remove-Item -Recurse D:\dumphangtest -ErrorAction SilentlyContinue
    .\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest pg_dump_hang
    "Completed loop $a at - " + (Get-Date)
    $a++
} While ($a -le 25)
# Postgres 10 (port 5432) -- intermittently hangs
$env:PGPASSWORD = '#######'
$env:PGUSER = 'postgres'
$env:PGPORT = 5432
Set-Location 'C:\Program Files\PostgreSQL\10\bin'
$a = 1
Do
{
    Remove-Item -Recurse D:\dumphangtest -ErrorAction SilentlyContinue
    .\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest pg_dump_hang
    "Completed loop $a at - " + (Get-Date)
    $a++
} While ($a -le 25)
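In case it helps with reproducing this, below is a rough, untested sketch of the same Postgres 10 loop with a per-run timeout, so the loop stops at the first hang and leaves the stuck pg_dump process around for inspection. The 10-minute timeout is an arbitrary assumption; paths and port are taken from the setup above.
# Sketch only: same loop as above, but give each pg_dump run at most 10 minutes.
$env:PGPASSWORD = '#######'
$env:PGUSER = 'postgres'
$env:PGPORT = 5432
Set-Location 'C:\Program Files\PostgreSQL\10\bin'
for ($a = 1; $a -le 25; $a++) {
    Remove-Item -Recurse D:\dumphangtest -ErrorAction SilentlyContinue
    $p = Start-Process -FilePath .\pg_dump.exe `
        -ArgumentList '--format=directory', '--jobs=4', '--file=D:\dumphangtest', 'pg_dump_hang' `
        -NoNewWindow -PassThru
    if (-not $p.WaitForExit(600000)) {   # 600000 ms = 10 minutes (assumed threshold)
        "pg_dump (pid $($p.Id)) appears hung on loop $a at - " + (Get-Date)
        break                            # leave the hung process running for inspection
    }
    "Completed loop $a at - " + (Get-Date)
}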
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here is the output from pg_stat_activity and the filesystem listing when the hang occurs.
--jobs=4
postgres=# select pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35) from pg_stat_activity where datname like 'pg_dump%';
pid | state | state_change | query_start | wait_event | wait_event_type | substring
------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+-------------------------------------
5324 | idle in transaction | 2019-06-27 15:29:13.03416+01 | 2019-06-27 15:29:13.03154+01 | ClientRead | Client | SELECT at.attname, (SELECT pg_catal
4144 | idle in transaction | 2019-06-27 15:29:16.222315+01 | 2019-06-27 15:29:15.070106+01 | ClientRead | Client | COPY public.series6 (i) TO stdout;
5576 | active | 2019-06-27 15:29:13.642069+01 | 2019-06-27 15:29:13.642068+01 | ClientWrite | Client | COPY public.series2 (i) TO stdout;
3860 | idle in transaction | 2019-06-27 15:29:17.109549+01 | 2019-06-27 15:29:16.198842+01 | ClientRead | Client | COPY public.series8 (i) TO stdout;
3120 | idle in transaction | 2019-06-27 15:29:16.350121+01 | 2019-06-27 15:29:15.089817+01 | ClientRead | Client | COPY public.series7 (i) TO stdout;
(5 rows)
D:\>dir dumphangtest
Volume in drive D has no label.
Volume Serial Number is C635-CDEF
Directory of D:\dumphangtest
06/27/2019 10:29 AM <DIR> .
06/27/2019 10:29 AM <DIR> ..
06/27/2019 10:29 AM 2,114,899 2863.dat.gz
06/27/2019 10:29 AM 376,832 2864.dat.gz
06/27/2019 10:29 AM 2,114,899 2865.dat.gz
06/27/2019 10:29 AM 2,114,899 2866.dat.gz
06/27/2019 10:29 AM 2,114,899 2867.dat.gz
06/27/2019 10:29 AM 2,114,899 2868.dat.gz
06/27/2019 10:29 AM 2,114,899 2869.dat.gz
06/27/2019 10:29 AM 2,114,899 2870.dat.gz
06/27/2019 10:29 AM 8,452 toc.dat
9 File(s) 15,189,577 bytes
2 Dir(s) 102,021,439,488 bytes free
--jobs=2
postgres=# select pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35) from pg_stat_activity where datname like 'pg_dump%';
pid | state | state_change | query_start | wait_event | wait_event_type | substring
------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+-------------------------------------
3420 | idle in transaction | 2019-06-27 14:37:55.067559+01 | 2019-06-27 14:37:54.448198+01 | ClientRead | Client | COPY public.series8 (i) TO stdout;
3180 | idle in transaction | 2019-06-27 14:37:49.537396+01 | 2019-06-27 14:37:49.53608+01 | ClientRead | Client | SELECT at.attname, (SELECT pg_catal
5364 | active | 2019-06-27 14:37:50.151002+01 | 2019-06-27 14:37:50.151001+01 | ClientWrite | Client | COPY public.series2 (i) TO stdout;
(3 rows)
D:\>dir dumphangtest
Volume in drive D has no label.
Volume Serial Number is C635-CDEF
Directory of D:\dumphangtest
06/27/2019 09:37 AM <DIR> .
06/27/2019 09:37 AM <DIR> ..
06/27/2019 09:37 AM 2,114,899 2863.dat.gz
06/27/2019 09:37 AM 81,920 2864.dat.gz
06/27/2019 09:37 AM 2,114,899 2865.dat.gz
06/27/2019 09:37 AM 2,114,899 2866.dat.gz
06/27/2019 09:37 AM 2,114,899 2867.dat.gz
06/27/2019 09:37 AM 2,114,899 2868.dat.gz
06/27/2019 09:37 AM 2,114,899 2869.dat.gz
06/27/2019 09:37 AM 2,114,899 2870.dat.gz
06/27/2019 09:37 AM 8,452 toc.dat
9 File(s) 14,894,665 bytes
2 Dir(s) 102,056,497,152 bytes free
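In case it's useful, here is a rough sketch of how the hang could be watched for from a separate PowerShell session while the loop runs. The psql path, port, and the 5-minute threshold are assumptions based on the setup above; adjust as needed.
# Sketch: every 30 seconds, list pg_dump_hang backends that have been sitting on a
# client wait (ClientRead/ClientWrite) for more than 5 minutes.
$env:PGPASSWORD = '#######'
$query = @"
SELECT pid, state, wait_event, now() - state_change AS stuck_for,
       substring(query, 1, 35) AS query
FROM pg_stat_activity
WHERE datname = 'pg_dump_hang'
  AND wait_event_type = 'Client'
  AND now() - state_change > interval '5 minutes';
"@
while ($true) {
    & 'C:\Program Files\PostgreSQL\10\bin\psql.exe' -X -p 5432 -U postgres -d postgres -c $query
    Start-Sleep -Seconds 30
}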
----------------------------------------------------------------------------------------------------------
And here is the stack trace from the thread of the hung backend process (the one stuck on the ClientWrite wait event). It looks like the backend is blocked in secure_write(), waiting for the pg_dump worker to read from its socket.
ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForMultipleObjects+0x1fe
ntoskrnl.exe!ObWaitForMultipleObjects+0x2c7
ntoskrnl.exe!NtWaitForMultipleObjects+0xf9
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xef
KERNELBASE.dll!WaitForMultipleObjects+0xe
postgres.exe!WaitLatchOrSocket+0x456
postgres.exe!WaitEventSetWait+0x8e
postgres.exe!secure_write+0xeb
postgres.exe!pq_putbytes+0x4e5
postgres.exe!pq_putbytes+0x41b
postgres.exe!pq_putbytes+0x256
postgres.exe!CreateCopyDestReceiver+0x3784
postgres.exe!CreateCopyDestReceiver+0x17b2
postgres.exe!CreateCopyDestReceiver+0x144c
postgres.exe!CreateCopyDestReceiver+0x1098
postgres.exe!DoCopy+0x50a
postgres.exe!standard_ProcessUtility+0x37b
pg_stat_statements.dll!pg_finfo_pg_stat_statements+0xde1
postgres.exe!ProcessUtility+0x34
postgres.exe!PortalRunFetch+0x891
postgres.exe!PortalRunFetch+0xbae
postgres.exe!PortalRun+0x1a0
postgres.exe!get_stats_option_name+0x1335
postgres.exe!PostgresMain+0x635
postgres.exe!ShmemBackendArrayAllocation+0x2b3a
postgres.exe!SubPostmasterMain+0x275
postgres.exe!main+0x480
postgres.exe!pgwin32_popen+0x144f
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21
Any ideas on what changed in Postgres 10 that could cause this? We were previously running 9.6.7 and didn't encounter this issue on that version either.
Thanks
Ben.