Merlin Moncure wrote: > On 10/28/06, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote: >> On Thu, 2006-10-26 at 11:06 -0400, Merlin Moncure wrote: >> > On 10/26/06, Carlo Stonebanks <stonec.register@xxxxxxxxxxxx> wrote: >> > > This is pretty interesting - where can I read more on this? >> Windows isn't >> > > actually hanging, one single command line window is - from its >> behaviour, it >> > > looks like the TCL postgresql package is waiting for pg_exec to >> come back >> > > from the commit (I believe the commit has actually gone through). >> > > >> > > It could even be that there's something wrong with the TCL >> package, but from >> > > my understanding it is one of the most complete interfaces out >> there - which >> > > is weird, because TCL seems to be the most unpopular language in the >> > > community. >> > >> > when it happens, make sure to query pg_locks and see what is going on >> > there lock issues are not supposed to manifest on a commit, which >> > releases locks, but you never know. There have been reports of >> > insonsistent lock ups on windows (espeically multi-processor) which >> > you might be experiencing. Make sure you have the very latest version >> > of pg 8.1.x. Also consider checking out 8.2 and see if you can >> > reproduce the behavior there...this will require compiling postgresql. >> >> Merlin, >> >> Rumour has it you managed to get a BT from Windows. That sounds like it >> would be very useful here. Could it be there is a hangup in communication with the backend via the libpq library? I have a situation on Windows where psql seems to be hanging randomly AFTER completing (or almost completing) a vacuum full analyze verbose. I'm running the same databases on a single postgres instance on a Dell 4gb RAM 2 processor xeon (hyper-threading turned off) running Debian GNU/Linux. The windows system is an IBM 24gb RAM, 4 processor xeon (hyperthreading turned off). No problems on the Dell, it runs pgbench faster than the windows IBM system. The Dell Linux system zips through vacuumdb --all --analyze --full --verbose with no problems. The windows machine is running 6 instances of postgresql because of problems trying to load all of the databases into one instance on windows. The last output from psql is: INFO: free space map contains 474 pages in 163 relations DETAIL: A total of 2864 page slots are in use (including overhead). 2864 page slots are required to track all free space. Current limits are: 420000 page slots, 25000 relations, using 4154 KB. (I've currently restarted postgresql with more reasonable fsm_page_slots and fsm_relations). It appears that psql is hung in the call to WS2_32!select. The psql stack trace looks like this: ntdll!KiFastSystemCallRet ntdll!NtWaitForSingleObject+0xc mswsock!SockWaitForSingleObject+0x19d mswsock!WSPSelect+0x380 WS2_32!select+0xb9 WARNING: Stack unwind information not available. Following frames may be wrong. libpq!PQenv2encoding+0x1fb libpq!PQenv2encoding+0x3a1 libpq!PQenv2encoding+0x408 libpq!PQgetResult+0x58 libpq!PQgetResult+0x188 psql+0x4c0f psql+0x954d psql+0x11e7 psql+0x1238 kernel32!IsProcessorFeaturePresent+0x9e With more detail: # ChildEBP RetAddr Args to Child 00 0022f768 7c822124 71b23a09 000007a8 00000001 ntdll!KiFastSystemCallRet (FPO: [0,0,0]) 01 0022f76c 71b23a09 000007a8 00000001 0022f794 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0]) 02 0022f7a8 71b23a52 000007a8 00000780 00000000 mswsock!SockWaitForSingleObject+0x19d (FPO: [Non-Fpo]) 03 0022f898 71c0470c 00000781 0022fc40 0022fb30 mswsock!WSPSelect+0x380 (FPO: [Non-Fpo]) 04 0022f8e8 6310830b 00000781 0022fc40 0022fb30 WS2_32!select+0xb9 (FPO: [Non-Fpo]) WARNING: Stack unwind information not available. Following frames may be wrong. 05 0022fd68 631084b1 00000000 ffffffff 0000001d libpq!PQenv2encoding+0x1fb 06 0022fd88 63108518 00000001 00000000 00614e70 libpq!PQenv2encoding+0x3a1 07 0022fda8 631060f8 00000001 00000000 00614e70 libpq!PQenv2encoding+0x408 08 0022fdc8 63106228 00614e70 00613a71 00615188 libpq!PQgetResult+0x58 09 0022fde8 00404c0f 00614e70 00613a71 0041ac7a libpq!PQgetResult+0x188 0a 0022fe98 0040954d 00613a71 00423180 00423185 psql+0x4c0f 0b 0022ff78 004011e7 00000006 00613b08 00612aa8 psql+0x954d 0c 0022ffb0 00401238 00000001 00000009 0022fff0 psql+0x11e7 0d 0022ffc0 77e523e5 00000000 00000000 7ffdc000 psql+0x1238 0e 0022fff0 00000000 00401220 00000000 78746341 kernel32!IsProcessorFeaturePresent+0x9e the pg_locks table: -[ RECORD 1 ]-+---------------- locktype | relation database | 19553 relation | 10342 page | tuple | transactionid | classid | objid | objsubid | transaction | 1998424 pid | 576 mode | AccessShareLock granted | t -[ RECORD 2 ]-+---------------- locktype | transactionid database | relation | page | tuple | transactionid | 1998424 classid | objid | objsubid | transaction | 1998424 pid | 576 mode | ExclusiveLock granted | t The call stack on the postgres.exe process id 576: ntdll!KiFastSystemCallRet ntdll!NtWaitForMultipleObjects+0xc WARNING: Stack unwind information not available. Following frames may be wrong. kernel32!ResetEvent+0x45 postgres!pgwin32_waitforsinglesocket+0x89 postgres!pgwin32_recv+0x82 postgres!secure_read+0x7b postgres!TouchSocketFile+0x93 postgres!pq_getbyte+0x22 postgres!PostgresMain+0x1056 postgres!SubPostmasterMain+0x9ca postgres!main+0x33f postgres+0x11e7 postgres+0x1238 kernel32!IsProcessorFeaturePresent+0x9e These are the parameters: listen_addresses = '*' port = 5432 max_connections = 300 shared_buffers = 30000 temp_buffers = 5000 work_mem = 4096 max_fsm_pages = 25000 max_fsm_relations = 500 vacuum_cost_delay = 50 wal_buffers = 32 checkpoint_segments = 16 effective_cache_size = 50000 random_page_cost = 3 default_statistics_target = 300 log_destination = 'stderr' redirect_stderr = on (Since there's 24gb RAM on this thing, I've apparently gotten the system cache up to about 6 gb, according sysinternals.com "System Information" applet.) "Cache Data Map Hits %" runs at 100% Physical Disk Queue Lengths are almost non-existent Processors are not very busy at all Seems like once something is in memory, we never have to go back to the disk again except to write. (hope so with that kind of RAM). >> >> Carlo, >> >> Many things can happen at commit time. Temp tables dropped, TRUNCATEd >> old relations unlinked, init files removed, deferred foreign key checks >> (and subsequent cascading), dropped tables flushed. The assumption that >> COMMIT is a short request may not be correct according to the wide range >> of tasks that could occur according to standard SQL:2003 behaviour. >> >> Some of those effects take longer on larger systems. Any and all of >> those things have potential secondary effects, all of which can also >> conflict with other user tasks and especially with a CHECKPOINT. Then >> there's various forms of contention caused by misconfiguration. >> >> I do think we need some better instrumentation for this kind of thing. >> >> -- >> Simon Riggs >> EnterpriseDB http://www.enterprisedb.com > > start here: > http://beta.linuxports.com/pgsql-hackers-win32/2005-08/msg00051.php > > merlin > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > >