Hi,

We've encountered failures of "make check" when we put the PostgreSQL data directory on an NFS filesystem or a tmpfs filesystem. It doesn't always fail, but it fails occasionally.

Is this expected behavior of PostgreSQL? If it is expected, what is the reason for this symptom?

I grep'ed the PostgreSQL source code, but it doesn't seem to use operations that are problematic on NFS, like flock(2) or F_SETLK/F_SETLKW of fcntl(2)... So I guess that (theoretically) it should work fine over NFS or tmpfs. The only idea that strikes me is that there is some nasty bug in Linux. ;-)

Of course, we are using a single instance of PostgreSQL on a single machine, i.e. we are NOT accessing the data directory from either multiple machines or multiple PostgreSQL instances.

To give an actual example, when we invoke the following shell script:

    $ cat ~/regress-loop.sh
    #!/bin/sh
    loop=1
    make clean
    while true; do
        echo "############### loop = $loop ##################"
        make check
        ret=$?
        if [ $ret -ne 0 ]; then
            echo "error @ loop = $loop (return value = $ret)"
            exit $ret
        fi
        loop=`expr $loop + 1`
    done

errors like the following sometimes happen:

    $ sh ~/regress-loop.sh
        :
        :
    make: *** [check] Error 2
    error @ loop = 26 (return value = 2)

We observed this symptom under the following conditions:

1. putting PGDATA on NFS-async
    filesystem: NFS (async)
    NFS client:
        PostgreSQL version: 8.1.3
        OS version: Fedora Core 3 Linux
    NFS server:
        OS version: Fedora Core 3 Linux
        "async" is specified in /etc/exports, thus the server violates
        the NFS protocol and replies to requests before it stores
        changes to its disk.
    How many loops until it fails: 3000 loops or more

2. putting PGDATA on NFS
    filesystem: NFS
    NFS client:
        PostgreSQL version: 8.1.3
        OS version: Fedora Core 4 Linux
    NFS server:
        OS version: Fedora Core 5 Linux
    How many loops until it fails: approximately 300 loops

3. putting PGDATA on tmpfs
    filesystem: tmpfs
    PostgreSQL version: 8.1.3
    OS version: Fedora Core 5 Linux
    How many loops until it fails: approximately 100 loops

This symptom has never happened over ext3fs, as far as we have seen.

I have attached the diffs between the expected results and the actual results to this mail.

Any ideas are appreciated, except using a local filesystem. ;-)
-- 
SODA Noriyuki
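For anyone who wants to reproduce this, the loop script above can be extended so that the diff from the failing run is copied aside before a later run overwrites it. What follows is only a minimal sketch, not the script we actually ran: the function name run_until_failure, the SAVE_DIR location, and the DIFF_FILE path (assumed here to be src/test/regress/regression.diffs relative to the top of the build tree) are all assumptions to adjust for your setup.

```shell
#!/bin/sh
# Sketch only: DIFF_FILE and SAVE_DIR are assumed locations, not taken
# from the original report. Override them in the environment as needed.
DIFF_FILE=${DIFF_FILE:-src/test/regress/regression.diffs}
SAVE_DIR=${SAVE_DIR:-$HOME/regress-failures}

# run_until_failure MAX CMD [ARGS...]
# Runs CMD up to MAX times. On the first failure it saves a copy of
# DIFF_FILE (if present) and returns CMD's exit status; if every run
# succeeds it returns 0.
run_until_failure() {
    max=$1
    shift
    loop=1
    while [ "$loop" -le "$max" ]; do
        echo "############### loop = $loop ##################"
        "$@"
        ret=$?
        if [ "$ret" -ne 0 ]; then
            echo "error @ loop = $loop (return value = $ret)"
            mkdir -p "$SAVE_DIR"
            if [ -f "$DIFF_FILE" ]; then
                # Keep one copy per failing loop number.
                cp "$DIFF_FILE" "$SAVE_DIR/regression.diffs.loop$loop"
            fi
            return "$ret"
        fi
        loop=$((loop + 1))
    done
    return 0
}

# Example invocation (left commented out so sourcing has no side effects):
# make clean
# run_until_failure 5000 make check
```

Bounding the loop count is a deliberate choice in this sketch: on the slowest-failing setup (3000 loops or more) it lets the run end on its own if the failure never shows up.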
*** ./expected/tablespace.out	Tue May 16 13:03:24 2006
--- ./results/tablespace.out	Fri May 19 21:04:30 2006
***************
*** 35,37 ****
--- 35,38 ----
  NOTICE: drop cascades to table testschema.foo
  -- Should succeed
  DROP TABLESPACE testspace;
+ ERROR: tablespace "testspace" is not empty

======================================================================

*** ./expected/tablespace.out	Fri May 19 15:28:32 2006
--- ./results/tablespace.out	Sat May 20 06:13:18 2006
***************
*** 35,37 ****
--- 35,38 ----
  NOTICE: drop cascades to table testschema.foo
  -- Should succeed
  DROP TABLESPACE testspace;
+ ERROR: tablespace "testspace" is not empty

======================================================================

*** ./expected/sanity_check.out	Fri Sep 9 05:07:42 2005
--- ./results/sanity_check.out	Fri May 19 16:31:37 2006
***************
*** 17,22 ****
--- 17,24 ----
  circle_tbl | t
  fast_emp4000 | t
  func_index_heap | t
+ gcircle_tbl | t
+ gpolygon_tbl | t
  hash_f8_heap | t
  hash_i4_heap | t
  hash_name_heap | t
***************
*** 68,74 ****
  shighway | t
  tenk1 | t
  tenk2 | t
! (58 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have
--- 70,76 ----
  shighway | t
  tenk1 | t
  tenk2 | t
! (60 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have

======================================================================