Hello PostgreSQL Community Members,
I am stumped trying to install a few C-language functions
on a particular Solaris server (64-bit, AMD CPU arch, not SPARC). I actually
have 5 PostgreSQL servers, and the .so loads fine into 4 of them, but
refuses to load into the 5th. I've quintuple-checked the file
permissions, build of the .so, gcc versions, PostgreSQL versions,
etc. I've had a colleague double-check my work. We're both stumped.
Details to follow.
All servers are running Solaris 10u9 on 64-bit hardware inside
Solaris zones. Two of the servers are X4270s, 144 GB RAM, 24 Intel
CPU cores. These two servers run the 4 working Solaris zones that
are able to load the functions implemented in the .so files. PostgreSQL
version 8.4.6, compiled from source (not a binary package).
The server that is misbehaving is an X4600, 128 GB RAM, 16 AMD CPU
cores, but otherwise identical: Solaris 10u9, 64-bit OS, PostgreSQL
8.4.6. All 5 systems use the stock gcc that ships with Solaris (v3.4.3,
it's old, I know).
Here are the permissions on the files and PostgreSQL directories. First
a working server, then the server that is not working as expected.
(root@working: </db>) # ls -ld /db /db/*.so
drwx------ 11 pgsql root 23 Sep 27 10:39 /db
-rwxr-xr-x 1 root root 57440 Sep 27 10:39 /db/pgsql_micr_parser_64.so
(root@working: </db>) # psql -Upgsql -dpostgres -c"select version();"
PostgreSQL 8.4.6 on x86_64-pc-solaris2.11, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802), 64-bit
(root@working: </db>) # file /opt/local/x64/postgresql-8.4.6/bin/postgres
/opt/local/x64/postgresql-8.4.6/bin/postgres: ELF 64-bit LSB executable AMD64 Version 1 [SSE], dynamically linked, not stripped
(root@working: </db>) # psql -Upgsql -dmy_db -c"create or replace function parse_micr(text) returns micr_struct
as '/db/pgsql_micr_parser_64.so', 'pgsql_micr_parser' language c volatile cost 1;"
CREATE FUNCTION
(root@working: </db>) # psql -Upgsql -dmy_db -t -c"select transit from parse_micr(':8888=8888: <45800=100<');"
8888=8888
(root@failed: </db>) # ls -ld /db /db/*.so
drwx------ 11 pgsql root 24 Sep 29 11:16 /db
-rwxr-xr-x 1 root root 57440 Sep 29 09:46 /db/pgsql_micr_parser_64.so
(root@failed: </db>) # psql -Upgsql -dpostgres -c"select version();"
PostgreSQL 8.4.6 on x86_64-pc-solaris2.11, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802), 64-bit
(root@failed: </db>) # file /opt/local/x64/postgresql-8.4.6/bin/postgres
/opt/local/x64/postgresql-8.4.6/bin/postgres: ELF 64-bit LSB executable AMD64 Version 1 [SSE], dynamically linked, not stripped
(root@failed: </db>) # psql -Upgsql -dmy_db -c"create or replace function parse_micr(text) returns micr_struct
as '/db/pgsql_micr_parser_64.so', 'pgsql_micr_parser' language c volatile cost 1;"
ERROR: could not load library "/db/pgsql_micr_parser_64.so": ld.so.1: postgres: fatal: /db/pgsql_micr_parser_64.so: Permission denied
Ok. Well, the file permissions are correct, so what gives? Next
step is to trace the backend process as it attempts to load the .so.
So I connect to the "failed" server via pgAdmin and run "select getpid();"
I then run "truss -p <PID>" from my shell, and in pgAdmin, execute the
SQL to create the function. This is the result of the system trace:
(root@failed: </db>) # truss -p 10369
recv(9, 0x0097C103, 5, 0) (sleeping...)
recv(9, "170301\0 ", 5, 0) = 5
recv(9, " TBEE5 n J\0 VF6E4DDCF84".., 32, 0) = 32
recv(9, "170301\0B0", 5, 0) = 5
recv(9, "AAD5A5 L97B0CEA5A9F0CD89".., 176, 0) = 176
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9520) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9530) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF8F50) = 0
resolvepath("/db/pgsql_micr_parser_64.so", "/db/pgsql_micr_parser_64.so", 1023) = 27
open("/db/pgsql_micr_parser_64.so", O_RDONLY) = 22
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 22, 0) Err#13 EACCES
close(22) = 0
setcontext(0xFFFFFD7FFFDF9050)
setcontext(0xFFFFFD7FFFDF9BB0)
We can see that the backend is able to open the .so file for
reading, but the mmap fails. From the Solaris man page on mmap:
ERRORS
The mmap() function will fail if:
EACCES The fildes file descriptor is not open for read,
       regardless of the protection specified; or fildes
       is not open for write and PROT_WRITE was specified
       for a MAP_SHARED type mapping.
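My analysis:
1) The file descriptor (#22) was opened O_RDONLY, so it is open for read.
2) PROT_WRITE and MAP_SHARED are not specified, so write access is not relevant.
So neither of the documented EACCES conditions appears to apply, yet the mmap still fails.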
Things that I tried, unsuccessfully:
1) I recompiled the .so on the target system (X4600, AMD chips) just
in case it is somehow different from the .so that got built on the
working system (X4270, Intel chips).
2) Tested with a different .so (I have another that implements forward
and reverse DNS lookups, so one may invoke DNS functions inside SQL
statements). Same behavior. Loads fine on the X4270 systems, but
fails on the X4600 system.
3) Compiled both .so's on 32-bit and 64-bit Gentoo Linux and loaded them
into PostgreSQL 9.0.4. Works fine.
4) Compiled both .so's on 64-bit Solaris 10u9, PostgreSQL 9.1 on an
X4270 and it loads fine there too.
5) Examined a truss on a working system while loading the function.
Since it loaded fine already, I had to drop the function, then
disconnect pgAdmin (to make the backend exit), reconnect and redo
the "create function":
(root@working: </db>) # truss -p 16921
## (I elided a bunch of non-relevant grovelling through the FSM mapped file)
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9520) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF9530) = 0
stat("/db/pgsql_micr_parser_64.so", 0xFFFFFD7FFFDF8F50) = 0
resolvepath("/db/pgsql_micr_parser_64.so", "/db/pgsql_micr_parser_64.so", 1023) = 27
open("/db/pgsql_micr_parser_64.so", O_RDONLY) = 22
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 22, 0) = 0xFFFFFD7FFED80000
mmap(0x00010000, 90112, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, 4294967295, 0) = 0xFFFFFD7FFED00000
mmap(0xFFFFFD7FFED00000, 21997, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 22, 0) = 0xFFFFFD7FFED00000
mmap(0xFFFFFD7FFED15000, 2576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 22, 20480) = 0xFFFFFD7FFED15000
munmap(0xFFFFFD7FFED06000, 61440) = 0
memcntl(0xFFFFFD7FFED00000, 7008, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(22) = 0
6) There is nothing interesting in dmesg or syslog.
7) Disconnected and reconnected a few times, to try a freshly
launched backend. No luck.
Any thoughts or suggestions?