Re: Fresh build on OS X not working (memory)

Tom Lane <tgl@xxxxxxxxxxxxx> · Fri, 30 Apr 2010 17:35:34 -0400

I wrote:
> I have no idea why EDB recommend changing the maxproc settings, but
> I doubt that's related to shared memory.  I see that they have shmmax
> equal to exactly 4096 times shmall, so that's good, but there must be
> some other OSX peculiarity that this is tripping over.  Maybe it's too
> large ... do you actually have 1.6GB of RAM available that could
> reasonably be dedicated to PG shared memory?  Another possibility is
> that maybe only exact powers of 2 work well.  I'm just guessing though.

I poked around in the OS X kernel sources (let's hear it for open
source) and found that there really isn't a lot of special magic around
the shm control parameters anymore: as of OS X 10.6, they're all 64 bits
and the comparisons are entirely straightforward.  If you can get the
values into the system, which you did since sysctl was reporting them,
they should work as expected.

So why were you getting an EINVAL failure?  The only theory that seems
to hold water after looking at the kernel source code is that there was a
pre-existing shm segment with the same key, and you got to this bit in
shmget_existing():

        if (uap->size && uap->size > shmseg->u.shm_segsz)
                return EINVAL;

        if ((uap->shmflg & (IPC_CREAT | IPC_EXCL)) == (IPC_CREAT | IPC_EXCL))
                return EEXIST;

IOW, if there's an existing shm segment of the same key but it's too
small, you get EINVAL.  Not EEXIST, which is what our code in
InternalIpcMemoryCreate is expecting to see for a collision.  So our
code fails to recognize the case as a collision and spits out a
misleading error message.
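
To make the ordering problem concrete, here is a rough model of those two
checks as a standalone function.  This is only a sketch: the function name
and its parameters are made up for illustration, and it ignores everything
else the real kernel routine does.  It returns the errno the lookup would
report, or 0 if neither check fires.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>

/*
 * Hypothetical model of the two kernel checks quoted above.
 * requested: size passed to shmget(); existing: size of the segment
 * already registered under the key; shmflg: flags passed to shmget().
 */
static int
model_existing_lookup(size_t requested, size_t existing, int shmflg)
{
    /* The size check runs first, so an undersized segment wins. */
    if (requested && requested > existing)
        return EINVAL;

    /* Only reached when the existing segment is large enough. */
    if ((shmflg & (IPC_CREAT | IPC_EXCL)) == (IPC_CREAT | IPC_EXCL))
        return EEXIST;

    return 0;
}
```

With IPC_CREAT | IPC_EXCL set, asking for 2048 bytes against a 1024-byte
segment yields EINVAL, while asking for 1024 bytes against a 2048-byte
segment yields EEXIST.  The caller sees two different errors for what is
really the same situation, a key collision.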

If memory serves, we've seen this issue before [ ... digs in archives
... ] ah-hah:

http://archives.postgresql.org/pgsql-hackers/2008-10/msg00896.php

I argued in that message that returning EINVAL when EEXIST would apply
is a kernel bug, and I still think that.  But what seems likely at this
point is that many BSD-derived kernels behave this way, and we're never
gonna get 'em all fixed.  So it would behoove us to work around it.
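
One possible shape for such a workaround, purely as a sketch and not
necessarily what we'd end up committing: when the create attempt fails with
EINVAL, probe the key again with size 0 and no IPC_CREAT, which can only
succeed if a segment already exists under that key.  All the names below
(shmget_fn, seg_result, try_create_segment, the fake) are hypothetical, and
the shmget call is passed in as a function pointer only so the logic can be
exercised without touching real kernel state.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Same signature as shmget(), so the real call can be passed directly. */
typedef int (*shmget_fn)(key_t, size_t, int);

typedef enum { SEG_CREATED, SEG_COLLISION, SEG_ERROR } seg_result;

static seg_result
try_create_segment(shmget_fn get, key_t key, size_t size, int *shmid_out)
{
    int id = get(key, size, IPC_CREAT | IPC_EXCL | 0600);

    if (id >= 0)
    {
        *shmid_out = id;
        return SEG_CREATED;
    }
    if (errno == EEXIST)
        return SEG_COLLISION;       /* the well-behaved case */
    if (errno == EINVAL)
    {
        int saved_errno = errno;

        /* Size 0 without IPC_CREAT succeeds only if the key is in use. */
        if (get(key, 0, 0) >= 0)
            return SEG_COLLISION;   /* EINVAL was hiding a collision */
        errno = saved_errno;        /* genuine EINVAL: report it as such */
    }
    return SEG_ERROR;
}

/* Test stand-in: behaves as if every key holds a 1024-byte segment. */
static int
fake_shmget_small_existing(key_t key, size_t size, int shmflg)
{
    (void) key;
    if (size > 1024)
    {
        errno = EINVAL;
        return -1;
    }
    if ((shmflg & (IPC_CREAT | IPC_EXCL)) == (IPC_CREAT | IPC_EXCL))
    {
        errno = EEXIST;
        return -1;
    }
    return 7;                       /* arbitrary fake segment id */
}
```

Against a real kernel one would call try_create_segment(shmget, key, size,
&shmid).  The probe costs one extra syscall, and only on the failure path.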

Does the theory of a pre-existing smaller shmem segment make sense from
your end?  In particular, had you previously had another Postgres server
running on that machine, and perhaps killed it ungracefully?  If this
theory is correct, the issue was "fixed" as a result of rebooting (thus
making the old segment go away), not as a result of any changes of the
shm parameters.  OTOH I'd have expected you to have to reboot multiple
times while experimenting with the shm parameters, so I'm not entirely
convinced I've hit on the right explanation.

			regards, tom lane

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)