Hard-coded allocation of 64 MB of memory upfront too big in some scenarios

lennart@xxxxxxxxxxxxxx (Lennart Poettering) · Wed, 27 Aug 2008 16:20:12 +0200

On Wed, 27.08.08 10:15, rdiezmail-pulseaudio at yahoo.de (rdiezmail-pulseaudio at yahoo.de) wrote:

Heya!

> > There is no need to make this configurable. On modern
> > operating systems memory allocation works via overcommitting:
> > although we allocate this 64 MB pool on startup, it is not actually used
> > right away and thus does not actually consume any precious
> > memory.
> 
> As I stated in my original message, my embedded PowerPC target
> hasn't got shared memory (i.e. not configured in the kernel), so all
> allocations are routed through malloc. At least in debug builds,
> allocations are filled with 0xBAADF00D or a similar pattern in
> advance, so I had to modify the source code because it was consuming
> half of the available memory before doing anything.
> 
> I've noticed there is a tendency to automatically think in terms of
> the PulseAudio server. I'm writing a client that connects to the
> audio server via the network, and, in a client environment, I think
> it's bad practice to allocate 64 MB of memory when in fact you'll be
> using just a tiny portion of it most of the time.

Not sure if that is such a bad practice. Maybe it is if you design for
embedded MMU-less CPUs. But I don't do this. PA is optimized for
normal, modern Unix/Linux machines, with shared memory. In fact the
entire memory model PA uses is designed with a good MMU in mind. 
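
To illustrate the overcommit point quoted above, here is a minimal sketch, assuming Linux with an MMU and default overcommit settings. It is not PA's actual pool setup code; reserve_pool(), release_pool() and POOL_BYTES are hypothetical names. An anonymous private mapping only reserves address space, and the kernel backs a page with physical memory when it is first touched.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define POOL_BYTES (64u * 1024u * 1024u) /* 64 MB, as in this thread */

/* Reserve address space for the pool. No page is backed by physical
 * memory until it is written to (or read, which maps a zero page). */
static unsigned char *reserve_pool(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Give the whole range back to the kernel. */
static void release_pool(unsigned char *pool, size_t bytes) {
    int rc = munmap(pool, bytes);
    assert(rc == 0);
    (void) rc; /* keep NDEBUG builds free of unused-variable warnings */
}
```

On a target without an MMU, or where every allocation is filled with a debug pattern, each of those pages is touched immediately, which is exactly the situation described in the quoted complaint.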

> Even for the server side, I also think it's bad practice not to
> automatically and periodically release memory to the OS: that
> probably means that the server will always hold as much memory as the
> peak usage (some maximum reached in the past). At the very least,
> that forever ties up swap space for other applications.

That's however how Unix always used to work. In traditional Unix sbrk()
was called to increase the address space only. Memory was never given
back to the OS again.

Generally PA tries to avoid too much interaction with the kernel and
tries to make allocation lock-free and thus real-time compatible as
much as possible. (Though I have to agree to a lot of compromises here
because I cannot just mlock() all my memory).

The way memory block allocation in PA works is via a lock-free
stack. That way both allocation and freeing are fast. It's basically
just a cmpxchg to allocate or free a block. However to achieve that
you cannot go to the MMU each time.
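
A rough sketch of the idea, not the actual PA code: a fixed pool whose free slots form a stack, with one compare-and-swap per allocate or free. It assumes C11 atomics; pool_t, pool_init(), pool_alloc() and pool_free() are made-up names, and the tag packed next to the slot index is my own addition to sidestep the ABA problem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 8u
#define SLOT_SIZE 64u
#define NIL UINT32_MAX /* marks an empty stack / end of the free chain */

typedef struct {
    _Atomic uint64_t head; /* low 32 bits: top slot index, high 32 bits: tag */
    _Atomic uint32_t next[POOL_SLOTS]; /* free-chain links, by slot index */
    unsigned char data[POOL_SLOTS][SLOT_SIZE];
} pool_t;

static uint64_t pack(uint32_t idx, uint32_t tag) {
    return ((uint64_t) tag << 32) | idx;
}

/* Chain all slots together: 0 -> 1 -> ... -> NIL, with slot 0 on top. */
static void pool_init(pool_t *p) {
    for (uint32_t i = 0; i < POOL_SLOTS; i++)
        atomic_store(&p->next[i], i + 1 < POOL_SLOTS ? i + 1 : NIL);
    atomic_store(&p->head, pack(0, 0));
}

/* Pop the top slot; returns NULL when the pool is exhausted. */
static void *pool_alloc(pool_t *p) {
    uint64_t old = atomic_load(&p->head);
    for (;;) {
        uint32_t idx = (uint32_t) old;
        if (idx == NIL)
            return NULL;
        uint64_t desired = pack(atomic_load(&p->next[idx]),
                                (uint32_t) (old >> 32) + 1);
        /* On failure, old is refreshed with the current head and we retry. */
        if (atomic_compare_exchange_weak(&p->head, &old, desired))
            return p->data[idx];
    }
}

/* Push a slot back; block must have come from pool_alloc() on this pool. */
static void pool_free(pool_t *p, void *block) {
    size_t off = (size_t) ((unsigned char *) block - &p->data[0][0]);
    uint32_t idx = (uint32_t) (off / SLOT_SIZE);
    assert(idx < POOL_SLOTS);
    uint64_t old = atomic_load(&p->head);
    for (;;) {
        atomic_store(&p->next[idx], (uint32_t) old);
        uint64_t desired = pack(idx, (uint32_t) (old >> 32) + 1);
        if (atomic_compare_exchange_weak(&p->head, &old, desired))
            return;
    }
}
```

Neither path takes a lock or enters the kernel, which is why the pool has to be set up once in advance instead of being grown and shrunk on demand.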

It's just a matter of where you put your focus: my focus was on
minimizing copying and minimizing overall worst-case memory usage by
sharing memory as much as possible, in addition to making most operations
lock-free. Your focus is different: it is minimizing the address space
size. That goal is however completely irrelevant on desktop machines
and all MMU machines.

> Allocating a big, fixed-sized chunk of memory in advance, in the
> hope that the OS will optimize it away if never used, is also
> questionable. It does not work if the OS is not smart enough (see
> embedded Linux at www.uclinux.org). It tends to obscure the real
> memory usage of the application by distorting the memory statistics
> (is there a way to find out how much memory the client library
> allocated?). It raises user questions all the time (it makes the app
> look big) which need to be constantly explained. It reserves swap
> file space that other applications cannot use, and it may force you
> to have a swap file in the first place.

Sorry, but relying on an MMU allows me to make a lot of optimizations
I couldn't do on an MMU-less machine (like using shared memory data
transfer). What you are asking me is to go back to an 80's style design
with unpaged memory.

Reading Linux memory statistics is not trivial, you need to know how
to interpret things. But just going back to an 80's style memory
management for that and not using the great possibilities paging
enables you is crazy.

> All that could be fine for a single-instance PulseAudio server
> running on a fully-featured PC. But the trouble is, all those
> negative aspects are automatically transferred to every client
> application that uses the PulseAudio client library. I wish the
> library user could specify the malloc and free routines to use (like
> the zlib library does); after all, the host application may
> implement its own memory pool schema.

PulseAudio doesn't allow you to replace malloc/free, since most modern
libcs allow you to do that already anyway. I saw no reason to do this
extra indirection.  That said, if you can convince me that having this
extra indirection makes sense, then it should be easy to add, since PA
calls malloc/free exclusively through the pa_xmalloc() and pa_xfree()
wrappers.
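
For the sake of discussion, such an indirection could look roughly like the following. This is a sketch only, in the spirit of zlib's allocator callbacks, and not an existing PA API: alloc_hooks, set_alloc_hooks(), xmalloc_sketch() and xfree_sketch() are hypothetical names, and aborting on allocation failure is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook table supplied by the host application. */
typedef struct {
    void *(*alloc)(void *opaque, size_t size);
    void (*release)(void *opaque, void *ptr);
    void *opaque; /* passed back to both callbacks unchanged */
} alloc_hooks;

static void *default_alloc(void *opaque, size_t size) {
    (void) opaque;
    return malloc(size);
}

static void default_release(void *opaque, void *ptr) {
    (void) opaque;
    free(ptr);
}

static alloc_hooks current_hooks = { default_alloc, default_release, NULL };

/* Install application hooks; meant to be called before any allocation. */
static void set_alloc_hooks(const alloc_hooks *h) {
    assert(h && h->alloc && h->release);
    current_hooks = *h;
}

/* The single wrapper pair through which all allocations would be routed. */
static void *xmalloc_sketch(size_t size) {
    void *p = current_hooks.alloc(current_hooks.opaque, size);
    if (!p)
        abort(); /* assumption: out of memory is treated as fatal */
    return p;
}

static void xfree_sketch(void *p) {
    if (p)
        current_hooks.release(current_hooks.opaque, p);
}

/* Example hooks: count live allocations in a size_t behind opaque. */
static void *counting_alloc(void *opaque, size_t size) {
    void *p = malloc(size);
    if (p)
        ++*(size_t *) opaque;
    return p;
}

static void counting_release(void *opaque, void *ptr) {
    --*(size_t *) opaque;
    free(ptr);
}
```

A host application with its own memory pool would pass its pool handle as opaque and install its callbacks once at startup. The cost is one extra indirect call per allocation.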

If you want to make PA run well on MMU-less machines there are a few
changes you should be making to PA. Besides disabling memblock
allocation from the pre-allocated pool in src/pulsecore/memblock.c (by
forcing that only blocks of type PA_MEMBLOCK_APPENDED are
allocated) you should disable free lists (i.e. pa_flist_pop() should
always return NULL), so that we'd always go directly to glibc
malloc/free. Then you might want to disable the whole shm logic, and
maybe a few other things.

This should be doable with a handful of #ifdefs plus a patch to
configure.ac. I'd be happy to merge such a patch, if it is minimal enough.
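
To give an idea of the shape of such a change, here is an illustrative sketch. It is not taken from memblock.c: the SKETCH_NO_POOL switch and all sketch_* names are invented for this example, and a real patch would have configure.ac define the switch and would touch the existing code paths instead. With the switch on, the pool path is compiled out and every block is a single malloc() with the payload appended right after the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical build switch; 1 means "no pre-allocated pool". */
#define SKETCH_NO_POOL 1

typedef enum { SKETCH_BLOCK_POOL, SKETCH_BLOCK_APPENDED } sketch_block_type;

typedef struct {
    sketch_block_type type;
    size_t length;
    void *data; /* points just past this header for appended blocks */
} sketch_block;

#if !SKETCH_NO_POOL
/* Stand-in for taking a slot from the pre-allocated pool. */
static sketch_block *sketch_pool_take(size_t length) {
    (void) length;
    return NULL;
}
#endif

/* Allocate a block; returns NULL if malloc() fails. */
static sketch_block *sketch_block_new(size_t length) {
    sketch_block *b;

#if !SKETCH_NO_POOL
    if ((b = sketch_pool_take(length)))
        return b;
#endif

    /* Header and payload live in one heap allocation. */
    if (!(b = malloc(sizeof *b + length)))
        return NULL;
    b->type = SKETCH_BLOCK_APPENDED;
    b->length = length;
    b->data = b + 1;
    return b;
}

/* Only appended blocks are handled in this sketch. */
static void sketch_block_free(sketch_block *b) {
    assert(b && b->type == SKETCH_BLOCK_APPENDED);
    free(b);
}
```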

Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net         ICQ# 11060553
http://0pointer.net/lennart/           GnuPG 0x1A015CC4