Re: kdbus: to merge or not to merge?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08/09/2015 09:00 PM, Greg Kroah-Hartman wrote:
> In chatting with Daniel on IRC, he is writing up a summary of how the
> kdbus memory pools work in more detail, and he said he would sent that
> out in a day or so, so that everyone can review.

Yes, let me quickly describe again how the kdbus pool logic works.

Every bus connection (peer) owns a buffer which is used in order to
receive payloads. Such payloads are either messages sent from other
connections, notifications or returned answer structures in return of
query commands (name lists, etc).

In order to avoid the kernel having to maintaining an internal buffer
the connections then read from with an extra command, we decided to let
the connections own their buffer directly, so they can mmap() the memory
into their task. Allocating a local buffer to collect asynchronous
messages is what they would need to do anyway, so we implemented a
short-cut that allows the kernel to directly access the memory and write
to it. The size of this buffer pool is configured by each connection
individually, during the HELLO call, so the kernel interface is as
flexible as any other memory allocation scheme the kernel provides and
is subject to the same limits.

Internally, the connection pool is simply a shmem backed file. From the
context of the HELLO ioctl, we are calling into shmem_file_setup(), so
the file is eventually owned by the task which created the bus task
connecting to the bus. One reason why we do the shmem file allocation in
the kernel and on behalf of a the userspace task is that we clear the
VM_MAYWRITE bit to prevent the task from writing to the pool through its
mapped buffer. We also do not set VM_NORESERVE, so the entire buffer is
pre-accounted for the task that created the connection.

The pool implementation uses an r/b tree to organize the buffer into
slices. Those slices can be kept by userspace as long as the parsing
implementation needs to have access to them. When finished, the slices
are freed. A simple ring buffer cannot cope with the gaps that emerge by
that.

When a connection buffer is written to, it is done from the context of
another task which calls into the kdbus code through one of the ioctls.
The memcg implementation should hence charge the task that acts as
writer, which is maybe not ideal but can be changed easily with some
addition to the internal APIs. We omitted it for the current version,
which is non-intrusive with regards to other kernel subsystems.

The kdbus implementation is actually comparable to two tasks X and Y
which both have their own buffer file open and mmap()ed, and they both
pass their FD to the other side. If X now writes to Y's file, and that
is causing a page fault, X is accounted for it, correct?

The kernel does *not* do any memory allocation to buffer payload, and
all other allocations (for instance, to keep around the internal state
of a connection, names etc) are subject to conservatively chosen
limitations. There is no unbounded memory allocation in kdbus that I am
aware of. If there was, it would clearly be a bug.

Addressing the point Andy made earlier: yes, due to memory
overcommitment, OOM situations may happen with certain patterns, but the
kernel should have the same measures to deal with them that it already
has with other types of shared userspace memory. Right?

Hope that all makes sense, we're open to discussions around the desired
accounting details. I've copied linux-mm to let more people have a look
into this again.


Thanks,
Daniel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]