On 03/07/2017 09:48 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 07, 2017 at 09:06:49PM +0100, Krzysztof Opasiak wrote:
> > Personally, I don't want to use rlimit for this as it ends up returning
> > an error code from, for example, open() when we hit the limit. This may lead
> > to some unpredictable crashes in services (esp. those poor proprietary binary
> > blobs). Instead of injecting errors into the service, we would like to just
> > get a notification that this service has more open fds than it should and
> > ask it to restart in a polite way.
> >
> > For memory this seems quite easy to achieve, as we can just get an eventfd
> > notification when an application passes a given memory usage using the
> > memory cgroup controller. Maybe you know some efficient method to do the
> > same for fds?
>
> So, if all you wanna do is reliably detecting open(2) failures, can't
> you do that with bpf tracing?

Well, detecting failures of open() is not enough, and it has a couple of problems:

1) open(2) is not the only syscall which creates an fd. In addition to
other syscalls like socket(2) and dup(2), some ioctl() calls on drivers (for
example video) also create fds. I'm not sure if we have any mechanism
other than grepping through the kernel source to find out which ioctl()
creates an fd and which does not.

2) As far as I know (I'm not a bpf specialist, so please correct me if
I'm wrong), with bpf we are only able to detect such events but we are
unable to prevent them from reaching the caller. It means that the service
will see that it ran out of fds and will need to handle this properly.
If there is a bug in this error path, the service may crash.
What we would like to get is just a notification to an external process
that some limit has been reached, without returning an error to the service
itself.

3) Theoretically we could do this using bpf or syscall auditing and
count fds for each userspace process, or check /proc/<PID> after each
notification, but that gets very heavy for a production environment.
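
To make the /proc variant from point 3 concrete, here is a rough sketch (Python, Linux only) of what an external watcher would have to do per process. The names count_open_fds() and over_soft_limit() are made up for illustration, and I'm assuming the per-process fd list under /proc/<PID>/fd is what gets checked. The demo at the bottom also illustrates point 1: fds created by socket() and dup() show up in the count although open() was never called.

```python
import os
import socket


def count_open_fds(pid):
    """Return the number of entries in /proc/<pid>/fd.

    Reading another process's fd directory needs the same uid or root.
    When pid is our own process, the directory handle used for the
    listing itself shows up as one extra entry.
    """
    return len(os.listdir("/proc/%d/fd" % pid))


def over_soft_limit(pid, soft_limit):
    """True if the process currently holds more fds than soft_limit.

    soft_limit is a hypothetical, watcher-side threshold. Nothing is
    enforced in the kernel, so the watched service never sees an error.
    """
    return count_open_fds(pid) > soft_limit


if __name__ == "__main__":
    me = os.getpid()
    before = count_open_fds(me)

    # Neither of these goes through open(2), but both create an fd.
    sock = socket.socket()
    dup_fd = os.dup(sock.fileno())

    after = count_open_fds(me)
    print("fds before: %d, after socket()+dup(): %d" % (before, after))

    os.close(dup_fd)
    sock.close()
```

Such a check has to be polled, or triggered from a bpf or audit event, for every watched process, and that is exactly the overhead I would like to avoid.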
Best regards,
--
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics