Hi Chuck,
On 8/8/21 12:36 PM, Chuck Lever III wrote:
Hi Dai-
Thanks for posting. I agree this is a real bug,
but I'm going to quibble a little with the
details of your proposed solution.
On Aug 7, 2021, at 1:02 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
Currently my_svc_run does not handle poll time allowing idle TCP
s/poll time/poll timeout/
thanks, will fix.
connections to remain ESTABLISHED indefinitely. When the number
of connections reaches the limit the open file descriptors
(ulimit -n) then accept(2) fails with EMFILE. Since libtirpc does
not handle EMFILE returned from accept(2)
Just curious, should libtirpc be handling EMFILE?
Yes, if we don't handle EMFILE returned from accept in libtirpc
then svc_run gets in a tight loop trying to accept the same connection:
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=5, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND},
...
{fd=36, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, ...], 1019, 30000) = 1 ([{fd=4, revents=POLLIN|POLLRDNORM}])
accept(4, 0x7ffd4ac368f0, [128]) = -1 EMFILE (Too many open files)
This is what causes the RPC service to be down.
this get my_svc_run into
a tight loop calling accept(2) resulting in the RPC service being
down, it's no longer able to service any requests.
Fix by add call to __svc_destroy_idle to my_svc_run to remove
inactive connections when poll(2) returns timeout.
Fixes: b2c9430f46c4 Use poll() instead of select() in svc_run()
I don't have this commit in my rpcbind repo.
I do have 44bf15b86861 ("rpcbind: don't use obsolete
svc_fdset interface of libtirpc"). IMO that's the one
you should list here.
Sorry, my mistake, that is the commit for libtirpc.
The commit for rpcbind is as you listed:
44bf15b8 rpcbind: don't use obsolete svc_fdset interface of libtirpc
Also, please Cc: Thorsten Kukuk <kukuk@xxxxxxxxxx> on
future versions of this patch series in case he has
any comments.
will do.
Signed-off-by: dai.ngo@xxxxxxxxxx
---
src/rpcb_svc_com.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/rpcb_svc_com.c b/src/rpcb_svc_com.c
index 1743dadf5db7..cb33519010d3 100644
--- a/src/rpcb_svc_com.c
+++ b/src/rpcb_svc_com.c
@@ -1048,6 +1048,8 @@ netbuffree(struct netbuf *ap)
}
+extern void __svc_destroy_idle(int, bool_t);
Your libtirpc patch does not add __svc_destroy_idle()
as part of the official libtirpc API. We really must
avoid adding "secret" library entry points (not to
mention, as SteveD pointed out, a SONAME change
would be required).
rpcbind used to call __svc_clean_idle(), which is
undocumented (as far as I can tell).
I try to use the same mechanism as __svc_clean_idle, assuming
it was an acceptable approach.
44bf15b86861
removed that call.
which is one of the bugs that causes this problem.
I think a better approach would be to duplicate your
proposed __svc_destroy_idle() code in rpcbind, since
rpcbind is not actually using the library's RPC
dispatcher.
then we have to do the same for nfs-utils. That means the
same code exits in 3 places: libtirpc, rpcbind and nfs-utils.
And any new consumer that uses it own my_svc_run will
need its own __svc_destroy_idle. It does not sound very
efficient.
That would get rid of the technical debt of calling
an undocumented library API.
The helper would need to call svc_vc_destroy()
rather than __xprt_unregister_unlocked() and
__svc_vc_dodestroy(). Also not clear to me that
rpcbind's my_svc_run() needs the protection of
holding svc_fd_lock, if rpcbind itself is not
multi-threaded?
The lock is held by __svc_destroy_idle, or __svc_clean_idle
before it was removed, in libtirpc. I think libtirpc is
multi-threaded.
The alternative would be to construct a fully
documented public API with man pages and header
declarations. I'm not terribly comfortable with
diverging from the historical Sun TI-RPC programming
interface, however.
What I'm trying to do is to follow what __svc_clean_idle
used to do before it was removed. I can't totally re-use
__svc_clean_idle because it was written to work with
svc_fdset and select. The current svc_run uses poll so
the handling of idle connections is a little difference.
-Dai
+
void
my_svc_run()
{
@@ -1076,6 +1078,7 @@ my_svc_run()
* other outside event) and not caused by poll().
*/
case 0:
+ __svc_destroy_idle(30, FALSE);
continue;
default:
/*
--
2.26.2
--
Chuck Lever