Hi Jeff,
This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll: edge triggered and multi-threaded epoll).
The test case './tests/bugs/bug-873367.t' hangs with this fix (please find the stack traces below).
In the code snippet below, we found that 'SSL_pending' was returning 0.
I have added a condition here to return from the function when there is no data available.
Please let me know whether this is an acceptable way to do it, or whether we need to restructure this function for multi-threaded epoll.
<code: socket.c>
static int
ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
{
        ....
                switch (SSL_get_error (priv->ssl_ssl, r)) {
                case SSL_ERROR_NONE:
                        return r;
                case SSL_ERROR_WANT_READ:
                        /* newly added check: nothing buffered inside SSL, so return */
                        if (SSL_pending (priv->ssl_ssl) == 0)
                                return r;
                        pfd.fd = priv->sock;
                        ....
                        if (poll (&pfd, 1, -1) < 0) {
</code>
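For comparison, here is a minimal sketch of the non-blocking shape this question points towards. It rests on stated assumptions: a plain non-blocking read() stands in for SSL_read(), EAGAIN stands in for SSL_ERROR_WANT_READ, and nb_transport, transport_try_read() and set_nonblocking() are hypothetical names, not GlusterFS code. The function gives up the lock and reports "would block" instead of sleeping in poll(), so another thread can take the lock and the event loop can call back once the fd becomes readable.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* All names here are hypothetical; this is not GlusterFS code. */
enum io_status { IO_ERROR = -1, IO_OK = 0, IO_WOULD_BLOCK = 1 };

struct nb_transport {
        pthread_mutex_t lock;   /* plays the role of priv->lock */
        int fd;                 /* non-blocking; plays the role of priv->sock */
};

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking (int fd)
{
        int fl = fcntl (fd, F_GETFL);

        return fl < 0 ? -1 : fcntl (fd, F_SETFL, fl | O_NONBLOCK);
}

/* One read attempt that never sleeps while holding the lock. A plain
 * read() stands in for SSL_read(), and EAGAIN stands in for
 * SSL_ERROR_WANT_READ: rather than poll(fd, -1) here, report
 * IO_WOULD_BLOCK and let the event loop call back when fd is readable. */
static enum io_status
transport_try_read (struct nb_transport *t, void *buf, size_t len,
                    ssize_t *nread)
{
        enum io_status st;
        ssize_t n;

        pthread_mutex_lock (&t->lock);
        do {
                n = read (t->fd, buf, len);
        } while (n < 0 && errno == EINTR);

        if (n >= 0) {
                *nread = n;
                st = IO_OK;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                st = IO_WOULD_BLOCK;
        } else {
                st = IO_ERROR;
        }
        pthread_mutex_unlock (&t->lock);   /* errno already inspected above */
        return st;
}
```

Whether a bare `return r;` from ssl_do() is enough in the real code depends on how its callers treat that return value; the sketch assumes the caller can tell "would block" apart from a real error.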
Thanks,
Vijay
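On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack traces, we found that the function 'socket_submit_request' (thread 3) is waiting in pthread_mutex_lock() on priv->lock.
The lock is held by the function 'ssl_do' (thread 4, LWP 26241, which matches __owner = 26241 in the lock dump), and that function is blocked in the poll() system call.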
(gdb) bt
#0  0x0000003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f3b94eea9d0 in event_dispatch_epoll (event_pool=<value optimized out>) at event-epoll.c:632
#2  0x0000000000407ecd in main (argc=4, argv=0x7fff160a4528) at glusterfsd.c:2023
(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225)  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
  9 Thread 0x7f3b8ca82700 (LWP 26226)  0x0000003daa80f4b5 in sigwait () from /lib64/libpthread.so.0
  8 Thread 0x7f3b8c081700 (LWP 26227)  0x0000003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f3b8b680700 (LWP 26228)  0x0000003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7f3b8a854700 (LWP 26232)  0x0000003daa4e9163 in epoll_wait () from /lib64/libc.so.6
  5 Thread 0x7f3b89e53700 (LWP 26233)  0x0000003daa4e9163 in epoll_wait () from /lib64/libc.so.6
  4 Thread 0x7f3b833eb700 (LWP 26241)  0x0000003daa4df343 in poll () from /lib64/libc.so.6
  3 Thread 0x7f3b82130700 (LWP 26245)  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f3b8172f700 (LWP 26247)  0x0000003daa80e75d in read () from /lib64/libpthread.so.0
* 1 Thread 0x7f3b94a38700 (LWP 26224)  0x0000003daa80822d in pthread_join () from /lib64/libpthread.so.0
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]#0  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003daa8093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, req=0x7f3b8212f0b0) at socket.c:3134
#4  0x00007f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<value optimized out>,
    procnum=<value optimized out>, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>,
    proghdr=0x7f3b8212f410, proghdrcount=1, progpayload=0x0, progpayloadcount=0,
    iobref=<value optimized out>, frame=0x7f3b93d2a454, rsphdr=0x7f3b8212f4c0,
    rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)
    at rpc-clnt.c:1556
#5  0x00007f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<value optimized out>,
    frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27,
    cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0, rsphdr=0x7f3b8212f4c0,
    rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
    xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>)
    at client.c:243
#6  0x00007f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, this=0x7f3b7c005ef0, data="">
    at client-rpc-fops.c:3119
(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 1, __kind = 0, __spins = 0,
    __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0  0x0000003daa4df343 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003daa4df343 in poll () from /lib64/libc.so.6
#1  0x00007f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4,
    func=0x3db2441570 <SSL_read>) at socket.c:216
#2  0x00007f3b8aa7277b in __socket_ssl_readv (this=<value optimized out>,
    opvector=<value optimized out>, opcount=<value optimized out>) at socket.c:335
#3  0x00007f3b8aa72c26 in __socket_cached_read (this=<value optimized out>,
    vector=<value optimized out>, count=<value optimized out>, pending_vector=0x7f3b7c051258,
    pending_count=0x7f3b7c051260, bytes=0x0, write=0)
    at socket.c:422
#4  __socket_rwv (this=<value optimized out>, vector=<value optimized out>,
    count=<value optimized out>, pending_vector=0x7f3b7c051258,
    pending_count=0x7f3b7c051260, bytes=0x0, write=0) at socket.c:496
#5  0x00007f3b8aa76040 in __socket_readv (this=0x7f3b7c0505c0) at socket.c:589
#6  __socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:1966
#7  socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:2106
#8  socket_event_poll_in (this=0x7f3b7c0505c0) at socket.c:2127
#9  0x00007f3b8aa77820 in socket_poller (ctx=0x7f3b7c0505c0) at socket.c:2338
#10 0x0000003daa8079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003daa4e8b6d in clone () from /lib64/libc.so.6
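The hang in these traces can be reduced to a small, self-contained model, sketched here under explicit assumptions: pipes stand in for the SSL socket, and blocking_transport, reader_polls_under_lock() and sender_can_lock() are hypothetical names, not GlusterFS functions. The reader takes the lock and then sleeps in poll() with no timeout, as thread 4 does in ssl_do(); the sender, like socket_submit_request() in thread 3, cannot take the lock until the reader's data arrives. If that data can only arrive after the sender's request goes out, neither thread makes progress.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical model of one transport; not GlusterFS code. */
struct blocking_transport {
        pthread_mutex_t lock;   /* plays the role of priv->lock */
        int data_fd;            /* plays the role of priv->sock */
        int ready_wr;           /* reader signals here once it owns the lock */
};

/* Models the reader path seen in thread 4 (socket_poller -> ... -> ssl_do):
 * it takes the lock and then sleeps in poll() with an infinite timeout. */
static void *
reader_polls_under_lock (void *arg)
{
        struct blocking_transport *t = arg;
        struct pollfd pfd = { .fd = t->data_fd, .events = POLLIN };
        char c;
        ssize_t n;

        pthread_mutex_lock (&t->lock);
        n = write (t->ready_wr, "r", 1);   /* announce: lock is now held */
        (void) n;
        while (poll (&pfd, 1, -1) < 0 && errno == EINTR)
                ;                          /* sleeps until data arrives */
        n = read (t->data_fd, &c, 1);
        (void) n;
        pthread_mutex_unlock (&t->lock);
        return NULL;
}

/* Models the sender in thread 3 (socket_submit_request). It uses trylock
 * so the demo can observe the contention instead of hanging: returns 1 if
 * the lock was free, 0 if the real code would be stuck in
 * pthread_mutex_lock(). */
static int
sender_can_lock (struct blocking_transport *t)
{
        if (pthread_mutex_trylock (&t->lock) != 0)
                return 0;
        pthread_mutex_unlock (&t->lock);
        return 1;
}
```

The sender uses pthread_mutex_trylock() only so the demo can observe the contention without hanging; the real code calls pthread_mutex_lock() and simply waits.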
Thanks,
Vijay
On Tuesday 24 June 2014 08:59 AM, Raghavendra Gowdappa wrote:
OK. Sorry, I didn't look at the change number. I'll sync up with Vijay.
----- Original Message -----
From: "Anand Avati" <avati@xxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Cc: vmallika@xxxxxxxxxx
Sent: Tuesday, June 24, 2014 8:55:34 AM
Subject: Re: Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
On 6/23/14, 8:00 PM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
To: "Anand Avati" <avati@xxxxxxxxxx>
Cc: vmallika@xxxxxxxxxx
Sent: Tuesday, June 24, 2014 8:28:41 AM
Subject: Re: Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
----- Original Message -----
From: "Anand Avati" <avati@xxxxxxxxxx>
To: vmallika@xxxxxxxxxx
Cc: "Raghavendra G" <rgowdapp@xxxxxxxxxx>
Sent: Monday, June 23, 2014 10:07:19 PM
Subject: Re: Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
On 6/22/14, 8:47 PM, Vijaikumar Mallikarjuna (Code Review) wrote:
Vijaikumar Mallikarjuna has posted comments on this change.
Change subject: epoll: Handle client and server FDs in a separate event pool
......................................................................
Patch Set 9:
Hi Avati,
Actually, we started working on this fix for Bug# 1096729, which was a blocker issue.
We tried several ways to avoid changing the current epoll model for now, but we had to make some changes to the epoll code and ended up with this patch.
The MT patch #3842 looks good to me. It would be great if you could help us get it in quickly.
Thanks,
Vijay
Copying Raghavendra, as he's the RPC guy. Du, #3842 has been blocked in review for a long time because of an incompatibility with RPC SSL mode, very likely an issue in our SSL multi-threading code. Can you help Vijai debug this and move #3842 forward? Also, Jeff has posted new SSL patches upstream. Can you check whether they fix this problem?
Sure, I'll try to sync up with Vijay.
However, I have a question about the approach we should take. Doesn't your multi-threaded epoll patch also fix this issue? Since yours is a generic solution, shouldn't it be favoured over this one?
That's precisely what I meant. #3842 (the more generic MT epoll) has some issues with the SSL MT code (otherwise it works fine).
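One plausible reason SSL and edge-triggered epoll interact badly can be sketched as follows. With EPOLLET, epoll reports readability when new data arrives, not for as long as unread data remains. Bytes that OpenSSL has already pulled off the socket and buffered inside the SSL object (what SSL_pending() reports) do not make the socket readable at all. A handler that returns before consuming everything may therefore never be woken again for that data. This sketch uses a plain pipe instead of an SSL connection, and set_nonblocking(), watch_edge_triggered() and drain_fd() are hypothetical helpers, not GlusterFS functions. It shows that a partially read buffer produces no further event and that a drain-until-EAGAIN loop recovers the stranded bytes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking (int fd)
{
        int fl = fcntl (fd, F_GETFL);

        return fl < 0 ? -1 : fcntl (fd, F_SETFL, fl | O_NONBLOCK);
}

/* Hypothetical helper: new epoll instance watching fd for edge-triggered
 * readability. Returns the epoll fd, or -1 on error. */
static int
watch_edge_triggered (int fd)
{
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
        int epfd = epoll_create1 (0);

        if (epfd < 0)
                return -1;
        if (epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
                close (epfd);
                return -1;
        }
        return epfd;
}

/* Hypothetical helper: consume everything currently readable on a
 * non-blocking fd, at most `chunk` bytes per read(), until EAGAIN.
 * Returns the number of bytes consumed, or -1 on a real error. */
static ssize_t
drain_fd (int fd, size_t chunk)
{
        char buf[64];
        ssize_t total = 0;

        if (chunk == 0 || chunk > sizeof buf)
                chunk = sizeof buf;
        for (;;) {
                ssize_t n = read (fd, buf, chunk);

                if (n > 0) {
                        total += n;
                } else if (n == 0) {
                        return total;   /* EOF: writer closed */
                } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                        return total;   /* drained: safe to wait for the next edge */
                } else if (errno != EINTR) {
                        return -1;
                }
        }
}
```

In an SSL transport, the same loop would presumably keep calling SSL_read() while it returns data or SSL_pending() is non-zero, and go back to epoll only on SSL_ERROR_WANT_READ. That is an assumption about the shape of a fix, not something this thread settles.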