On Mon, Oct 09, 2017 at 02:07:23PM +0530, Mohit Agrawal wrote:
> Hi All,
>
> For this patch (https://review.gluster.org/#/c/18436/) I am getting a
> crash in NFS (only once) in the test case ./tests/basic/mount-nfs-auth.t.
> I also tried to execute the same test case in a loop on a CentOS machine,
> but I could not reproduce the crash there.
>
> After analyzing the crash, it seems the cache entry is invalidated in
> thread 10 while thread 1 is still trying to access it; a sketch of the
> suspected pattern and the gdb output for both threads follow.
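>
> As a rough illustration, the race I suspect looks like the following
> minimal sketch (hypothetical code, not the actual Gluster sources; the
> struct and variable names only mirror what is visible in the backtrace):
>
>     /* race.c -- illustrative use-after-free between a reader and a
>      * refresher thread; compile with: gcc race.c -lpthread */
>     #include <pthread.h>
>     #include <stdio.h>
>     #include <stdlib.h>
>
>     struct export_options { int rw; };
>     struct export_item    { struct export_options *opts; };
>     struct cache_entry    { struct export_item *item; };
>
>     static struct cache_entry *entry; /* stands in for the dict lookup result */
>
>     /* like auth_cache_lookup() in thread 1: reads without holding a reference */
>     static void *reader(void *arg)
>     {
>         int can_write = entry->item->opts->rw; /* item may already be freed */
>         printf("rw = %d\n", can_write);
>         return NULL;
>     }
>
>     /* like the auth param refresh in thread 10: frees behind the reader */
>     static void *refresher(void *arg)
>     {
>         free(entry->item->opts);
>         free(entry->item);
>         entry->item = NULL;
>         return NULL;
>     }
>
>     int main(void)
>     {
>         pthread_t t1, t2;
>
>         entry = calloc(1, sizeof(*entry));
>         entry->item = calloc(1, sizeof(*entry->item));
>         entry->item->opts = calloc(1, sizeof(*entry->item->opts));
>
>         /* nothing orders these two accesses, so the read can hit freed memory */
>         pthread_create(&t1, NULL, reader, NULL);
>         pthread_create(&t2, NULL, refresher, NULL);
>         pthread_join(t1, NULL);
>         pthread_join(t2, NULL);
>         return 0;
>     }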
>
> >>>>>>>>>>>>>>>>>>>
>
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7fe852cfe700 (LWP 19073))]#0  0x00007fe859665c85 in auth_cache_lookup (
>     cache=0x7fe854027db0, fh=0x7fe84466684c, host_addr=0x7fe844565e40 "23.253.175.80",
>     timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:295
> 295             *can_write = lookup_res->item->opts->rw;
> (gdb) bt
> #0  0x00007fe859665c85 in auth_cache_lookup (cache=0x7fe854027db0, fh=0x7fe84466684c,
>     host_addr=0x7fe844565e40 "23.253.175.80", timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:295
> #1  0x00007fe859665ebc in is_nfs_fh_cached (cache=0x7fe854027db0, fh=0x7fe84466684c,
>     host_addr=0x7fe844565e40 "23.253.175.80")
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:390
> #2  0x00007fe85962b82c in mnt3_check_cached_fh (ms=0x7fe854023d60, fh=0x7fe84466684c,
>     host_addr=0x7fe844565e40 "23.253.175.80", is_write_op=_gf_false)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:1954
> #3  0x00007fe85962ba92 in _mnt3_authenticate_req (ms=0x7fe854023d60, req=0x7fe844679148,
>     fh=0x7fe84466684c, path=0x0, authorized_export=0x0, authorized_host=0x0, is_write_op=_gf_false)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:2011
> #4  0x00007fe85962bf65 in mnt3_authenticate_request (ms=0x7fe854023d60, req=0x7fe844679148,
>     fh=0x7fe84466684c, volname=0x0, path=0x0, authorized_path=0x0, authorized_host=0x0,
>     is_write_op=_gf_false)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:2130
> #5  0x00007fe859652370 in nfs3_fh_auth_nfsop (cs=0x7fe8446663c8, is_write_op=_gf_false)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3981
> #6  0x00007fe85963631a in nfs3_lookup_resume (carg=0x7fe8446663c8)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1559
> #7  0x00007fe859651b98 in nfs3_fh_resolve_entry_hard (cs=0x7fe8446663c8)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3791
> #8  0x00007fe859651e35 in nfs3_fh_resolve_entry (cs=0x7fe8446663c8)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3844
> #9  0x00007fe859651e94 in nfs3_fh_resolve_resume (cs=0x7fe8446663c8)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3862
> #10 0x00007fe8596520ad in nfs3_fh_resolve_root (cs=0x7fe8446663c8)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3915
> #11 0x00007fe85965245f in nfs3_fh_resolve_and_resume (cs=0x7fe8446663c8, fh=0x7fe852cfc980,
>     entry=0x7fe852cfc9c0 "test-bg-write", resum_fn=0x7fe85963621d <nfs3_lookup_resume>)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:4011
> #12 0x00007fe859636dcf in nfs3_lookup (req=0x7fe844679148, fh=0x7fe852cfc980, fhlen=52,
>     name=0x7fe852cfc9c0 "test-bg-write")
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1620
> #13 0x00007fe85963703f in nfs3svc_lookup (req=0x7fe844679148)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1666
> #14 0x00007fe86765f585 in rpcsvc_handle_rpc_call (svc=0x7fe854022a00, trans=0x7fe8545c1fa0,
>     msg=0x7fe844334610)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpcsvc.c:711
> #15 0x00007fe86765f8f8 in rpcsvc_notify (trans=0x7fe8545c1fa0, mydata=0x7fe854022a00,
>     event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fe844334610)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpcsvc.c:805
> #16 0x00007fe867665458 in rpc_transport_notify (this=0x7fe8545c1fa0, event=RPC_TRANSPORT_MSG_RECEIVED,
>     data=0x7fe844334610)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpc-transport.c:538
> #17 0x00007fe85c44561e in socket_event_poll_in (this=0x7fe8545c1fa0, notify_handled=_gf_true)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-transport/socket/src/socket.c:2319
> #18 0x00007fe85c445cb1 in socket_event_handler (fd=12, idx=8, gen=103, data=0x7fe8545c1fa0, poll_in=1,
>     poll_out=0, poll_err=0)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-transport/socket/src/socket.c:2475
> #19 0x00007fe867917fd7 in event_dispatch_epoll_handler (event_pool=0x7030d0, event=0x7fe852cfde70)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/event-epoll.c:583
> #20 0x00007fe8679182d9 in event_dispatch_epoll_worker (data=0x7fe85403d060)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/event-epoll.c:659
> #21 0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> #22 0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
>
> (gdb) thread 10
> [Switching to thread 10 (Thread 0x7fe858ed2700 (LWP 19051))]#0  0x00007fe866b82334 in __lll_lock_wait ()
>    from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007fe866b82334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fe866b7d5d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007fe866b7d4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007fe8678a9844 in _gf_msg (
>     domain=0x7fe85966a448 "ot/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c",
>     file=0x7fe85966a3f8 "/lib/glusterd/nfs/exports", function=0x7fe85966b5e0 "init", line=3878,
>     level=GF_LOG_INFO, errnum=0, trace=0, msgid=112151, fmt=0x7fe85966b3b4 "")
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/logging.c:2081
> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread (argv=0x7fe854023d60)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:3877
> #5  0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
> (gdb) p mstate
> No symbol "mstate" in current context.
> (gdb) f 4
> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread (argv=0x7fe854023d60)
>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:3877
> 3877
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> From my first-level analysis I do not think this is related to my patch;
> please check and respond if you have seen this crash before.
>
> The core is available from this link:
> https://build.gluster.org/job/centos6-regression/6759/console

This indeed looks like something that is not related to your change.
These kinds of races should have been fixed by protecting the
auth_cache->cache_dict with a lock (auth_cache->lock), and by making all
auth_cache_entry objects reference counted.
Going through the code does not show any obvious places where
auth_cache->cache_dict is accessed without holding auth_cache->lock. The
patches listed here seemed to have been an improvement for a while, and it
is unclear to me why these kinds of problems would surface again:

  https://review.gluster.org/#/q/topic:bug-1226717

Because you hit this crash, there still seems to be a race condition
somewhere in the auth-cache part of Gluster/NFS. If this happens more
regularly, we should investigate the cause a little further.

More recently Facebook merged a patch into their 3.8 branch that also adds
locking to the auth_cache structure. However, this change was not based on
the patches linked above. Maybe Shreyas or Jeff (+CC) have seen this
segfault backtrace before?

  https://review.gluster.org/18247

Thanks,
Niels

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel