Thanks for the info, Alan. You've kept the group metadata-cache settings,
right? I have them enabled as well:

features.cache-invalidation=on
features.cache-invalidation-timeout=600
performance.stat-prefetch=on
performance.cache-invalidation=on
performance.md-cache-timeout=600
network.inode-lru-limit=50000

but on the brick side I only partially suspect issues with:

performance.stat-prefetch=on
performance.md-cache-timeout=600

Will try without parallel-readdir and readdir-ahead.

-v

On Fri, Jan 26, 2018 at 6:59 AM, Alan Orth <alan.orth@xxxxxxxxx> wrote:
> Dear Vlad,
>
> I'm sorry, I don't want to test this again on my system just yet! It caused
> too much instability for my users and I don't have enough resources for a
> development environment. The only other variable that changed before the
> crashes was the group metadata-cache[0], which I enabled the same day as
> the parallel-readdir and readdir-ahead options:
>
> $ gluster volume set homes group metadata-cache
>
> I'm hoping Atin or Poornima can shed some light and squash this bug.
>
> [0] https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md
>
> Regards,
>
> On Fri, Jan 26, 2018 at 6:10 AM Vlad Kopylov <vladkopy@xxxxxxxxx> wrote:
>>
>> Can you please test whether it is parallel-readdir or readdir-ahead that
>> gives the disconnects, so we know which one to disable?
>>
>> parallel-readdir was doing magic in the PDF from last year:
>>
>> https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf
>>
>> -v
>>
>> On Thu, Jan 25, 2018 at 8:20 AM, Alan Orth <alan.orth@xxxxxxxxx> wrote:
>> > By the way, on a slightly related note, I'm pretty sure either
>> > parallel-readdir or readdir-ahead has a regression in GlusterFS 3.12.x.
>> > We are running CentOS 7 with kernel-3.10.0-693.11.6.el7.x86_64.
>> >
>> > I updated my servers and clients to 3.12.4 and enabled these two
>> > options after reading about them in the 3.10.0 and 3.11.0 release
>> > notes.
>> > In the days after enabling these two options, all of my clients kept
>> > getting disconnected from the volume. The error upon attempting to
>> > list a directory or read a file was "Transport endpoint is not
>> > connected", after which I would force unmount the volume with
>> > `umount -fl /home` and remount it, only to have it get disconnected
>> > again a few hours later.
>> >
>> > Every time the volume disconnected I looked in the client mount log
>> > and only found information such as:
>> >
>> > [2018-01-24 05:52:27.695225] I [MSGID: 108026]
>> > [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1:
>> > Completed metadata selfheal on ed3fbafc-734b-41ca-ab30-216399fb9168.
>> > sources=[0] sinks=1
>> > [2018-01-24 05:52:27.700611] I [MSGID: 108026]
>> > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>> > 2-homes-replicate-1: performing metadata selfheal on
>> > b6a53629-a831-4ee3-a35e-f47c04297aaa
>> > [2018-01-24 05:52:27.703021] I [MSGID: 108026]
>> > [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1:
>> > Completed metadata selfheal on b6a53629-a831-4ee3-a35e-f47c04297aaa.
>> > sources=[0] sinks=1
>> >
>> > I enabled debug logging for that volume's client mount with `gluster
>> > volume set homes diagnostics.client-log-level DEBUG` and then saw this
>> > in the client mount log the next time it disconnected:
>> >
>> > [2018-01-24 08:55:19.138810] D [MSGID: 0]
>> > [io-threads.c:358:iot_schedule] 0-homes-io-threads: LOOKUP scheduled
>> > as fast fop
>> > [2018-01-24 08:55:19.138849] D [MSGID: 0] [dht-common.c:2711:dht_lookup]
>> > 0-homes-dht: Calling fresh lookup for
>> > /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas on
>> > homes-readdir-ahead-1
>> > [2018-01-24 08:55:19.138928] D [MSGID: 0]
>> > [io-threads.c:358:iot_schedule] 0-homes-io-threads: FSTAT scheduled as
>> > fast fop
>> > [2018-01-24 08:55:19.138958] D [MSGID: 0]
>> > [afr-read-txn.c:220:afr_read_txn] 0-homes-replicate-1:
>> > e6ee0427-b17d-4464-a738-e8ea70d77d95: generation now vs cached: 2, 2
>> > [2018-01-24 08:55:19.139187] D [MSGID: 0]
>> > [dht-common.c:2294:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned
>> > for /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas with op_ret 0
>> > [2018-01-24 08:55:19.139200] D [MSGID: 0]
>> > [dht-layout.c:873:dht_layout_preset] 0-homes-dht: file =
>> > 00000000-0000-0000-0000-000000000000, subvol = homes-readdir-ahead-1
>> > [2018-01-24 08:55:19.139257] D [MSGID: 0]
>> > [io-threads.c:358:iot_schedule] 0-homes-io-threads: READDIRP scheduled
>> > as fast fop
>> >
>> > On a hunch I disabled both parallel-readdir and readdir-ahead, which I
>> > had only enabled a few days before, and now all of the clients are
>> > much more stable, with zero disconnections in the days since I
>> > disabled those two volume options.
>> >
>> > Please take a look! Thanks,
>> >
>> > On Wed, Jan 24, 2018 at 5:59 AM Atin Mukherjee <amukherj@xxxxxxxxxx>
>> > wrote:
>> >>
>> >> Adding Poornima to take a look at it and comment.
>> >>
>> >> On Tue, Jan 23, 2018 at 10:39 PM, Alan Orth <alan.orth@xxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I saw that parallel-readdir was an experimental feature in GlusterFS
>> >>> version 3.10.0[0], became stable in version 3.11.0[1], and is now
>> >>> recommended for small file workloads in the Red Hat Gluster Storage
>> >>> Server documentation[2]. I've successfully enabled this on one of my
>> >>> volumes, but I notice the following in the client mount log:
>> >>>
>> >>> [2018-01-23 10:24:24.048055] W [MSGID: 101174]
>> >>> [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-1: option
>> >>> 'parallel-readdir' is not recognized
>> >>> [2018-01-23 10:24:24.048072] W [MSGID: 101174]
>> >>> [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-0: option
>> >>> 'parallel-readdir' is not recognized
>> >>>
>> >>> The GlusterFS version on both the client and server is 3.12.4. What
>> >>> is going on?
>> >>>
>> >>> [0] https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md
>> >>> [1] https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md
>> >>> [2] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/small_file_performance_enhancements
>> >>>
>> >>> Thank you,
>> >>>
>> >>> --
>> >>> Alan Orth
>> >>> alan.orth@xxxxxxxxx
>> >>> https://picturingjordan.com
>> >>> https://englishbulgaria.net
>> >>> https://mjanja.ch
>> >>>
>> >>> _______________________________________________
>> >>> Gluster-users mailing list
>> >>> Gluster-users@xxxxxxxxxxx
>> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>
>> >
>> > --
>> > Alan Orth
>> > alan.orth@xxxxxxxxx
>> > https://picturingjordan.com
>> > https://englishbulgaria.net
>> > https://mjanja.ch
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users@xxxxxxxxxxx
>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Alan Orth
> alan.orth@xxxxxxxxx
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
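For anyone following along, the workaround discussed in this thread can be sketched as the command sequence below. This is a sketch, not a verified fix: the volume name `homes` and mount point `/home` come from the thread, while `server1` is a placeholder hostname that you must replace with your own.

```shell
# Disable the two translators suspected of causing the client disconnects
# (run on any server in the trusted pool; "homes" is the volume from the thread).
gluster volume set homes performance.parallel-readdir off
gluster volume set homes performance.readdir-ahead off

# Optionally raise the client log verbosity while diagnosing, as done in
# the thread; remember to set it back to INFO afterwards.
gluster volume set homes diagnostics.client-log-level DEBUG

# If a client is already stuck with "Transport endpoint is not connected",
# lazy-force-unmount and remount it ("server1" is a hypothetical hostname).
umount -fl /home
mount -t glusterfs server1:/homes /home
```

Note that option changes made with `gluster volume set` propagate to existing client mounts via a graph change, so a remount is only needed for clients whose transport has already dropped.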