Stefan,
We'll have to let somebody else chime in. I don't work on this project; I'm just another user and enthusiast, and I've spent (and am still spending) a lot of time tuning my own RDMA gluster configuration. In short, I won't have an answer for you. If nobody can answer, I'd suggest filing a bug so that it can be tracked and reviewed by the developers.
- Dan
On Wed, May 30, 2018 at 6:34 AM, Stefan Solbrig <stefan.solbrig@xxxxx> wrote:
Dear Dan,
thanks for the quick reply!
I actually tried restarting all processes (and even rebooting all servers), but the error persists. I can also confirm that all brick processes are running. My volume is a distribute-only volume (not dispersed, no sharding).
I also tried mounting with use_readdirp=no, because the error seems to be connected to readdirp, but this option does not change anything.
I found two options I might try: (gluster volume get myvolumename all | grep readdirp)
performance.force-readdirp true
dht.force-readdirp on
Can I turn these off safely? (Or what precisely do they do?)
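For reference, if someone confirms it's safe, I would toggle them like this (just a sketch on my part; the value syntax follows the `gluster volume get` output above, and I assume they can be reverted with `volume reset`):

```shell
# Inspect the current values first ("myvolumename" is my volume from above):
gluster volume get myvolumename performance.force-readdirp
gluster volume get myvolumename dht.force-readdirp

# Turn them off (value forms taken from the "volume get" output above):
gluster volume set myvolumename performance.force-readdirp false
gluster volume set myvolumename dht.force-readdirp off

# And to go back to the defaults:
gluster volume reset myvolumename performance.force-readdirp
gluster volume reset myvolumename dht.force-readdirp
```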
I also assured that all glusterd processes have unlimited locked memory.
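In case it helps anyone reproduce the check, this is roughly how I verified the locked-memory limits (assuming a standard Linux /proc; process names may differ per setup):

```shell
# Print the effective "Max locked memory" limit of every glusterd and
# brick (glusterfsd) process; both columns should read "unlimited".
for pid in $(pgrep -x glusterd; pgrep -x glusterfsd); do
    echo "== PID $pid =="
    grep 'Max locked memory' "/proc/$pid/limits"
done
```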
Just to state it clearly: I do _not_ see any data corruption. It's only that directory listings do not work (in very rare cases) with rdma transport:
"ls" shows only a part of the files.
but then I do:
stat /path/to/known/filename
it succeeds, and even
md5sum /path/to/known/filename/that/does/not/get/listed/with/ls
yields the correct result.
best wishes,
Stefan
> Am 30.05.2018 um 03:00 schrieb Dan Lavu <dan@xxxxxxxxxx>:
>
> Forgot to mention, sometimes I have to force-start other volumes as well; it's hard to determine from the logs which brick process is locked up.
>
>
> Status of volume: rhev_vms_primary
> Gluster process TCP Port RDMA Port Online Pid
> ------------------------------------------------------------ ------------------
> Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary 0 49157 Y 15666
> Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary 0 49156 Y 2542
> Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary 0 49156 Y 2180
> Self-heal Daemon on localhost N/A N/A N N/A << Self-heal daemon is not running on any node.
> Self-heal Daemon on spidey.ib.runlevelone.lan N/A N/A N N/A
> Self-heal Daemon on groot.ib.runlevelone.lan N/A N/A N N/A
>
> Task Status of Volume rhev_vms_primary
> ------------------------------------------------------------ ------------------
> There are no active volume tasks
>
>
> gluster volume start rhev_vms_noshards force
> gluster volume status
> gluster volume start rhev_vms_primary force
> gluster volume status
> gluster volume start rhev_vms_primary rhev_vms
> gluster volume start rhev_vms_primary rhev_vms force
>
> Status of volume: rhev_vms_primary
> Gluster process TCP Port RDMA Port Online Pid
> ------------------------------------------------------------ ------------------
> Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary 0 49157 Y 15666
> Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary 0 49156 Y 2542
> Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary 0 49156 Y 2180
> Self-heal Daemon on localhost N/A N/A Y 8343
> Self-heal Daemon on spidey.ib.runlevelone.lan N/A N/A Y 22381
> Self-heal Daemon on groot.ib.runlevelone.lan N/A N/A Y 20633
>
> Finally..
>
> Dan
>
>
>
>
> On Tue, May 29, 2018 at 8:47 PM, Dan Lavu <dan@xxxxxxxxxx> wrote:
> Stefan,
>
> Sounds like a brick process is not running. I have noticed some strangeness in my lab when using RDMA: I often have to forcibly restart the brick processes, often as in every single time I do a major operation (add a new volume, remove a volume, stop a volume, etc.).
>
> gluster volume status <vol>
>
> Do any of the self-heal daemons show N/A? If that's the case, try forcing a restart of the volume.
>
> gluster volume start <vol> force
>
> This would also explain why your volumes aren't being replicated properly.
>
> On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig <stefan.solbrig@xxxxx> wrote:
> Dear all,
>
> I faced a problem with a glusterfs volume (pure distributed, _not_ dispersed) over RDMA transport. One user had a directory with a large number of files (50,000 files), and just doing an "ls" in this directory yields a "Transport endpoint not connected" error. The effect is that "ls" only shows some files, but not all.
>
> The respective log file shows this error message:
>
> [2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote operation failed [Transport endpoint is not connected]
> [2018-05-20 20:38:27.732796] W [MSGID: 103046] [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (10.100.245.18:49153), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)
> [2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote operation failed [Transport endpoint is not connected]
> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)
>
> I already set the memlock limit for glusterd to unlimited, but the problem persists.
>
> Only going from RDMA transport to TCP transport solved the problem. (I'm running the volume now in mixed mode, config.transport=tcp,rdma.) Mounting with transport=rdma shows this error; mounting with transport=tcp is fine.
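> In case the exact steps are useful to anyone, the switch looked roughly like this (a sketch; "glurch" is the volume name from the logs above, and the server and mount-point names are placeholders):
>
> ```shell
> # Server side: allow both transports on the volume. As far as I know,
> # changing config.transport requires the volume to be stopped first.
> gluster volume stop glurch
> gluster volume set glurch config.transport tcp,rdma
> gluster volume start glurch
>
> # Client side: mount over TCP instead of RDMA ("server1" is a placeholder).
> mount -t glusterfs -o transport=tcp server1:/glurch /mnt/glurch
> ```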
>
> However, this problem does not arise on all large directories, only on some. I haven't recognized a pattern yet.
>
> I'm using glusterfs v3.12.6 on the servers, with QDR InfiniBand HCAs.
>
> Is this a known issue with RDMA transport?
>
> best wishes,
> Stefan
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>