+csaba
On Tue, Feb 27, 2018 at 2:49 AM, Jim Prewett <download@xxxxxxxxxxxx> wrote:
Hello,
I'm having problems when write-behind is enabled on Gluster 3.8.4.
I have 2 Gluster servers each with a single brick that is mirrored between them. The code causing these issues reads two data files each approx. 128G in size. It opens a third file, mmap()'s that file, and subsequently reads and writes to it. The third file, on successful runs (without write-behind enabled) is ultimately approx. 224G in size.
What exactly is the problem you are facing with write-behind enabled? Is it that the file size is smaller?
The servers have the IP addresses 172.17.2.254 and 172.17.2.255 and the client has the IP address 172.17.1.61. These are all IP over InfiniBand.
I'm attaching logfiles for the brick and for the volume from each of the servers and for the client. I'm also attaching the output of "gluster volume info" and "gluster volume get <volume> all".
I have only noticed problems with write-behind being enabled with this one particular workload. When I ran it under strace, I see it seeking all over the place and reading and writing little bits of data to/from the third file.
What is the pattern you see when write-behind is disabled? Can you attach strace of the application for both scenarios - write-behind enabled and disabled? Can you also explain the workload and its data access pattern?
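If the real job is hard to trace, a small stand-in might make the comparison easier. Below is a rough Python sketch of the pattern as I understand it from your description (a file that is mmap()'d and then hit with small reads and writes at scattered offsets). The function name, sizes, and chunk length are all made up for illustration, and it keeps a full in-memory copy of the file, so it is only usable with test files far smaller than your 224G output. It is not your application, only an approximation of it.

```python
import mmap
import os
import random


def scattered_mmap_io(path, file_size, n_ops, chunk=64, seed=0):
    """Write small chunks at random offsets through mmap, then verify.

    Returns (size_ok, content_ok): whether the file on disk has the
    expected size and the expected bytes after the mapping is closed.
    All parameters are illustrative; file_size must be >= chunk.
    """
    rng = random.Random(seed)
    # In-memory copy of what the file should contain (small tests only).
    shadow = bytearray(file_size)

    with open(path, "w+b") as f:
        # Extend the file to its final size before mapping it.
        f.truncate(file_size)
        with mmap.mmap(f.fileno(), file_size) as m:
            for i in range(n_ops):
                off = rng.randrange(0, file_size - chunk + 1)
                data = bytes([i % 256]) * chunk
                m[off:off + chunk] = data       # small write via the mapping
                shadow[off:off + chunk] = data  # same write on the reference
                _ = m[off:off + chunk]          # small read via the mapping
            m.flush()  # push dirty pages back to the file

    # Re-read with ordinary read() calls and compare against the reference.
    size_ok = os.path.getsize(path) == file_size
    with open(path, "rb") as f:
        content_ok = f.read() == bytes(shadow)
    return size_ok, content_ok
```

Running something like scattered_mmap_io("<path on the gluster mount>", 1 << 30, 100000) once with write-behind enabled and once with it disabled, and reporting the two results, would tell us whether the size or the contents differ.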
For now, I'm leaving write-behind disabled. What are the performance implications of this for jobs that don't have this strange access pattern?
Disabling write-behind can bring down performance for sequential workloads.
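If you want a rough number for your own setup, a sketch like the following could be run on the mount with write-behind on and then off (the option is performance.write-behind). The function name, chunk size, and total size are my own choices for illustration, and a single-threaded small-chunk writer is only a crude stand-in for a real sequential job.

```python
import os
import time


def sequential_write_mb_per_s(path, total_bytes, chunk_size=4096):
    """Write total_bytes sequentially in small chunks and return MiB/s.

    Uses unbuffered writes so each chunk becomes its own write() call,
    which is the case where write-behind aggregation would matter most.
    """
    buf = b"\0" * chunk_size
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        written = 0
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # Unbuffered write() returns the number of bytes written.
            written += f.write(buf[:n])
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (total_bytes / (1024 * 1024)) / elapsed
```

Comparing the returned figure for the same total size under both settings should show how much the sequential case depends on write-behind in your environment.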
My co-worker who usually maintains the Gluster filesystems here is busy having a baby right now and I've gotten it while he's out, so I'm /really/ new to Gluster and am not confident that anything is correct in my configuration (nor do I have a specific reason to doubt its correctness! :)
I have checked the InfiniBand fabric for errors and do not see any beyond the normal PortXmitWait counter. There is no firewall on any of these machines. Their system clocks seem to all be synchronized.
Is there anything additional I can provide to help diagnose this problem?
Thanks for any help you can provide! :)
Jim
James E. Prewett Jim@xxxxxxxxxxx download@xxxxxxxxxxx
Systems Team Leader LoGS: http://www.hpc.unm.edu/~download/LoGS/
Designated Security Officer OpenPGP key: pub 1024D/31816D93
HPC Systems Engineer III UNM HPC 505.277.8210
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users