Hi Karli,
I think Alex is right regarding the NFS version and state.
I am only using NFSv3, and failover is working as expected.
In my use case, I have 3 nodes running ESXi 6.7 as the OS, with one
Gluster VM set up on each ESXi host using its local datastore.
Once I have formed the replica 3 volume, I use the CTDB VIP to present
the NFSv3 export back to vCenter, which uses it as shared storage.
Everything works great, except that performance is not very good ...
I am still looking for ways to improve it.
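For anyone wanting to reproduce this layout, a minimal sketch of the CTDB side might look like the following (the IP addresses and interface name are made up for illustration; only the file paths are real CTDB conventions):

```
# /etc/ctdb/nodes -- one private IP per Gluster VM, same order on all nodes
10.0.0.11
10.0.0.12
10.0.0.13

# /etc/ctdb/public_addresses -- the floating VIP that vCenter mounts
10.0.0.100/24 ens192
```

vCenter then mounts the VIP (e.g. `10.0.0.100:/data`) as an NFSv3 datastore; on node failure CTDB moves the VIP to a surviving node, which is what makes the failover transparent.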
Cheers,
Edy
On 8/15/2018 12:25 AM, Alex Chekholko wrote:
Hi Karli,
From the client's point of view, it really looked like with
NFSv4 there is an open file handle that goes stale
and hangs, or something like that, whereas with NFSv3 the
client retries, recovers, and continues. I did not
investigate further; I just use v3. I think it has something
to do with NFSv4 being "stateful" and NFSv3 being "stateless".
Can you re-run your test using NFSv3 on the client
mount? Or do you need to use v4.x?
Regards,
Alex
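(For reference, the v3 re-test Alex suggests only changes the mount options. A sketch, assuming the same server and export used in the tests further down; these commands need a live NFS server and root, so treat them as illustrative:)

```
# remount the same export with NFSv3 instead of 4.1
umount /mnt
mount -t nfs -o vers=3 hv03v.localdomain:/data /mnt/

# confirm which NFS version was actually negotiated
nfsstat -m
```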
On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > Hi Karli,
> > >
> > > Storhaug works with glusterfs 4.1.2 and the latest nfs-ganesha.
> > >
> > > I just installed them last weekend ... they are working very well
> > > :)
> >
> > Okay, awesome!
> >
> > Is there any documentation on how to do that?
> >
>
> https://github.com/gluster/storhaug/wiki
>
Thanks Kaleb and Edy!
I have now redone the cluster using the latest and greatest,
following the above guide, and repeated the same test I was doing
before (the rsync while loop) with success. I let (forgot) it run
for about a day and it was still chugging along nicely when I
aborted it, so success there!
On to the next test, the catastrophic failure test where one of the
servers dies; that one I'm having a more difficult time with.
1) I start by mounting the share over NFS 4.1 and then proceed to
write an 8 GiB random data file with 'dd'. When I "hard-cut" the
power to the server I'm writing to, the transfer just stops
indefinitely until the server comes back again. Is that supposed to
happen? Like this:
# dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s
(here I cut the power and let it be for almost two hours before
turning it on again)
dd: error writing '/mnt/test.bin': Remote I/O error
2325+0 records in
2324+0 records out
2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
# umount /mnt
Here the umount command hung and I had to hard reset the client.
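(On whether the indefinite hang is expected: with the default "hard" NFS mount behaviour, the client retries forever by design, which matches what you saw. If an I/O error after a bounded time is preferable, a "soft" mount is one option. A sketch of such an fstab entry follows; the timeout values are illustrative, not recommendations, and note that soft mounts can silently corrupt data on interrupted writes:)

```
# /etc/fstab sketch: soft mount returns an error instead of retrying forever
hv03v.localdomain:/data  /mnt  nfs  vers=4.1,soft,timeo=600,retrans=5  0 0
```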
2) Another question I have is why some files "change" as you copy
them out to the Gluster storage. Is that the way it should be? This
time, I deleted everything in the destination directory to start
over:
# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# rm -f /mnt/test.bin
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
8192+0 records in
8192+0 records out
8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
# md5sum /var/tmp/test.bin
073867b68fa8eaa382ffe05adb90b583 /var/tmp/test.bin
# md5sum /mnt/test.bin
634187d367f856f3f5fb31846f796397 /mnt/test.bin
# umount /mnt
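(For chasing the mismatch, a tiny helper like this makes it easy to re-check a source/copy pair after each run. This is a hypothetical convenience wrapper around the same md5sum comparison done by hand above, not part of any Gluster tooling:)

```shell
#!/bin/sh
# verify_copy SRC DST: compare the md5 checksums of a file and its copy,
# printing OK on a match and MISMATCH (exit status 1) otherwise.
verify_copy() {
    src_sum=$(md5sum "$1" | awk '{print $1}')
    dst_sum=$(md5sum "$2" | awk '{print $1}')
    if [ "$src_sum" = "$dst_sum" ]; then
        echo "OK: checksums match"
    else
        echo "MISMATCH: $src_sum vs $dst_sum" >&2
        return 1
    fi
}

# example: verify_copy /var/tmp/test.bin /mnt/test.bin
```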
Thanks in advance!
/K
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users