Replying to the last batch of questions I've received...
To reiterate, I am only having problems writing files to disperse volumes when mounting them on an armhf system. Mounting the same volume on an x86-64 system works fine.
Disperse volumes running on ARM cannot heal.
Replica volumes mount and heal just fine.
All bricks are up and running. I have ensured connectivity and that MTU is correct and identical.
Armhf is 32-bit:
# uname -a
Linux gluster01 4.14.55-146 #1 SMP PREEMPT Wed Jul 11 22:31:01 -03 2018 armv7l armv7l armv7l GNU/Linux
# file /bin/bash
/bin/bash: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=e0a53f804173b0cd9845bb8a76fee1a1e98a9759, stripped
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
# free
              total        used        free      shared  buff/cache   available
Mem:        2042428       83540     1671004        6052      287884     1895684
Swap:             0           0           0
8 cores total: 4 running at 2 GHz and 4 running at 1.4 GHz.
processor : 0
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 24.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc07
CPU revision : 3
processor : 4
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 72.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc0f
CPU revision : 3
There IS a 98MB /core file from the fuse mount, so that's cool.
# file /core
/core: ELF 32-bit LSB core file ARM, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs --process-name fuse --volfile-server=gluster01 --volfile-id', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfs', platform: 'v7l'
I will try and get a bug report with logs filed over the weekend.
This is just an experimental home cluster. I don't have anything on it yet. It's possible I could grant someone SSH access to the cluster if it helps further the Gluster project. But the results should be reproducible on something like a Raspberry Pi. I was hoping to run a dispersed volume on it eventually; otherwise I would never have found this issue.
Thank you for the troubleshooting ideas.
-Fox
On Fri, Aug 3, 2018 at 3:33 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
What is the endianness of the armhf CPU?
Are you running a 32bit or 64bit Operating System?

On Fri, Aug 3, 2018 at 9:51 AM, Fox <foxxz.net@xxxxxxxxx> wrote:

Just wondering if anyone else is running into the same behavior with disperse volumes described below and what I might be able to do about it.

I am using ubuntu 18.04LTS on Odroid HC-2 hardware (armhf) and have installed gluster 4.1.2 via PPA. I have 12 member nodes each with a single brick. I can successfully create a working volume via the command:

gluster volume create testvol1 disperse 12 redundancy 4 gluster01:/exports/sda/brick1/testvol1 gluster02:/exports/sda/brick1/testvol1 gluster03:/exports/sda/brick1/testvol1 gluster04:/exports/sda/brick1/testvol1 gluster05:/exports/sda/brick1/testvol1 gluster06:/exports/sda/brick1/testvol1 gluster07:/exports/sda/brick1/testvol1 gluster08:/exports/sda/brick1/testvol1 gluster09:/exports/sda/brick1/testvol1 gluster10:/exports/sda/brick1/testvol1 gluster11:/exports/sda/brick1/testvol1 gluster12:/exports/sda/brick1/testvol1

And start the volume:

gluster volume start testvol1

Mounting the volume on an x86-64 system it performs as expected.

Mounting the same volume on an armhf system (such as one of the cluster members) I can create directories but trying to create a file I get an error and the file system unmounts/crashes:

root@gluster01:~# mount -t glusterfs gluster01:/testvol1 /mnt
root@gluster01:~# cd /mnt
root@gluster01:/mnt# ls
root@gluster01:/mnt# mkdir test
root@gluster01:/mnt# cd test
root@gluster01:/mnt/test# cp /root/notes.txt ./
cp: failed to close './notes.txt': Software caused connection abort
root@gluster01:/mnt/test# ls
ls: cannot open directory '.': Transport endpoint is not connected

I get many of these in the glusterfsd.log:

The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 100 times between [2018-08-03 04:06:39.904166] and [2018-08-03 04:06:57.521895]

Furthermore, if a cluster member ducks out (reboots, loses connection, etc) and needs healing the self heal daemon logs messages similar to that above and can not heal - no disk activity (verified via iotop) though very high CPU usage and the volume heal info command indicates the volume needs healing.

I tested all of the above in virtual environments using x86-64 VMs and could self heal as expected.

Again this only happens when using disperse volumes. Should I be filing a bug report instead?

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
--Milind