Hi All,
After performing Strahil's checks and poking around some more, we found
that the problem was the underlying XFS filesystem reporting that it was
full when it wasn't. Following the information in the links below, we found
that mounting with 64-bit inodes (the XFS inode64 mount option) fixed the
problem.
https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available
https://support.microfocus.com/kb/doc.php?id=7014318
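
For anyone checking their own bricks, here is a minimal sketch that lists XFS mounts whose option string does not include inode64. The function name is made up for illustration, and the input is assumed to be text in /proc/mounts format. Whether a missing option really means 32-bit inode allocation depends on the kernel's default, so treat the result as a hint rather than a diagnosis.

```python
def xfs_mounts_without_inode64(mounts_text):
    """Return mount points of XFS filesystems whose options lack 'inode64'.

    mounts_text is assumed to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not XFS.
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        if "inode64" not in options:
            missing.append(fields[1])
    return missing
```

On a brick server this could be called as xfs_mounts_without_inode64(open("/proc/mounts").read()).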
Thanks
Pat
On 3/12/20 4:24 PM, Strahil Nikolov wrote:
On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley@xxxxxxx> wrote:
Hi
Yesterday we seemed to clear an issue with erroneous "No space left on
device" messages
(https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html).
I am now seeing "Stale file handle" messages coming from directories
I've just created.
We are running gluster 3.7.11 in a distributed volume across 2 servers
(2 bricks each). For the "Stale file handle" for a newly created
directory, I've noticed that the directory does not appear in brick1
(it is in the other 3 bricks).
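
A minimal sketch of that check, assuming it is run on each server against its local brick roots. The function name is made up for illustration. It only reports where the directory is absent and does not touch the bricks.

```python
import os


def bricks_missing_dir(brick_roots, rel_path):
    """Return the brick roots under which rel_path is not a directory."""
    # Volume paths start with "/", so strip it before joining to the brick root.
    rel = rel_path.lstrip("/")
    return [
        root for root in brick_roots
        if not os.path.isdir(os.path.join(root, rel))
    ]
```

For example, on mseas-data2 the brick roots would be ["/mnt/brick1", "/mnt/brick2"] and rel_path the directory as seen from the volume mount.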
In the cli.log on the server with brick1 I'm seeing messages like
--------------------------------------------------------
[2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
[2020-03-12 17:21:36.604587] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
[2020-03-12 17:21:36.605100] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with: 0
--------------------------------------------------------
I'm not sure why I would be getting any geo-replication messages, since
we aren't using replication. The cli.log on the other server is showing
--------------------------------------------------------
[2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
[2020-03-12 17:27:08.302564] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with: 0
--------------------------------------------------------
On the server with brick1, the etc-glusterfs-glusterd.vol.log is showing
--------------------------------------------------------
[2020-03-12 17:21:25.925394] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
[2020-03-12 17:21:25.946240] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
[2020-03-12 17:21:25.946282] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2020-03-12 17:21:36.617090] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2020-03-12 17:21:15.577829] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
--------------------------------------------------------
On the other server I'm seeing similar messages
--------------------------------------------------------
[2020-03-12 17:26:57.024168] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
[2020-03-12 17:26:57.037269] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
[2020-03-12 17:26:57.037299] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2020-03-12 17:26:42.025200] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2020-03-12 17:27:08.304267] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
--------------------------------------------------------
And I've just noticed that I'm again seeing "No space left on device"
in the logs of brick1 (although there is 3.5 TB free)
--------------------------------------------------------
[2020-03-12 17:19:54.576597] E [MSGID: 113027] [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 failed [No space left on device]
[2020-03-12 17:19:54.576681] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698: MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left on device) [No space left on device]
--------------------------------------------------------
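
To pull entries like these out of a brick log, a minimal sketch follows. It assumes each entry occupies a single line in the log file and follows the layout seen in the excerpts in this thread. The regular expression and the function name are illustrative, not part of gluster.

```python
import re

# Assumed layout of one entry, inferred from the excerpts in this thread:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The MSGID part is optional, since some entries above lack it.
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<source>[^\]]+)\] (?P<domain>[^:]*): (?P<message>.*)$"
)


def enospc_errors(lines):
    """Return parsed E-level entries whose message mentions ENOSPC."""
    hits = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            continue
        if match.group("level") == "E" and "No space left on device" in match.group("message"):
            hits.append(match.groupdict())
    return hits
```

Each hit is a dict with the keys ts, level, msgid, source, domain and message, so the failing paths can be grouped by brick or by time.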
Any thoughts would be greatly appreciated. (Some additional information
below)
Thanks
Pat
--------------------------------------------------------
server 1:
[root@mseas-data2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 164T 161T 3.5T 98% /mnt/brick2
/dev/sda 164T 159T 5.4T 97% /mnt/brick1
[root@mseas-data2 ~]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sdb 7031960320 31213790 7000746530 1% /mnt/brick2
/dev/sda 7031960320 28707456 7003252864 1% /mnt/brick1
--------------------------------------------------------
--------------------------------------------------------
server 2:
[root@mseas-data3 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/sda                        91T   88T  3.9T  96% /export/sda/brick3
/dev/mapper/vg_Data4-lv_Data4   91T   89T  2.6T  98% /export/sdc/brick4
[root@mseas-data3 glusterfs]# df -i
Filesystem                         Inodes    IUsed      IFree IUse% Mounted on
/dev/sda                       1953182464 10039172 1943143292    1% /export/sda/brick3
/dev/mapper/vg_Data4-lv_Data4  3906272768 11917222 3894355546    1% /export/sdc/brick4
--------------------------------------------------------
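
The same block and inode figures can be collected programmatically. This is a minimal sketch using Python's os.statvfs, with a made-up function name. As the df output above shows, both free space and free inodes can look healthy while mkdir still fails, so these numbers help rule causes out rather than confirm one.

```python
import os


def space_summary(path):
    """Return available/total bytes and free/total inodes for the filesystem at path."""
    st = os.statvfs(path)
    return {
        # f_bavail counts blocks available to unprivileged users.
        "bytes_avail": st.f_bavail * st.f_frsize,
        "bytes_total": st.f_blocks * st.f_frsize,
        "inodes_free": st.f_ffree,
        "inodes_total": st.f_files,
    }
```

Calling space_summary("/mnt/brick1") on the brick server would return the figures that df -h and df -i report for that mount, in raw bytes and inode counts.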
--------------------------------------------------------
[root@mseas-data2 ~]# gluster volume info
--------------------------------------------------------
Volume Name: data-volume
Type: Distribute
Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: mseas-data2:/mnt/brick1
Brick2: mseas-data2:/mnt/brick2
Brick3: mseas-data3:/export/sda/brick3
Brick4: mseas-data3:/export/sdc/brick4
Options Reconfigured:
cluster.min-free-disk: 1%
nfs.export-volumes: off
nfs.disable: on
performance.readdir-ahead: on
diagnostics.brick-sys-log-level: WARNING
nfs.exports-auth-enable: on
server.allow-insecure: on
auth.allow: *
disperse.eager-lock: off
performance.open-behind: off
performance.md-cache-timeout: 60
network.inode-lru-limit: 50000
diagnostics.client-log-level: ERROR
--------------------------------------------------------
[root@mseas-data2 ~]# gluster volume status data-volume detail
--------------------------------------------------------
Status of volume: data-volume
------------------------------------------------------------------------------
Brick : Brick mseas-data2:/mnt/brick1
TCP Port : 49154
RDMA Port : 0
Online : Y
Pid : 4601
File System : xfs
Device : /dev/sda
Mount Options : rw
Inode Size : 256
Disk Space Free : 5.4TB
Total Disk Space : 163.7TB
Inode Count : 7031960320
Free Inodes : 7003252864
------------------------------------------------------------------------------
Brick : Brick mseas-data2:/mnt/brick2
TCP Port : 49155
RDMA Port : 0
Online : Y
Pid : 7949
File System : xfs
Device : /dev/sdb
Mount Options : rw
Inode Size : 256
Disk Space Free : 3.4TB
Total Disk Space : 163.7TB
Inode Count : 7031960320
Free Inodes : 7000746530
------------------------------------------------------------------------------
Brick : Brick mseas-data3:/export/sda/brick3
TCP Port : 49153
RDMA Port : 0
Online : Y
Pid : 4650
File System : xfs
Device : /dev/sda
Mount Options : rw
Inode Size : 512
Disk Space Free : 3.9TB
Total Disk Space : 91.0TB
Inode Count : 1953182464
Free Inodes : 1943143292
------------------------------------------------------------------------------
Brick : Brick mseas-data3:/export/sdc/brick4
TCP Port : 49154
RDMA Port : 0
Online : Y
Pid : 23772
File System : xfs
Device : /dev/mapper/vg_Data4-lv_Data4
Mount Options : rw
Inode Size : 256
Disk Space Free : 2.6TB
Total Disk Space : 90.9TB
Inode Count : 3906272768
Free Inodes : 3894355546
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________
Community Meeting Calendar:
Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
Hey Pat,
The logs are not providing much information, but the following seems strange:
'Failed uuid to hostname conversion'
Have you checked DNS resolution (both short name and FQDN)?
Also, check that the systems' ntp/chrony is in sync and check 'gluster peer status' on all nodes.
Is it possible that the client is not reaching all bricks?
P.S.: Consider increasing the log level, as the current level is not sufficient.
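
The name resolution check can be scripted. A minimal sketch, assuming it is run on every server and client with both the short names and the FQDNs of all peers. The function name and the injectable resolver parameter are illustrative, and the default resolver is Python's socket.getaddrinfo.

```python
import socket


def check_resolution(names, resolver=socket.getaddrinfo):
    """Map each name to a sorted list of resolved addresses, or None if lookup fails."""
    results = {}
    for name in names:
        try:
            infos = resolver(name, None)
        except socket.gaierror:
            results[name] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

A name that maps to None, or to different addresses on different nodes, would be worth following up before looking further at the 'Failed uuid to hostname conversion' warning.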
Best Regards,
Strahil Nikolov