On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley@xxxxxxx> wrote:
>Hi
>
>Yesterday we seemed to clear an issue with erroneous "No space left on
>device" messages
>(https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html)
>
>I am now seeing "Stale file handle" messages coming from directories
>I've just created.
>
>We are running gluster 3.7.11 in a distributed volume across 2 servers
>(2 bricks each). For the "Stale file handle" for a newly created
>directory, I've noticed that the directory does not appear in brick1
>(it is in the other 3 bricks).
>
>In the cli.log on the server with brick1 I'm seeing messages like
>
>--------------------------------------------------------
>[2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running
>gluster with version 3.7.11
>[2020-03-12 17:21:36.604587] I
>[cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
>installed
>[2020-03-12 17:21:36.605100] I [MSGID: 101190]
>[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>with index 1
>[2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler]
>0-transport: disconnecting now
>[2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with: 0
>--------------------------------------------------------
>
>I'm not sure why I would be getting any geo-replication messages, we
>aren't using replication. The cli.log on the other server is showing
>
>--------------------------------------------------------
>[2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running
>gluster with version 3.7.11
>[2020-03-12 17:27:08.302564] I [MSGID: 101190]
>[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>with index 1
>[2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler]
>0-transport: disconnecting now
>[2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with: 0
>--------------------------------------------------------
>
>On the server with brick1, the etc-glusterfs-glusterd.vol.log is
>showing
>
>--------------------------------------------------------
>[2020-03-12 17:21:25.925394] I [MSGID: 106499]
>[glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management:
>Received status volume req for volume data-volume
>[2020-03-12 17:21:25.946240] W [MSGID: 106217]
>[glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
>uuid to hostname conversion
>[2020-03-12 17:21:25.946282] W [MSGID: 106387]
>[glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
>modification failed
>[2020-03-12 17:21:36.617090] I [MSGID: 106487]
>[glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd:
>Received cli list req
>[2020-03-12 17:21:15.577829] I [MSGID: 106488]
>[glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
>Received get vol req
>--------------------------------------------------------
>
>On the other server I'm seeing similar messages
>
>--------------------------------------------------------
>[2020-03-12 17:26:57.024168] I [MSGID: 106499]
>[glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management:
>Received status volume req for volume data-volume
>[2020-03-12 17:26:57.037269] W [MSGID: 106217]
>[glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
>uuid to hostname conversion
>[2020-03-12 17:26:57.037299] W [MSGID: 106387]
>[glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
>modification failed
>[2020-03-12 17:26:42.025200] I [MSGID: 106488]
>[glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
>Received get vol req
>[2020-03-12 17:27:08.304267] I [MSGID: 106487]
>[glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd:
>Received cli list req
>--------------------------------------------------------
>
>And I've just noticed that I'm again seeing "No space left on device"
>in the logs of brick1 (although there is 3.5 TB free)
>
>--------------------------------------------------------
>[2020-03-12 17:19:54.576597] E [MSGID: 113027]
>[posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of
>/mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
>failed [No space left on device]
>[2020-03-12 17:19:54.576681] E [MSGID: 115056]
>[server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698:
>MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
>(96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left
>on device) [No space left on device]
>--------------------------------------------------------
>
>Any thoughts would be greatly appreciated. (Some additional information
>below)
>
>Thanks
>
>Pat
>
>--------------------------------------------------------
>server 1:
>[root@mseas-data2 ~]# df -h
>Filesystem    Size  Used  Avail  Use%  Mounted on
>/dev/sdb      164T  161T  3.5T   98%   /mnt/brick2
>/dev/sda      164T  159T  5.4T   97%   /mnt/brick1
>
>[root@mseas-data2 ~]# df -i
>Filesystem    Inodes      IUsed     IFree       IUse%  Mounted on
>/dev/sdb      7031960320  31213790  7000746530  1%     /mnt/brick2
>/dev/sda      7031960320  28707456  7003252864  1%     /mnt/brick1
>--------------------------------------------------------
>
>--------------------------------------------------------
>server 2:
>[root@mseas-data3 ~]# df -h
>Filesystem                     Size  Used  Avail  Use%  Mounted on
>/dev/sda                       91T   88T   3.9T   96%   /export/sda/brick3
>/dev/mapper/vg_Data4-lv_Data4  91T   89T   2.6T   98%   /export/sdc/brick4
>
>[root@mseas-data3 glusterfs]# df -i
>Filesystem                     Inodes      IUsed     IFree       IUse%  Mounted on
>/dev/sda                       1953182464  10039172  1943143292  1%     /export/sda/brick3
>/dev/mapper/vg_Data4-lv_Data4  3906272768  11917222  3894355546  1%     /export/sdc/brick4
>--------------------------------------------------------
>
>--------------------------------------------------------
>[root@mseas-data2 ~]# gluster volume info
>--------------------------------------------------------
>Volume Name: data-volume
>Type: Distribute
>Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>Status: Started
>Number of Bricks: 4
>Transport-type: tcp
>Bricks:
>Brick1: mseas-data2:/mnt/brick1
>Brick2: mseas-data2:/mnt/brick2
>Brick3: mseas-data3:/export/sda/brick3
>Brick4: mseas-data3:/export/sdc/brick4
>Options Reconfigured:
>cluster.min-free-disk: 1%
>nfs.export-volumes: off
>nfs.disable: on
>performance.readdir-ahead: on
>diagnostics.brick-sys-log-level: WARNING
>nfs.exports-auth-enable: on
>server.allow-insecure: on
>auth.allow: *
>disperse.eager-lock: off
>performance.open-behind: off
>performance.md-cache-timeout: 60
>network.inode-lru-limit: 50000
>diagnostics.client-log-level: ERROR
>
>--------------------------------------------------------
>[root@mseas-data2 ~]# gluster volume status data-volume detail
>--------------------------------------------------------
>Status of volume: data-volume
>------------------------------------------------------------------------------
>Brick            : Brick mseas-data2:/mnt/brick1
>TCP Port         : 49154
>RDMA Port        : 0
>Online           : Y
>Pid              : 4601
>File System      : xfs
>Device           : /dev/sda
>Mount Options    : rw
>Inode Size       : 256
>Disk Space Free  : 5.4TB
>Total Disk Space : 163.7TB
>Inode Count      : 7031960320
>Free Inodes      : 7003252864
>------------------------------------------------------------------------------
>Brick            : Brick mseas-data2:/mnt/brick2
>TCP Port         : 49155
>RDMA Port        : 0
>Online           : Y
>Pid              : 7949
>File System      : xfs
>Device           : /dev/sdb
>Mount Options    : rw
>Inode Size       : 256
>Disk Space Free  : 3.4TB
>Total Disk Space : 163.7TB
>Inode Count      : 7031960320
>Free Inodes      : 7000746530
>------------------------------------------------------------------------------
>Brick            : Brick mseas-data3:/export/sda/brick3
>TCP Port         : 49153
>RDMA Port        : 0
>Online           : Y
>Pid              : 4650
>File System      : xfs
>Device           : /dev/sda
>Mount Options    : rw
>Inode Size       : 512
>Disk Space Free  : 3.9TB
>Total Disk Space : 91.0TB
>Inode Count      : 1953182464
>Free Inodes      : 1943143292
>------------------------------------------------------------------------------
>Brick            : Brick mseas-data3:/export/sdc/brick4
>TCP Port         : 49154
>RDMA Port        : 0
>Online           : Y
>Pid              : 23772
>File System      : xfs
>Device           : /dev/mapper/vg_Data4-lv_Data4
>Mount Options    : rw
>Inode Size       : 256
>Disk Space Free  : 2.6TB
>Total Disk Space : 90.9TB
>Inode Count      : 3906272768
>Free Inodes      : 3894355546
>
>--
>
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>Pat Haley                          Email:  phaley@xxxxxxx
>Center for Ocean Engineering       Phone:  (617) 253-6824
>Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>77 Massachusetts Avenue
>Cambridge, MA 02139-4301
>
>________
>
>Community Meeting Calendar:
>
>Schedule -
>Every Tuesday at 14:30 IST / 09:00 UTC
>Bridge: https://bluejeans.com/441850968
>
>Gluster-users mailing list
>Gluster-users@xxxxxxxxxxx
>https://lists.gluster.org/mailman/listinfo/gluster-users

Hey Pat,

The logs are not providing much information, but the following seems strange:

'Failed uuid to hostname conversion'

Have you checked DNS resolution (both short name and FQDN)?

Also, check that NTP/chrony is in sync on both systems and compare
'gluster peer status' on all nodes.

Is it possible that the client is not reaching all bricks?

P.S.: Consider increasing the log level, as the current level is not
sufficient for debugging this.

Best Regards,
Strahil Nikolov
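
In case it helps, the checks above could look roughly like the following,
run on each server. This is only a sketch: the hostnames and volume name
are taken from the volume info quoted above, only one of ntpstat/chronyc
will apply depending on which daemon is in use, and DEBUG is just one
example level (INFO may already be enough).

--------------------------------------------------------
# Name resolution, short names first, then the FQDNs used in the peer list
getent hosts mseas-data2
getent hosts mseas-data3

# Time sync (whichever of ntpd/chronyd the servers actually run)
ntpstat
chronyc tracking

# Peer and brick health, run on both servers
gluster peer status
gluster volume status data-volume

# Temporarily raise the log levels (client-log-level is currently ERROR,
# brick-sys-log-level WARNING); revert once the problem is captured
gluster volume set data-volume diagnostics.client-log-level DEBUG
gluster volume set data-volume diagnostics.brick-log-level DEBUG
--------------------------------------------------------

The two log-level options can be reverted later with
'gluster volume reset data-volume diagnostics.client-log-level' (and
likewise for the brick option).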
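One more read-only check that may help narrow this down: since the new
directory exists on three bricks but not on brick1, the directory and its
parent's layout xattrs can be compared directly on each brick. A minimal
sketch, run as root and using the brick path from the mkdir error above:

--------------------------------------------------------
# Does the new directory exist on this brick? (repeat on /mnt/brick2
# and, on mseas-data3, on /export/sda/brick3 and /export/sdc/brick4)
ls -ld /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001

# Layout xattrs of the parent directory; every brick should show a
# trusted.glusterfs.dht entry, and the hash ranges should not overlap
getfattr -d -m . -e hex /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06
--------------------------------------------------------

If the trusted.glusterfs.dht layout turns out to be missing or inconsistent
on brick1, a 'gluster volume rebalance data-volume fix-layout start' is the
usual way to rewrite directory layouts, but the logs above do not confirm
that this is actually the cause.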