Re: Stale file handle

Hi All,

After performing Strahil's checks and poking around some more, we found that the problem was the underlying XFS filesystem thinking it was full when it wasn't.  Following the information in the links below, we found that mounting with 64-bit inodes (the XFS inode64 mount option) fixed the problem.

https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available

https://support.microfocus.com/kb/doc.php?id=7014318
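
In case it helps anyone else, the change itself was roughly the following (device and mount point are from our brick1; the fstab line is only an example, adjust it to your own setup):

--------------------------------------------------------
# check how the brick filesystem is currently mounted
grep brick1 /proc/mounts

# remount with 64-bit inode allocation (on older kernels this may
# require a full umount/mount rather than a remount)
mount -o remount,inode64 /mnt/brick1

# make it persistent across reboots, e.g. in /etc/fstab:
# /dev/sda  /mnt/brick1  xfs  defaults,inode64  0 0
--------------------------------------------------------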

Thanks

Pat


On 3/12/20 4:24 PM, Strahil Nikolov wrote:
On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley@xxxxxxx> wrote:
Hi

Yesterday we seemed to clear an issue with erroneous "No space left on device" messages
(https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html)

I am now seeing "Stale file handle" messages coming from directories
I've just created.

We are running gluster 3.7.11 in a distributed volume across 2 servers (2 bricks each). For a newly created directory that returns "Stale file handle", I've noticed that the directory does not appear on brick1 (it is present on the other 3 bricks).
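
(A quick way to compare the bricks is something like the following, run on each server; the path is just a placeholder for one of the affected directories. On a distributed volume every brick should hold the directory, all with the same trusted.gfid xattr.)

--------------------------------------------------------
# does the directory exist on this brick?
ls -ld /mnt/brick1/projects/<affected_dir>

# dump the gluster xattrs (trusted.gfid should match across bricks)
getfattr -d -m . -e hex /mnt/brick1/projects/<affected_dir>
--------------------------------------------------------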

In the cli.log on the server with brick1 I'm seeing messages like

--------------------------------------------------------
[2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
[2020-03-12 17:21:36.604587] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
[2020-03-12 17:21:36.605100] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with: 0
--------------------------------------------------------

I'm not sure why I would be getting any geo-replication messages; we aren't using replication. The cli.log on the other server is showing

--------------------------------------------------------
[2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
[2020-03-12 17:27:08.302564] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with: 0
--------------------------------------------------------


On the server with brick1, the etc-glusterfs-glusterd.vol.log is showing

--------------------------------------------------------
[2020-03-12 17:21:25.925394] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
[2020-03-12 17:21:25.946240] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
[2020-03-12 17:21:25.946282] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2020-03-12 17:21:36.617090] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2020-03-12 17:21:15.577829] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
--------------------------------------------------------

On the other server I'm seeing similar messages

--------------------------------------------------------
[2020-03-12 17:26:57.024168] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
[2020-03-12 17:26:57.037269] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
[2020-03-12 17:26:57.037299] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2020-03-12 17:26:42.025200] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2020-03-12 17:27:08.304267] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
--------------------------------------------------------

And I've just noticed that I'm again seeing "No space left on device" in the logs of brick1 (although there is 3.5 TB free)

--------------------------------------------------------
[2020-03-12 17:19:54.576597] E [MSGID: 113027] [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 failed [No space left on device]
[2020-03-12 17:19:54.576681] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698: MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left on device) [No space left on device]
--------------------------------------------------------
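
(One way to narrow down whether the ENOSPC comes from the brick filesystem itself rather than from gluster is to try creating something directly on the brick mount; the test name below is just an example, and anything created directly on a brick should be removed immediately.)

--------------------------------------------------------
# try an mkdir directly on the brick filesystem, then clean up
mkdir /mnt/brick1/.enospc_test && rmdir /mnt/brick1/.enospc_test

# compare reported space and inode usage on that brick
df -h /mnt/brick1
df -i /mnt/brick1
--------------------------------------------------------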

Any thoughts would be greatly appreciated. (Some additional information below)

Thanks

Pat

--------------------------------------------------------
server 1:
[root@mseas-data2 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        164T  161T  3.5T  98% /mnt/brick2
/dev/sda        164T  159T  5.4T  97% /mnt/brick1

[root@mseas-data2 ~]# df -i
Filesystem         Inodes    IUsed      IFree IUse% Mounted on
/dev/sdb       7031960320 31213790 7000746530    1% /mnt/brick2
/dev/sda       7031960320 28707456 7003252864    1% /mnt/brick1
--------------------------------------------------------

--------------------------------------------------------
server 2:
[root@mseas-data3 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda               91T   88T  3.9T  96% /export/sda/brick3
/dev/mapper/vg_Data4-lv_Data4
                        91T   89T  2.6T  98% /export/sdc/brick4

[root@mseas-data3 glusterfs]# df -i
Filesystem               Inodes    IUsed      IFree IUse% Mounted on
/dev/sda             1953182464 10039172 1943143292    1% /export/sda/brick3
/dev/mapper/vg_Data4-lv_Data4
                      3906272768 11917222 3894355546    1% /export/sdc/brick4
--------------------------------------------------------

--------------------------------------------------------
[root@mseas-data2 ~]# gluster volume info
--------------------------------------------------------
Volume Name: data-volume
Type: Distribute
Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: mseas-data2:/mnt/brick1
Brick2: mseas-data2:/mnt/brick2
Brick3: mseas-data3:/export/sda/brick3
Brick4: mseas-data3:/export/sdc/brick4
Options Reconfigured:
cluster.min-free-disk: 1%
nfs.export-volumes: off
nfs.disable: on
performance.readdir-ahead: on
diagnostics.brick-sys-log-level: WARNING
nfs.exports-auth-enable: on
server.allow-insecure: on
auth.allow: *
disperse.eager-lock: off
performance.open-behind: off
performance.md-cache-timeout: 60
network.inode-lru-limit: 50000
diagnostics.client-log-level: ERROR

--------------------------------------------------------
[root@mseas-data2 ~]# gluster volume status data-volume detail
--------------------------------------------------------
Status of volume: data-volume
------------------------------------------------------------------------------
Brick                : Brick mseas-data2:/mnt/brick1
TCP Port             : 49154
RDMA Port            : 0
Online               : Y
Pid                  : 4601
File System          : xfs
Device               : /dev/sda
Mount Options        : rw
Inode Size           : 256
Disk Space Free      : 5.4TB
Total Disk Space     : 163.7TB
Inode Count          : 7031960320
Free Inodes          : 7003252864
------------------------------------------------------------------------------
Brick                : Brick mseas-data2:/mnt/brick2
TCP Port             : 49155
RDMA Port            : 0
Online               : Y
Pid                  : 7949
File System          : xfs
Device               : /dev/sdb
Mount Options        : rw
Inode Size           : 256
Disk Space Free      : 3.4TB
Total Disk Space     : 163.7TB
Inode Count          : 7031960320
Free Inodes          : 7000746530
------------------------------------------------------------------------------
Brick                : Brick mseas-data3:/export/sda/brick3
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 4650
File System          : xfs
Device               : /dev/sda
Mount Options        : rw
Inode Size           : 512
Disk Space Free      : 3.9TB
Total Disk Space     : 91.0TB
Inode Count          : 1953182464
Free Inodes          : 1943143292
------------------------------------------------------------------------------
Brick                : Brick mseas-data3:/export/sdc/brick4
TCP Port             : 49154
RDMA Port            : 0
Online               : Y
Pid                  : 23772
File System          : xfs
Device               : /dev/mapper/vg_Data4-lv_Data4
Mount Options        : rw
Inode Size           : 256
Disk Space Free      : 2.6TB
Total Disk Space     : 90.9TB
Inode Count          : 3906272768
Free Inodes          : 3894355546

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

________



Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
Hey Pat,

The logs are not providing much information, but the following seems strange:
'Failed uuid to hostname conversion'

Have you checked DNS resolution (both short name and FQDN)?
Also, check that NTP/chrony is in sync on the systems, and check 'gluster peer status' on all nodes.

Is it possible that the client is not reaching all bricks?
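
Something along these lines should cover those checks (hostnames and brick ports are taken from the volume status output earlier in this thread; the FQDNs are placeholders):

--------------------------------------------------------
# name resolution, short name and FQDN, on every node and client
getent hosts mseas-data2 mseas-data2.<your-domain>
getent hosts mseas-data3 mseas-data3.<your-domain>

# time sync (whichever daemon is in use)
ntpq -p        # or: chronyc tracking

# peer status on all nodes
gluster peer status

# from a client, verify glusterd (24007) and each brick port is reachable
nc -zv mseas-data2 24007
nc -zv mseas-data2 49154
nc -zv mseas-data2 49155
nc -zv mseas-data3 49153
nc -zv mseas-data3 49154
--------------------------------------------------------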


P.S.: Consider increasing the log level, as the current level is not sufficient.
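
For example (the volume currently has diagnostics.client-log-level set to ERROR; set it back once you are done debugging):

--------------------------------------------------------
gluster volume set data-volume diagnostics.client-log-level INFO
gluster volume set data-volume diagnostics.brick-log-level INFO
--------------------------------------------------------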

Best Regards,
Strahil Nikolov

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

________



Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users



