+Raghavendra/Nithya
On Tue, Jun 6, 2017 at 7:41 PM, Jarsulic, Michael [CRI] <mjarsulic@xxxxxxxxxxxxxxxx> wrote:
Hello,
I am still working on recovering from a few failed OS hard drives on my gluster storage and have been removing and re-adding bricks quite a bit. Last night I noticed that some of the directories are no longer visible when I access them through the client, but are still present on the brick. For example:
Client:
# ls /scratch/dw
Ethiopian_imputation HGDP Rolwaling Tibetan_Alignment
Brick:
# ls /data/brick1/scratch/dw
1000GP_Phase3 Ethiopian_imputation HGDP Rolwaling SGDP Siberian_imputation Tibetan_Alignment mapata
However, the directory is accessible on the client side (just not visible):
# stat /scratch/dw/SGDP
File: `/scratch/dw/SGDP'
Size: 212992 Blocks: 416 IO Block: 131072 directory
Device: 21h/33d Inode: 11986142482805280401 Links: 2
Access: (0775/drwxrwxr-x) Uid: (339748621/dw) Gid: (339748621/dw)
Access: 2017-06-02 16:00:02.398109000 -0500
Modify: 2017-06-06 06:59:13.004947703 -0500
Change: 2017-06-06 06:59:13.004947703 -0500
The only place I see the directory mentioned in the log files is in the rebalance logs. The following excerpt may provide a clue as to what is going on:
[2017-06-05 20:46:51.752726] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/HGDP00476_chr6.tped gfid not present
[2017-06-05 20:46:51.752742] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005441-DNA_B08_chr4.tmp gfid not present
[2017-06-05 20:46:51.752773] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005441-DNA_B08.geno.tmp gfid not present
[2017-06-05 20:46:51.752789] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005443-DNA_D02_chr4.out gfid not present
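If it helps, I believe the "gfid not present" messages refer to the trusted.gfid extended attribute that gluster stores on each file and directory on the brick. It can be inspected directly on the brick server (not through the FUSE mount); the path below is just the SGDP directory from my example above:
# getfattr -n trusted.gfid -e hex /data/brick1/scratch/dw/SGDP
My understanding is that a healthy entry returns a 16-byte hex value, while an entry that was never assigned a gfid returns a "No such attribute" error, but please correct me if that is not the right way to check.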
This happened yesterday during a rebalance that failed. However, doing a rebalance fix-layout allowed me to clean up these errors and successfully complete a migration to a re-added brick.
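For reference, the sequence I used was roughly the following (fix-layout first, then checking progress), in case the exact invocation matters:
# gluster volume rebalance hpcscratch fix-layout start
# gluster volume rebalance hpcscratch status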
Here is the information for my storage cluster:
# gluster volume info
Volume Name: hpcscratch
Type: Distribute
Volume ID: 80b8eeed-1e72-45b9-8402-e01ae0130105
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: fs001-ib:/data/brick2/scratch
Brick2: fs003-ib:/data/brick5/scratch
Brick3: fs003-ib:/data/brick6/scratch
Brick4: fs004-ib:/data/brick7/scratch
Brick5: fs001-ib:/data/brick1/scratch
Brick6: fs004-ib:/data/brick8/scratch
Options Reconfigured:
server.event-threads: 8
performance.client-io-threads: on
client.event-threads: 8
performance.cache-size: 32MB
performance.readdir-ahead: on
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
Mount points for the bricks:
/dev/sdb on /data/brick2 type xfs (rw,noatime,nobarrier)
/dev/sda on /data/brick1 type xfs (rw,noatime,nobarrier)
Mount point on the client:
10.xx.xx.xx:/hpcscratch on /scratch type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
My question is: what are some possible root causes of this issue, and what is the recommended way to recover from it? Let me know if you need any more information.
--
Mike Jarsulic
Sr. HPC Administrator
Center for Research Informatics | University of Chicago
773.702.2066
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
--
Pranith