+Raghavendra/Nithya
On Tue, Jun 6, 2017 at 7:41 PM, Jarsulic, Michael [CRI] <mjarsulic@xxxxxxxxxxxxxxxx> wrote:
Hello,
I am still working on recovering from a few failed OS hard drives on my gluster storage and have been removing and re-adding bricks quite a bit. Last night I noticed that some of the directories are no longer visible when I access them through the client, but are still present on the brick. For example:
Client:
# ls /scratch/dw
Ethiopian_imputation HGDP Rolwaling Tibetan_Alignment
Brick:
# ls /data/brick1/scratch/dw
1000GP_Phase3 Ethiopian_imputation HGDP Rolwaling SGDP Siberian_imputation Tibetan_Alignment mapata
However, the directory is accessible on the client side (just not visible):
# stat /scratch/dw/SGDP
File: `/scratch/dw/SGDP'
Size: 212992 Blocks: 416 IO Block: 131072 directory
Device: 21h/33d Inode: 11986142482805280401 Links: 2
Access: (0775/drwxrwxr-x) Uid: (339748621/dw) Gid: (339748621/dw)
Access: 2017-06-02 16:00:02.398109000 -0500
Modify: 2017-06-06 06:59:13.004947703 -0500
Change: 2017-06-06 06:59:13.004947703 -0500
The only place I see the directory mentioned in the log files is in the rebalance logs. The following excerpt may provide a clue as to what is going on:
[2017-06-05 20:46:51.752726] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/HGDP00476_chr6.tped gfid not present
[2017-06-05 20:46:51.752742] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005441-DNA_B08_chr4.tmp gfid not present
[2017-06-05 20:46:51.752773] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005441-DNA_B08.geno.tmp gfid not present
[2017-06-05 20:46:51.752789] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005443-DNA_D02_chr4.out gfid not present
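If it helps, I believe the "gfid not present" messages refer to the trusted.gfid extended attribute that gluster stores on each file and directory on the brick. It can be inspected directly on the brick server (not through the FUSE mount); the path below is just the SGDP directory from my example above:
# getfattr -n trusted.gfid -e hex /data/brick1/scratch/dw/SGDP
My understanding is that a healthy entry returns a 16-byte hex value, while an entry that was never assigned a gfid returns a "No such attribute" error, but please correct me if that is not the right way to check.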
This happened yesterday during a rebalance that failed. However, doing a rebalance fix-layout allowed me to clean up these errors and successfully complete a migration to a re-added brick.
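For reference, the sequence I used was roughly the following (fix-layout first, then checking progress), in case the exact invocation matters:
# gluster volume rebalance hpcscratch fix-layout start
# gluster volume rebalance hpcscratch status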
Here is the information for my storage cluster:
# gluster volume info
Volume Name: hpcscratch
Type: Distribute
Volume ID: 80b8eeed-1e72-45b9-8402-e01ae0130105
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: fs001-ib:/data/brick2/scratch
Brick2: fs003-ib:/data/brick5/scratch
Brick3: fs003-ib:/data/brick6/scratch
Brick4: fs004-ib:/data/brick7/scratch
Brick5: fs001-ib:/data/brick1/scratch
Brick6: fs004-ib:/data/brick8/scratch
Options Reconfigured:
server.event-threads: 8
performance.client-io-threads: on
client.event-threads: 8
performance.cache-size: 32MB
performance.readdir-ahead: on
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
Mount points for the bricks:
/dev/sdb on /data/brick2 type xfs (rw,noatime,nobarrier)
/dev/sda on /data/brick1 type xfs (rw,noatime,nobarrier)
Mount point on the client:
10.xx.xx.xx:/hpcscratch on /scratch type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
My question is: what are some possible root causes of this issue, and what is the recommended way to recover from it? Let me know if you need any more information.
--
Mike Jarsulic
Sr. HPC Administrator
Center for Research Informatics | University of Chicago
773.702.2066
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
--
Pranith