Hi David. Let's start with the basics and go from there. IIRC you
are using LVM with thick provisioning, so let's verify the following:
1. You have everything properly aligned for your RAID stripe size
and related settings. I have attached the script we package with RHS
that I am in the process of updating. I want to double check that you
created the PV / VG / LV with the proper variables. Have a look at the
create_pv, create_vg, and create_lv(old) functions. You will need to
know the stripe size of your RAID and the number of stripe elements
(data disks, not hot spares). Also make sure you run mkfs.xfs with:
echo "mkfs -t xfs -f -K -i size=$inode_size -d
sw=$stripe_elements,su=$stripesize -n size=$fs_block_size
/dev/$vgname/$lvname"
We use 512-byte inodes because some workloads use more xattr space
than the default inode size can hold, and you don't want xattrs
spilling out of the inodes.
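As a rough example, filling that in for the RAID6 layout from item 3
below (128K stripe element size, 10 data disks), a 512-byte inode
size, and an 8192-byte directory block size (that last value is an
assumption, use whatever $fs_block_size the script sets), with
rhs_vg/rhs_lv as placeholder VG/LV names:
mkfs -t xfs -f -K -i size=512 -d sw=10,su=128k -n size=8192 /dev/rhs_vg/rhs_lv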
2. Are you running RHEL or CentOS? If so, I would recommend
tuned_profile=rhs-high-throughput. If you don't have that tuned
profile I'll get you everything it sets.
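Assuming the profile is installed (it ships with RHS; stock
RHEL/CentOS won't have it, which is why I'd send you the settings),
applying and checking it is just:
tuned-adm profile rhs-high-throughput
tuned-adm active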
3. For small files we recommend the following:
# RAID related variables.
# stripesize - RAID controller stripe unit size
# stripe_elements - the number of data disks
# The --dataalignment option is used while creating the physical volume to
# align I/O at the LVM layer.
# dataalign - stripesize * stripe_elements
# RAID6 is recommended when the workload has predominantly larger
# files, i.e. not in kilobytes.
# For RAID6 with 12 disks and 128K stripe element size:
stripesize=128k
stripe_elements=10
dataalign=1280k
# RAID10 is recommended when the workload has predominantly smaller
# files, i.e. in kilobytes.
# For RAID10 with 12 disks and 256K stripe element size, uncomment the
# lines below.
# stripesize=256k
# stripe_elements=6
# dataalign=1536k
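For reference, here is roughly how those variables feed into the LVM
commands; the device, VG, and LV names below are placeholders and the
exact flags live in the attached script's create_pv / create_vg /
create_lv functions, so treat this as a sketch:
pvcreate --dataalignment $dataalign /dev/sdb
vgcreate rhs_vg /dev/sdb
lvcreate -l 100%FREE -n rhs_lv rhs_vg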
4. Jumbo frames everywhere! Check out the effect of jumbo frames in
the links below, make sure they are set up properly on your switch,
and add MTU=9000 to your ifcfg files (unless you have it already):
https://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
(see the jumbo frames section here, the whole thing is a good read)
https://rhsummit.files.wordpress.com/2014/04/bengland_h_1100_rhs_performance.pdf
(this is updated for 2014)
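To make that concrete: add MTU=9000 to, e.g.,
/etc/sysconfig/network-scripts/ifcfg-em1 (em1 is just a placeholder
interface name), restart the interface, then verify end to end with
something like:
ip link show em1 | grep mtu
ping -M do -s 8972 <address of another gluster node>
The 8972-byte payload is 9000 minus the IP/ICMP headers, so the ping
only succeeds if jumbo frames work across the whole path instead of
fragmenting.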
5. There is a smallfile enhancement that just landed in master
that is showing me a 60% improvement in writes. This is called
multi-threaded epoll and it is looking VERY promising WRT smallfile
performance. Here is a summary:
Hi all. I see a lot of discussion on $subject and I wanted to take
a minute to talk about it and what we can do to test / observe the
effects of it. Let's start with a bit of background:
**Background**
-Currently epoll is single threaded on both clients and servers.
*This leads to a "hot thread" which consumes 100% of a CPU core.
*This can be observed by running BenE's smallfile benchmark to
create files, running top (on both clients and servers), and
pressing H to show threads.
*You will be able to see a single glusterfs thread eating 100%
of the CPU:
2871 root 20 0 746m 24m 3004 S 100.0 0.1 14:35.89 glusterfsd
4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
-Single threaded epoll is a bottleneck for high-IOP / low-metadata
workloads (think smallfile). With single threaded epoll we are CPU
bound by the single thread pegging out a CPU core.
So the proposed solution to this problem is to make epoll multi
threaded on both servers and clients. Here is a link to the
upstream proposal:
http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf#multi-thread-epoll
Status: [ http://review.gluster.org/#/c/3842/ based on Anand
Avati's patch ]
Why: remove single-thread-per-brick barrier to higher CPU
utilization by servers
Use case: multi-client and multi-thread applications
Improvement: measured 40% with 2 epoll threads and 100% with 4
epoll threads for small file creates to an SSD
Disadvantage: conflicts with support for SSL sockets, may require
significant code change to support both.
Note: this enhancement also helps high-IOPS applications such as
databases and virtualization which are not metadata-intensive. This
has been measured already using a Fusion I/O SSD performing random
reads and writes -- it was necessary to define multiple bricks per
SSD device to get Gluster to the same order of magnitude IOPS as a
local filesystem. But this workaround is problematic for users,
because storage space is not properly measured when there are
multiple bricks on the same filesystem.
Multi-threaded epoll is part of a larger page that covers smallfile
performance enhancements, both proposed and in progress.
Goal: if successful, throughput bottleneck should be either the
network or the brick filesystem!
What it doesn't do: multi-thread-epoll does not solve the
excessive-round-trip protocol problems that Gluster has.
What it should do: allow Gluster to exploit the mostly untapped
CPU resources on the Gluster servers and clients.
How it does it: allow multiple threads to read protocol messages
and process them at the same time.
How to observe: multi-thread-epoll should be configurable (how to
configure? gluster command?); with a thread count of 1 it should
behave the same as RHS 3.0, and with a thread count of 2-4 it should
show significantly more CPU utilization (threads visible with
"top -H"), resulting in higher throughput.
**How to observe**
Here are the commands needed to set up a test environment on
RHS 3.0.3:
rpm -e glusterfs-api glusterfs glusterfs-libs glusterfs-fuse glusterfs-geo-replication glusterfs-rdma glusterfs-server glusterfs-cli gluster-nagios-common samba-glusterfs vdsm-gluster --nodeps
rhn_register
yum groupinstall "Development tools"
git clone https://github.com/gluster/glusterfs.git
cd glusterfs
git branch test
git checkout test
git fetch http://review.gluster.org/glusterfs refs/changes/42/3842/17 && git cherry-pick FETCH_HEAD
git fetch http://review.gluster.org/glusterfs refs/changes/88/9488/2 && git cherry-pick FETCH_HEAD
yum install openssl openssl-devel
wget ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-1.3.8-2.el6.x86_64.rpm
wget ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-devel-1.3.8-2.el6.x86_64.rpm
yum install cmockery2-1.3.8-2.el6.x86_64.rpm cmockery2-devel-1.3.8-2.el6.x86_64.rpm libxml2-devel
./autogen.sh
./configure
make
make install
Verify you are using the upstream build with:
# gluster --version
To enable multi-threaded epoll, run the commands below. From the
patch, the relevant volume options are:
{ .key = "client.event-threads",
  .voltype = "protocol/client",
  .op_version = GD_OP_VERSION_3_7_0,
},
{ .key = "server.event-threads",
  .voltype = "protocol/server",
  .op_version = GD_OP_VERSION_3_7_0,
},
# gluster v set <volname> server.event-threads 4
# gluster v set <volname> client.event-threads 4
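After setting those, a quick sanity check (plain gluster usage,
nothing patch-specific) is:
# gluster v info <volname>
Both event-threads options should show up under "Options Reconfigured".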
Also grab smallfile:
https://github.com/bengland2/smallfile
After git cloning smallfile, run:
python /small-files/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --top /gluster-mount --pause 1000 --host-set "client1 client2"
Again we will be looking at top + show threads (press H). With 4
threads on both clients and servers you should see something
similar to this (this isn't exact, I copied and pasted):
2871 root 20 0 746m 24m 3004 S 35.0 0.1 14:35.89 glusterfsd
2872 root 20 0 746m 24m 3004 S 51.0 0.1 14:35.89 glusterfsd
2873 root 20 0 746m 24m 3004 S 43.0 0.1 14:35.89 glusterfsd
2874 root 20 0 746m 24m 3004 S 65.0 0.1 14:35.89 glusterfsd
4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
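If you want to capture that view non-interactively, something along
these lines works (pgrep just scopes top to the brick processes):
top -b -H -n 1 -p $(pgrep -d, glusterfsd)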
If you have a test env I would be interested to see how
multi-threaded epoll performs, but I am 100% sure it's not ready for
production yet. RH will be supporting it with our 3.0.4 (the next
one) release unless we find show-stopping bugs. My testing looks
very promising though.
Smallfile performance enhancements are one of the key focuses for
our 3.1 release this summer; we are working very hard to improve
this, as it is the use case for the majority of people.
On Fri, Feb 6, 2015 at 11:59 AM, David F. Robinson
<david.robinson@xxxxxxxxxxxxx> wrote:
Ben,
I was hoping you might be able to help with two performance
questions. I was doing some testing of my rsync where I am backing
up my primary gluster system (distributed + replicated) to my
backup gluster system (distributed). I tried three tests where I
rsynced from one of my primary systems (gfsib02b) to my backup
machine. The test directory contains roughly 5500 files, most of
which are small. The script I ran is shown below which repeats the
tests 3x for each section to check variability in timing.
1) Writing to the local disk is drastically faster than writing to
gluster. So, my writes to the backup gluster system are what is
slowing me down, which makes sense.
2) When I write to the backup gluster system (/backup/homegfs),
the timing goes from 35 seconds to 1 minute 40 seconds. The question
here is whether you could recommend any settings for this volume that
would improve performance for small-file writes? I have included
the output of 'gluster volume info' below.
3) When I did the same tests on the Source_bkp volume, it is
almost 3x as slow as the homegfs_bkp volume. However, these are
just different volumes on the same storage system. The volume
parameters are identical (see below). The performance of these two
should be identical. Any idea why they wouldn't be? And any
suggestions for how to fix this? The only thing that I see
different between the two is the order of the "Options
reconfigured" section. I assume order of options doesn't matter.
Backup to local hard disk (no gluster writes)
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /temp1
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /temp2
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /temp3
real 0m35.579s
user 0m31.290s
sys 0m12.282s
real 0m38.035s
user 0m31.622s
sys 0m10.907s
real 0m38.313s
user 0m31.458s
sys 0m10.891s
Backup to gluster backup system on volume homegfs_bkp
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /backup/homegfs/temp1
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /backup/homegfs/temp2
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /backup/homegfs/temp3
real 1m42.026s
user 0m32.604s
sys 0m9.967s
real 1m45.480s
user 0m32.577s
sys 0m11.994s
real 1m40.436s
user 0m32.521s
sys 0m11.240s
Backup to gluster backup system on volume Source_bkp
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /backup/Source/temp1
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /backup/Source/temp2
time /usr/local/bin/rsync -av --numeric-ids --delete
--block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
gfsib02b:/homegfs/test /backup/Source/temp3
real 3m30.491s
user 0m32.676s
sys 0m10.776s
real 3m26.076s
user 0m32.588s
sys 0m11.048s
real 3m7.460s
user 0m32.763s
sys 0m11.687s
Volume Name: Source_bkp
Type: Distribute
Volume ID: 1d4c210d-a731-4d39-a0c5-ea0546592c1d
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/Source_bkp
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/Source_bkp
Options Reconfigured:
performance.cache-size: 128MB
performance.io-thread-count: 32
server.allow-insecure: on
network.ping-timeout: 10
storage.owner-gid: 100
performance.write-behind-window-size: 128MB
server.manage-gids: on
changelog.rollover-time: 15
changelog.fsync-interval: 3
Volume Name: homegfs_bkp
Type: Distribute
Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
Options Reconfigured:
storage.owner-gid: 100
performance.io-thread-count: 32
server.allow-insecure: on
network.ping-timeout: 10
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.manage-gids: on
changelog.rollover-time: 15
changelog.fsync-interval: 3
------ Original Message ------
From: "Benjamin Turner" <bennyturns@xxxxxxxxx>
To: "David F. Robinson" <david.robinson@xxxxxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>;
"gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Sent: 2/3/2015 7:12:34 PM
Subject: Re: missing files
It sounds to me like the files were only copied to one replica,
weren't there for the initial ls which triggered a self heal, and
were there for the last ls because they were healed. Is there any
chance that one of the replicas was down during the rsync? It could
be that you lost a brick during the copy or something like that. To
confirm, I would look for disconnects in the brick logs as well as
check glustershd.log to verify the missing files were actually
healed.
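Something like the following is what I would grep for (these are the
default log locations; adjust paths if yours differ):
grep -i disconnect /var/log/glusterfs/bricks/*.log
grep -i heal /var/log/glusterfs/glustershd.log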
-b
On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
<david.robinson@xxxxxxxxxxxxx> wrote:
I rsync'd 20-TB over to my gluster system and noticed that I had
some directories missing even though the rsync completed normally.
The rsync logs showed that the missing files were transferred.
I went to the bricks and did an 'ls -al
/data/brick*/homegfs/dir/*' and the files were on the bricks. After
I did this 'ls', the files then showed up on the FUSE mounts.
1) Why are the files hidden on the fuse mount?
2) Why does the ls make them show up on the FUSE mount?
3) How can I prevent this from happening again?
Note, I also mounted the gluster volume using NFS and saw the
same behavior. The files/directories were not shown until I did
the "ls" on the bricks.
David
===============================
David F. Robinson, Ph.D.
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310 [cell]
704.799.7974 [fax]
David.Robinson@xxxxxxxxxxxxx
http://www.corvidtechnologies.com/
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel