I re-added gluster-users to get some more eyes on this.

----- Original Message -----
> From: "Christoph Schäbel" <christoph.schaebel@xxxxxxxxxxxx>
> To: "Ben Turner" <bturner@xxxxxxxxxx>
> Sent: Wednesday, August 30, 2017 8:18:31 AM
> Subject: Re: GFID attr is missing after adding large amounts of data
>
> Hello Ben,
>
> thank you for offering your help.
>
> Here are outputs from all the gluster commands I could think of.
> Note that we had to remove the terabytes of data to keep the system
> operational, because it is a live system.
>
> # gluster volume status
>
> Status of volume: gv0
> Gluster process                          TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.191.206.15:/mnt/brick1/gv0      49154     0          Y       2675
> Brick 10.191.198.15:/mnt/brick1/gv0      49154     0          Y       2679
> Self-heal Daemon on localhost            N/A       N/A        Y       12309
> Self-heal Daemon on 10.191.206.15        N/A       N/A        Y       2670
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks

OK, so your bricks are all online and you have two nodes with one brick per node.

>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5e47d0b8-b348-45bb-9a2a-800f301df95b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.191.206.15:/mnt/brick1/gv0
> Brick2: 10.191.198.15:/mnt/brick1/gv0
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on

You are using a replicate volume with two copies of your data, and it looks like you are running with the defaults, as I don't see any tuning.

>
> # gluster peer status
>
> Number of Peers: 1
>
> Hostname: 10.191.206.15
> Uuid: 030a879d-da93-4a48-8c69-1c552d3399d2
> State: Peer in Cluster (Connected)
>
>
> # gluster --version
>
> glusterfs 3.8.11 built on Apr 11 2017 09:50:39
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU General
> Public License.

You are running Gluster 3.8, which is the latest upstream release marked stable.

>
> # df -h
>
> Filesystem               Size  Used  Avail  Use%  Mounted on
> /dev/mapper/vg00-root     75G  5.7G    69G    8%  /
> devtmpfs                 1.9G     0   1.9G    0%  /dev
> tmpfs                    1.9G     0   1.9G    0%  /dev/shm
> tmpfs                    1.9G   17M   1.9G    1%  /run
> tmpfs                    1.9G     0   1.9G    0%  /sys/fs/cgroup
> /dev/sda1                477M  151M   297M   34%  /boot
> /dev/mapper/vg10-brick1  8.0T  700M   8.0T    1%  /mnt/brick1
> localhost:/gv0           8.0T  768M   8.0T    1%  /mnt/glusterfs_client
> tmpfs                    380M     0   380M    0%  /run/user/0
>

Your brick is:

  /dev/mapper/vg10-brick1  8.0T  700M  8.0T  1%  /mnt/brick1

The block device is 8TB. Can you tell me more about your brick? Is it a single disk or a RAID? If it's a RAID, can you tell me about the disks? I am interested in:

- Size of disks
- RAID type
- Stripe size
- RAID controller

I also see:

  localhost:/gv0  8.0T  768M  8.0T  1%  /mnt/glusterfs_client

So you are mounting your volume on the local node. Is this the mount where you are writing data to?
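If it helps, here is a minimal sketch of commands to gather those brick-layout details (this assumes the LVM/XFS layout created by the setup script quoted below; hardware-RAID controller tools vary by vendor and are not shown):

  lsblk -o NAME,SIZE,TYPE,ROTA,MOUNTPOINT   # block devices backing the brick
  pvs; vgs; lvs                             # LVM layout behind /dev/mapper/vg10-brick1
  xfs_info /mnt/brick1                      # XFS geometry; sunit/swidth show stripe alignment
  cat /proc/mdstat                          # only relevant if software (md) RAID is in use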
>
>
> The setup of the servers is done via shell script on CentOS 7 containing the
> following commands:
>
> yum install -y centos-release-gluster
> yum install -y glusterfs-server
>
> mkdir /mnt/brick1
> ssm create -s 999G -n brick1 --fstype xfs -p vg10 /dev/sdb /mnt/brick1

I haven't used system-storage-manager before. Do you know if it takes care of properly tuning your storage stack (if you have a RAID, that is)? If you don't have a RAID it's probably not that big of a deal; if you do have a RAID we should make sure everything is aware of your stripe size and tune appropriately.

> echo "/dev/mapper/vg10-brick1 /mnt/brick1 xfs defaults 1 2" >> /etc/fstab
> mount -a && mount
> mkdir /mnt/brick1/gv0
>
> gluster peer probe OTHER_SERVER_IP
>
> gluster pool list
> gluster volume create gv0 replica 2 OWN_SERVER_IP:/mnt/brick1/gv0 OTHER_SERVER_IP:/mnt/brick1/gv0
> gluster volume start gv0
> gluster volume info gv0
> gluster volume set gv0 network.ping-timeout "10"
> gluster volume info gv0
>
> # mount as client for archiving cronjob, is already in fstab
> mount -a
>
> # mount via fuse-client
> mkdir -p /mnt/glusterfs_client
> echo "localhost:/gv0 /mnt/glusterfs_client glusterfs defaults,_netdev 0 0" >> /etc/fstab
> mount -a
>
>
> We untar multiple files (around 1300 tar files), each around 2.7 GB in size.
> The tar files are not compressed.
> We untar the files with a shell script containing the following:
>
> #! /bin/bash
> for f in *.tar; do tar xfP $f; done

Your script looks good. I am not that familiar with the tar flag "P", but it looks to mean:

  -P, --absolute-names
      Don't strip leading slashes from file names when creating archives.

I don't see anything strange here, everything looks OK.

> The script is run as user root; the processes glusterd, glusterfs and
> glusterfsd also run under user root.
>
> Each tar file consists of a single folder with multiple folders and files in it.
> The folder tree looks like this (note that the "=" is part of the folder name):
>
> 1498780800/
> - timeframe_hour=1498780800/ (about 25 of these folders)
> -- type=1/ (about 25 folders total)
> --- data-x.gz.parquet (between 100MB and 1kb in size)
> --- data-x.gz.parquet.crc (around 1kb in size)
> -- …
> - ...
>
> Unfortunately I cannot share the file contents with you.

That's no problem, I'll try to recreate this in the lab.

> We have not seen any other issues with glusterfs when untarring just a few of
> those files. I just tried writing a 100GB file with dd and did not see any
> issues there; the file is replicated and the GFID attribute is set correctly
> on both nodes.

ACK. I do this all the time; if you saw an issue here I would be worried about your setup.

> We are not able to reproduce this in our lab environment, which is a clone
> (actual cloned VMs) of the other system, but it only has around 1TB of storage.
> Do you think this could be an issue with the number of files generated by
> tar (over 1.5 million files)?
> What I can say is that it is not an issue with inodes; that I checked when
> all the files were unpacked on the live system.

Hmm, I am not sure. It's strange that you can't repro this on your other config. In the lab I have a ton of space to work with, so I can run a ton of data in my repro.

> If you need anything else, let me know.

Can you help clarify your reproducer so I can give it a go in the lab?
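As a side note, here is a minimal sketch for spotting brick files that are missing the trusted.gfid xattr (the brick path is the one from this thread; run as root on each node and adjust to your layout):

  # List regular files under the brick that have no trusted.gfid xattr,
  # skipping the .glusterfs metadata directory.
  find /mnt/brick1/gv0 -path '*/.glusterfs' -prune -o -type f -print0 |
  while IFS= read -r -d '' f; do
      getfattr -n trusted.gfid -e hex --absolute-names "$f" 2>/dev/null | grep -q trusted.gfid \
          || echo "missing trusted.gfid: $f"
  done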
From what I can tell you have:

1498780800/                                               <-- Just a string of numbers; this is the root dir of your tarball
- timeframe_hour=1498780800/ (about 25 of these folders)  <-- This is the second-level dir of your tarball; there are ~25 of these dirs that mention a timeframe and an hour
-- type=1/ (about 25 folders total)                       <-- This is the 3rd level of your tar; there are about 25 different type=$X dirs
--- data-x.gz.parquet (between 100MB and 1kb in size)     <-- This is your actual data. Is there just one pair of these files per dir, or multiple?
--- data-x.gz.parquet.crc (around 1kb in size)            <-- This is a checksum for the above file?

I have almost everything I need for my reproducer, can you answer the above questions about the data?

-b

>
> Thank you for your help,
> Christoph
>
> On 29.08.2017 at 06:36, Ben Turner <bturner@xxxxxxxxxx> wrote:
> >
> > Also include gluster v status, I want to check the status of your bricks
> > and SHD processes.
> >
> > -b
> >
> > ----- Original Message -----
> >> From: "Ben Turner" <bturner@xxxxxxxxxx>
> >> To: "Christoph Schäbel" <christoph.schaebel@xxxxxxxxxxxx>
> >> Cc: gluster-users@xxxxxxxxxxx
> >> Sent: Tuesday, August 29, 2017 12:35:05 AM
> >> Subject: Re: GFID attr is missing after adding large amounts of data
> >>
> >> This is strange, a couple of questions:
> >>
> >> 1. What volume type is this? What tuning have you done? gluster v info
> >> output would be helpful here.
> >>
> >> 2. How big are your bricks?
> >>
> >> 3. Can you write me a quick reproducer so I can try this in the lab? Is it
> >> just a single multi-TB file you are untarring or many? If you give me the
> >> steps to repro, and I hit it, we can get a bug open.
> >>
> >> 4. Other than this, are you seeing any other problems? What if you untar
> >> smaller file(s)? Can you read and write to the volume with, say, dd
> >> without any problems?
> >>
> >> It sounds like you have some other issues affecting things here; there is
> >> no reason why you shouldn't be able to untar and write multiple TBs of
> >> data to gluster. Go ahead and answer those questions and I'll see what I
> >> can do to help you out.
> >>
> >> -b
> >>
> >> ----- Original Message -----
> >>> From: "Christoph Schäbel" <christoph.schaebel@xxxxxxxxxxxx>
> >>> To: gluster-users@xxxxxxxxxxx
> >>> Sent: Monday, August 28, 2017 3:55:31 AM
> >>> Subject: GFID attr is missing after adding large amounts of data
> >>>
> >>> Hi Cluster Community,
> >>>
> >>> we are seeing some problems when adding multiple terabytes of data to a
> >>> 2-node replicated GlusterFS installation.
> >>>
> >>> The version is 3.8.11 on CentOS 7.
> >>> The machines are connected via 10Gbit LAN and are running 24/7. The OS is
> >>> virtualized on VMware.
> >>>
> >>> After a restart of node-1 we see that the log files are growing to
> >>> multiple gigabytes a day.
> >>>
> >>> Also there seem to be problems with the replication.
> >>> The setup worked fine until sometime after we added the additional data
> >>> (around 3 TB in size) to node-1. We added the data to a mountpoint via
> >>> the client, not directly to the brick.
> >>> What we did is add tar files via a client-mount and then untar them while
> >>> in the client-mount folder.
> >>> The brick (/mnt/brick1/gv0) is using the XFS filesystem.
> >>>
> >>> When checking the file attributes of one of the files mentioned in the
> >>> brick logs, I can see that the gfid attribute is missing on node-1.
> >>> On node-2 the file does not even exist.
> >>>
> >>> getfattr -m . -d -e hex mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-00002-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> >>>
> >>> # file: mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-00002-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> >>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> >>>
> >>> We repeated this scenario a second time with a fresh setup and got the
> >>> same results.
> >>>
> >>> Does anyone know what we are doing wrong?
> >>>
> >>> Is there maybe a problem with glusterfs and tar?
> >>>
> >>>
> >>> Log excerpts:
> >>>
> >>>
> >>> glustershd.log
> >>>
> >>> [2017-07-26 15:31:36.290908] I [MSGID: 108026] [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on fe5c42ac-5fda-47d4-8221-484c8d826c06
> >>> [2017-07-26 15:31:36.294289] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
> >>> [2017-07-26 15:31:36.298287] I [MSGID: 108026] [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on e31ae2ca-a3d2-4a27-a6ce-9aae24608141
> >>> [2017-07-26 15:31:36.300695] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
> >>> [2017-07-26 15:31:36.303626] I [MSGID: 108026] [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 2cc9dafe-64d3-454a-a647-20deddfaebfe
> >>> [2017-07-26 15:31:36.305763] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
> >>> [2017-07-26 15:31:36.308639] I [MSGID: 108026] [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on cbabf9ed-41be-4d08-9cdb-5734557ddbea
> >>> [2017-07-26 15:31:36.310819] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
> >>> [2017-07-26 15:31:36.315057] I [MSGID: 108026] [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 8a3c1c16-8edf-40f0-b2ea-8e70c39e1a69
> >>> [2017-07-26 15:31:36.317196] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
> >>>
> >>>
> >>> bricks/mnt-brick1-gv0.log
> >>>
> >>> [2017-07-26 15:31:36.287831] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153546: LOOKUP <gfid:d99930df-6b47-4b55-9af3-c767afd6584c>/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet (d99930df-6b47-4b55-9af3-c767afd6584c/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet) ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.294202] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-gv0-posix: buf->ia_gfid is null for /mnt/brick1/gv0/.glusterfs/e7/2d/e72d9005-b958-432b-b4a9-37aaadd9d2df/type=type1/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet [No data available]
> >>> [2017-07-26 15:31:36.294235] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153564: LOOKUP <gfid:fe5c42ac-5fda-47d4-8221-484c8d826c06>/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet (fe5c42ac-5fda-47d4-8221-484c8d826c06/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet) ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.300611] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-gv0-posix: buf->ia_gfid is null for /mnt/brick1/gv0/.glusterfs/33/d4/33d47146-bc30-49dd-ada8-475bb75435bf/type=type2/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet [No data available]
> >>> [2017-07-26 15:31:36.300645] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153582: LOOKUP <gfid:e31ae2ca-a3d2-4a27-a6ce-9aae24608141>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet (e31ae2ca-a3d2-4a27-a6ce-9aae24608141/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet) ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.305671] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-gv0-posix: buf->ia_gfid is null for /mnt/brick1/gv0/.glusterfs/33/d4/33d47146-bc30-49dd-ada8-475bb75435bf/type=type1/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet [No data available]
> >>> [2017-07-26 15:31:36.305711] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153600: LOOKUP <gfid:2cc9dafe-64d3-454a-a647-20deddfaebfe>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet (2cc9dafe-64d3-454a-a647-20deddfaebfe/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet) ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.310735] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-gv0-posix: buf->ia_gfid is null for /mnt/brick1/gv0/.glusterfs/df/71/df715321-3078-47c8-bf23-dec47abe46d7/type=type2/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet [No data available]
> >>> [2017-07-26 15:31:36.310767] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153618: LOOKUP <gfid:cbabf9ed-41be-4d08-9cdb-5734557ddbea>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet (cbabf9ed-41be-4d08-9cdb-5734557ddbea/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet) ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.317113] E [MSGID: 113002] [posix.c:266:posix_lookup] 0-gv0-posix: buf->ia_gfid is null for /mnt/brick1/gv0/.glusterfs/df/71/df715321-3078-47c8-bf23-dec47abe46d7/type=type3/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet [No data available]
> >>> [2017-07-26 15:31:36.317146] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153636: LOOKUP <gfid:8a3c1c16-8edf-40f0-b2ea-8e70c39e1a69>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet (8a3c1c16-8edf-40f0-b2ea-8e70c39e1a69/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet) ==> (No data available) [No data available]
> >>>
> >>>
> >>> Regards,
> >>> Christoph
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users@xxxxxxxxxxx
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>
> >

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users