Re: df reports wrong full capacity for distributed volumes (Glusterfs 3.12.6-1)

Hi Nithya,
Below is the output for both volumes:

[root@stor1t ~]# gluster volume rebalance volumedisk1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost           703964     16384.0PB       1475983             0             0            completed       64:37:55
                           stor2data           704610     16384.0PB       1475199             0             0            completed       64:31:30
                           stor3data           703964     16384.0PB       1475983             0             0            completed       64:37:55
volume rebalance: volumedisk1: success

[root@stor1 ~]# gluster volume rebalance volumedisk0 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost           411919         1.1GB        718044             0             0            completed        2:28:52
                           stor2data           435340     16384.0PB        741287             0             0            completed        2:26:01
                           stor3data           411919         1.1GB        718044             0             0            completed        2:28:52
volume rebalance: volumedisk0: success

And the volumedisk1 rebalance log finished saying:
[2018-02-13 03:47:48.703311] I [MSGID: 109028] [dht-rebalance.c:5053:gf_defrag_status_get] 0-volumedisk1-dht: Rebalance is completed. Time taken is 232675.00 secs
[2018-02-13 03:47:48.703351] I [MSGID: 109028] [dht-rebalance.c:5057:gf_defrag_status_get] 0-volumedisk1-dht: Files migrated: 703964, size: 14046969178073, lookups: 1475983, failures: 0, skipped: 0

Checking my logs, the new stor3data node was added and the rebalance task was executed on 2018-02-10. Since that date I have been storing new files.
The sequence of commands to add the node was:

gluster peer probe stor3data
gluster volume add-brick volumedisk0 stor3data:/mnt/disk_b1/glusterfs/vol0
gluster volume add-brick volumedisk0 stor3data:/mnt/disk_b2/glusterfs/vol0
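
After adding the bricks I launched the rebalance; if I remember correctly the command was along these lines (shown here just for reference, the exact invocation may have differed):

gluster volume rebalance volumedisk0 start force
gluster volume rebalance volumedisk1 start force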




2018-03-01 6:32 GMT+01:00 Nithya Balachandran <nbalacha@xxxxxxxxxx>:
Hi Jose,

On 28 February 2018 at 22:31, Jose V. Carrión <jocarbur@xxxxxxxxx> wrote:
Hi Nithya,

My initial setup was composed of 2 similar nodes: stor1data and stor2data. A month ago I expanded both volumes with a new node: stor3data (2 bricks per volume).
After adding the new peer and its bricks, I ran a 'rebalance force' operation. This task finished successfully (you can see the info below) and the number of files on the 3 nodes was very similar.

For volumedisk1 I only have files of 500MB, and they are continuously written in sequential mode. The filename pattern of the written files is:

run.node1.0000.rd 
run.node2.0000.rd  
run.node1.0001.rd 
run.node2.0001.rd  
run.node1.0002.rd 
run.node2.0002.rd  
...........
...........
run.node1.X.rd 
run.node2.X.rd  

(with X ranging from 0000 upwards, with no upper bound)

Curiously, stor1data and stor2data show very similar usage in bytes:

Filesystem              1K-blocks        Used               Available     Use% Mounted on
/dev/sdc1             52737613824 17079174264  35658439560  33% /mnt/glusterfs/vol1   -> stor1data
/dev/sdc1             52737613824 17118810848  35618802976  33% /mnt/glusterfs/vol1  ->  stor2data

However the usage on stor3data differs by too much (about 1TB):
Filesystem           1K-blocks        Used                Available       Use% Mounted on
/dev/sdc1             52737613824 15479191748  37258422076  30% /mnt/disk_c/glusterfs/vol1 -> stor3data
/dev/sdd1             52737613824 15566398604  37171215220  30% /mnt/disk_d/glusterfs/vol1 -> stor3data

Looking at inodes:

Filesystem                Inodes       IUsed       IFree          IUse% Mounted on
/dev/sdc1             5273970048  851053  5273118995    1% /mnt/glusterfs/vol1 ->  stor1data
/dev/sdc1             5273970048  849388  5273120660    1% /mnt/glusterfs/vol1 ->  stor2data

/dev/sdc1             5273970048  846877  5273123171    1% /mnt/disk_c/glusterfs/vol1 -> stor3data
/dev/sdd1             5273970048  845250  5273124798    1% /mnt/disk_d/glusterfs/vol1 -> stor3data

851053 (stor1) - 845250 (stor3) = a difference of 5803 files!

The inode numbers are a little misleading here - gluster uses some to create its own internal files and directory structures. Based on the average file size, I think this would actually work out to a difference of around 2000 files.
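(Roughly: a ~1TB difference in used space divided by the ~500MB average file size works out to about 2000 files.)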
 

In addition, correct me if I'm wrong, stor3data should have a 50% probability of storing any new file (even taking into account the DHT algorithm and the filename patterns).

Theoretically yes, but again, it depends on the filenames and their hash distribution.
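
If you want to see how the hash ranges are divided among the bricks, you can look at the DHT layout xattr of a directory on each brick's backend (the paths below are just examples based on your bricks):

getfattr -n trusted.glusterfs.dht -e hex /mnt/glusterfs/vol1/brick1/<some-directory>
getfattr -n trusted.glusterfs.dht -e hex /mnt/disk_c/glusterfs/vol1/brick1/<some-directory>

Each brick is assigned a sub-range of the 32-bit hash space for a given directory, and a new file is placed on the brick whose range contains the hash of its filename.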

Please send us the output of :
gluster volume rebalance <volname> status

for the volume.

Regards,
Nithya
 
Thanks,
Greetings.

Jose V.

Status of volume: volumedisk0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick stor1data:/mnt/glusterfs/vol0/bri
ck1                                         49152     0          Y       13533
Brick stor2data:/mnt/glusterfs/vol0/bri
ck1                                         49152     0          Y       13302
Brick stor3data:/mnt/disk_b1/glusterfs/
vol0/brick1                                 49152     0          Y       17371
Brick stor3data:/mnt/disk_b2/glusterfs/
vol0/brick1                                 49153     0          Y       17391
NFS Server on localhost                     N/A       N/A        N       N/A  
NFS Server on stor3data                 N/A       N/A        N       N/A  
NFS Server on stor2data                 N/A       N/A        N       N/A  
 
Task Status of Volume volumedisk0
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 7f5328cb-ed25-4627-9196-fb3e29e0e4ca
Status               : completed           
 
Status of volume: volumedisk1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick stor1data:/mnt/glusterfs/vol1/bri
ck1                                         49153     0          Y       13579
Brick stor2data:/mnt/glusterfs/vol1/bri
ck1                                         49153     0          Y       13344
Brick stor3data:/mnt/disk_c/glusterfs/v
ol1/brick1                                  49154     0          Y       17439
Brick stor3data:/mnt/disk_d/glusterfs/v
ol1/brick1                                  49155     0          Y       17459
NFS Server on localhost                     N/A       N/A        N       N/A  
NFS Server on stor3data                 N/A       N/A        N       N/A  
NFS Server on stor2data                 N/A       N/A        N       N/A  
 
Task Status of Volume volumedisk1
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : d0048704-beeb-4a6a-ae94-7e7916423fd3
Status               : completed 


2018-02-28 15:40 GMT+01:00 Nithya Balachandran <nbalacha@xxxxxxxxxx>:
Hi Jose,

On 28 February 2018 at 18:28, Jose V. Carrión <jocarbur@xxxxxxxxx> wrote:
Hi Nithya,

I applied the workaround for this bug and now df shows the right size:

That is good to hear.

 
[root@stor1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1              26T  1,1T   25T   4% /mnt/glusterfs/vol0
/dev/sdc1              50T   16T   34T  33% /mnt/glusterfs/vol1
stor1data:/volumedisk0
                      101T  3,3T   97T   4% /volumedisk0
stor1data:/volumedisk1
                      197T   61T  136T  31% /volumedisk1


[root@stor2 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1              26T  1,1T   25T   4% /mnt/glusterfs/vol0
/dev/sdc1              50T   16T   34T  33% /mnt/glusterfs/vol1
stor2data:/volumedisk0
                      101T  3,3T   97T   4% /volumedisk0
stor2data:/volumedisk1
                      197T   61T  136T  31% /volumedisk1


[root@stor3 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1              25T  638G   24T   3% /mnt/disk_b1/glusterfs/vol0
/dev/sdb2              25T  654G   24T   3% /mnt/disk_b2/glusterfs/vol0
/dev/sdc1              50T   15T   35T  30% /mnt/disk_c/glusterfs/vol1
/dev/sdd1              50T   15T   35T  30% /mnt/disk_d/glusterfs/vol1
stor3data:/volumedisk0
                      101T  3,3T   97T   4% /volumedisk0
stor3data:/volumedisk1
                      197T   61T  136T  31% /volumedisk1


However I'm concerned because, as you can see, volumedisk0 on stor3data is composed of 2 bricks on the same disk but on different partitions (/dev/sdb1 and /dev/sdb2).
After applying the workaround, the shared-brick-count parameter was set to 1 for all the bricks on all the servers (see below). Could this be an issue?

No, this is correct. The shared-brick-count will be > 1 only if multiple bricks share the same partition.
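
If you want to double-check, something like the following should show two different source devices for those two bricks (paths taken from your volume info):

df -h /mnt/disk_b1/glusterfs/vol0/brick1 /mnt/disk_b2/glusterfs/vol0/brick1

As long as they resolve to different partitions (/dev/sdb1 and /dev/sdb2 in your case), a shared-brick-count of 1 for each brick is the expected value.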

 
Also, I can see that stor3data is now unbalanced with respect to stor1data and stor2data. The three nodes have bricks of the same size, but the stor3data bricks have about 1TB less used space than those on stor1data and stor2data:


This does not necessarily indicate a problem. The distribution need not be exactly equal and depends on the filenames. Can you provide more information on the kind of dataset (how many files, sizes etc) on this volume? Did you create the volume with all 4 bricks or add some later?

Regards,
Nithya

stor1data:
/dev/sdb1              26T  1,1T   25T   4% /mnt/glusterfs/vol0
/dev/sdc1              50T   16T   34T  33% /mnt/glusterfs/vol1

stor2data bricks:
/dev/sdb1              26T  1,1T   25T   4% /mnt/glusterfs/vol0
/dev/sdc1              50T   16T   34T  33% /mnt/glusterfs/vol1

stor3data bricks:
/dev/sdb1              25T  638G   24T   3% /mnt/disk_b1/glusterfs/vol0
/dev/sdb2              25T  654G   24T   3% /mnt/disk_b2/glusterfs/vol0
/dev/sdc1              50T   15T   35T  30% /mnt/disk_c/glusterfs/vol1
/dev/sdd1              50T   15T   35T  30% /mnt/disk_d/glusterfs/vol1


[root@stor1 ~]# grep -n "share" /var/lib/glusterd/vols/volumedisk1/*
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0

[root@stor2 ~]# grep -n "share" /var/lib/glusterd/vols/volumedisk1/*
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0

[root@stor3t ~]# grep -n "share" /var/lib/glusterd/vols/volumedisk1/*
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0

Thanks for your help,
Greetings.

Jose V.


2018-02-28 5:07 GMT+01:00 Nithya Balachandran <nbalacha@xxxxxxxxxx>:
Hi Jose,

There is a known issue with gluster 3.12.x builds (see [1]) so you may be running into this.

The "shared-brick-count" values seem fine on stor1. Please send us "grep -n "share" /var/lib/glusterd/vols/volumedisk1/*" results for the other nodes so we can check if they are the cause.


Regards,
Nithya




On 28 February 2018 at 03:03, Jose V. Carrión <jocarbur@xxxxxxxxx> wrote:

Hi,

Until a few days ago my whole glusterfs configuration was working fine. Today I realized that the total size reported by the df command has changed and is smaller than the aggregated capacity of all the bricks in the volume.

I checked that the status of all volumes is fine, all the glusterd daemons are running and there are no errors in the logs; however, df shows a wrong total size.
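
For reference, the checks I did were along these lines:

gluster peer status
gluster volume status
gluster volume info

plus a look at the logs under /var/log/glusterfs/.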

My configuration for one volume: volumedisk1

[root@stor1 ~]# gluster volume status volumedisk1  detail

Status of volume: volumedisk1
------------------------------------------------------------------------------
Brick                : Brick stor1data:/mnt/glusterfs/vol1/brick1
TCP Port             : 49153              
RDMA Port            : 0                  
Online               : Y                  
Pid                  : 13579              
File System          : xfs                
Device               : /dev/sdc1          
Mount Options        : rw,noatime         
Inode Size           : 512                
Disk Space Free      : 35.0TB             
Total Disk Space     : 49.1TB             
Inode Count          : 5273970048         
Free Inodes          : 5273123069         
------------------------------------------------------------------------------
Brick                : Brick stor2data:/mnt/glusterfs/vol1/brick1
TCP Port             : 49153              
RDMA Port            : 0                  
Online               : Y                  
Pid                  : 13344              
File System          : xfs                
Device               : /dev/sdc1          
Mount Options        : rw,noatime         
Inode Size           : 512                
Disk Space Free      : 35.0TB             
Total Disk Space     : 49.1TB             
Inode Count          : 5273970048         
Free Inodes          : 5273124718         
------------------------------------------------------------------------------
Brick                : Brick stor3data:/mnt/disk_c/glusterfs/vol1/brick1
TCP Port             : 49154              
RDMA Port            : 0                  
Online               : Y                  
Pid                  : 17439              
File System          : xfs                
Device               : /dev/sdc1          
Mount Options        : rw,noatime         
Inode Size           : 512                
Disk Space Free      : 35.7TB             
Total Disk Space     : 49.1TB             
Inode Count          : 5273970048         
Free Inodes          : 5273125437         
------------------------------------------------------------------------------
Brick                : Brick stor3data:/mnt/disk_d/glusterfs/vol1/brick1
TCP Port             : 49155              
RDMA Port            : 0                  
Online               : Y                  
Pid                  : 17459              
File System          : xfs                
Device               : /dev/sdd1          
Mount Options        : rw,noatime         
Inode Size           : 512                
Disk Space Free      : 35.6TB             
Total Disk Space     : 49.1TB             
Inode Count          : 5273970048         
Free Inodes          : 5273127036         
 

So the full size of volumedisk1 should be 49.1TB + 49.1TB + 49.1TB + 49.1TB = 196.4TB, but df shows:

[root@stor1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              48G   21G   25G  46% /
tmpfs                  32G   80K   32G   1% /dev/shm
/dev/sda1             190M   62M  119M  35% /boot
/dev/sda4             395G  251G  124G  68% /data
/dev/sdb1              26T  601G   25T   3% /mnt/glusterfs/vol0
/dev/sdc1              50T   15T   36T  29% /mnt/glusterfs/vol1
stor1data:/volumedisk0
                       76T  1,6T   74T   3% /volumedisk0
stor1data:/volumedisk1
                      148T   42T  106T  29% /volumedisk1

Exactly one brick less: 196.4TB - 49.1TB ≈ 148TB, which is what df reports.

It's a production system so I hope you can help me.

Thanks in advance.

Jose V.


Below some other data of my configuration:

[root@stor1 ~]# gluster volume info
 
Volume Name: volumedisk0
Type: Distribute
Volume ID: 0ee52d94-1131-4061-bcef-bd8cf898da10
Status: Started
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: stor1data:/mnt/glusterfs/vol0/brick1
Brick2: stor2data:/mnt/glusterfs/vol0/brick1
Brick3: stor3data:/mnt/disk_b1/glusterfs/vol0/brick1
Brick4: stor3data:/mnt/disk_b2/glusterfs/vol0/brick1
Options Reconfigured:
performance.cache-size: 4GB
cluster.min-free-disk: 1%
performance.io-thread-count: 16
performance.readdir-ahead: on
 
Volume Name: volumedisk1
Type: Distribute
Volume ID: 591b7098-800e-4954-82a9-6b6d81c9e0a2
Status: Started
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: stor1data:/mnt/glusterfs/vol1/brick1
Brick2: stor2data:/mnt/glusterfs/vol1/brick1
Brick3: stor3data:/mnt/disk_c/glusterfs/vol1/brick1
Brick4: stor3data:/mnt/disk_d/glusterfs/vol1/brick1
Options Reconfigured:
cluster.min-free-inodes: 6%
performance.cache-size: 4GB
cluster.min-free-disk: 1%
performance.io-thread-count: 16
performance.readdir-ahead: on

[root@stor1 ~]# grep -n "share" /var/lib/glusterd/vols/volumedisk1/*
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor1data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 1
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor2data.mnt-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_c-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol:3:    option shared-brick-count 0
/var/lib/glusterd/vols/volumedisk1/volumedisk1.stor3data.mnt-disk_d-glusterfs-vol1-brick1.vol.rpmsave:3:    option shared-brick-count 0








_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
