Hi,
On 16 October 2018 at 18:20, <jring@xxxxxxx> wrote:
Hi everybody,
I created a distributed dispersed volume on 4.1.5 under CentOS 7 a few days ago, like this:
gluster volume create data_vol1 disperse-data 4 redundancy 2 transport tcp \
\
gf-p-d-01.isec.foobar.com:/bricks/brick1/brick \
gf-p-d-03.isec.foobar.com:/bricks/brick1/brick \
gf-p-d-04.isec.foobar.com:/bricks/brick1/brick \
gf-p-k-01.isec.foobar.com:/bricks/brick1/brick \
gf-p-k-03.isec.foobar.com:/bricks/brick1/brick \
gf-p-k-04.isec.foobar.com:/bricks/brick1/brick \
\
gf-p-d-01.isec.foobar.com:/bricks/brick2/brick \
gf-p-d-03.isec.foobar.com:/bricks/brick2/brick \
gf-p-d-04.isec.foobar.com:/bricks/brick2/brick \
gf-p-k-01.isec.foobar.com:/bricks/brick2/brick \
gf-p-k-03.isec.foobar.com:/bricks/brick2/brick \
gf-p-k-04.isec.foobar.com:/bricks/brick2/brick \
\
... same for brick3 to brick9...
\
gf-p-d-01.isec.foobar.com:/bricks/brick10/brick \
gf-p-d-03.isec.foobar.com:/bricks/brick10/brick \
gf-p-d-04.isec.foobar.com:/bricks/brick10/brick \
gf-p-k-01.isec.foobar.com:/bricks/brick10/brick \
gf-p-k-03.isec.foobar.com:/bricks/brick10/brick \
gf-p-k-04.isec.foobar.com:/bricks/brick10/brick
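As a sanity check (and assuming the bricks were passed in the intended order), the resulting layout can be verified with:

gluster volume info data_vol1

which should report the volume as Distributed-Disperse with 10 x (4 + 2) = 60 bricks, i.e. ten disperse sets of four data plus two redundancy bricks each.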
This worked nicely and resulted in the following filesystem:
[root@gf-p-d-01 ~]# df -h /data/
Filesystem Size Used Avail Use% Mounted on
gf-p-d-01.isec.foobar.com:/data_vol1 219T 2.2T 217T 2% /data
Each of the bricks resides on its own 6 TB disk with one big partition formatted with XFS.
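(That size also matches a quick back-of-the-envelope check: 60 bricks in groups of 4+2 give 10 disperse subvolumes, each contributing 4 x 6 TB of usable capacity, so roughly 10 * 4 * 6 TB = 240 TB, or about 218 TiB -- in line with the 219T shown above.)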
Yesterday a colleague looked at the filesystem and found some space missing...
[root@gf-p-d-01 ~]# df -h /data/
Filesystem Size Used Avail Use% Mounted on
gf-p-d-01.isec.foobar.com:/data_vol1 22T 272G 22T 2% /data
Some googling turned up the following bug report against 3.4, which looks familiar:
https://bugzilla.redhat.com/show_bug.cgi?id=1541830
So we ran a quick grep shared-brick-count /var/lib/glusterd/vols/data_vol1/* on all boxes and found that on 5 out of 6 boxes shared-brick-count was 0 for all bricks on remote boxes and 1 for the local bricks.
Is this the expected result, or should we have 1 everywhere (as the quick fix script from the case sets it)?
No, this is fine. The shared-brick-count only needs to be 1 for the local bricks. The value for the remote bricks can be 0.
Also, on one box (the one we created the volume from, btw) we have shared-brick-count=0 for all remote bricks and 10 for the local bricks.
This is a problem. The shared-brick-count should be 1 for the local bricks here as well.
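If I'm reading that bug correctly, the posix translator divides a brick's reported size by its shared-brick-count (the option is meant to account for several bricks sharing one filesystem), so a value of 10 makes every brick on that node report only a tenth of its real capacity. And since each disperse set contains a brick from that node, the whole volume shrinks to roughly a tenth of its size: 219T / 10 is about 22T, which is what your second df shows.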
Is it possible that the bug from 3.4 still exists in 4.1.5, and should we try the filter script that sets shared-brick-count=1 for all bricks?
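(If I understand the workaround from that case correctly, it boils down to forcing shared-brick-count back to 1 in the generated volfiles, roughly like this -- a sketch only, using the same path as the grep above; we would of course back up the volfiles first and restart glusterd afterwards:

# rewrite any "option shared-brick-count N" line to use 1 instead
grep -rl 'option shared-brick-count' /var/lib/glusterd/vols/data_vol1/ \
    | xargs sed -i 's/option shared-brick-count [0-9]*/option shared-brick-count 1/'
)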
Can you try
1. restarting glusterd on all the nodes one after another (not at the same time)
2. setting a volume option (say gluster volume set <volname> cluster.min-free-disk 11%)
and see if it fixes the issue?
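Concretely, something like this (assuming glusterd is managed by systemd on your CentOS 7 nodes):

# on each node, one at a time
systemctl restart glusterd

# then, from any one node, set an option to trigger a volfile regeneration
gluster volume set data_vol1 cluster.min-free-disk 11%

# and re-check
grep shared-brick-count /var/lib/glusterd/vols/data_vol1/*
df -h /data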
Regards,
Nithya
The volume is not currently in production so now would be the time to play around and find the problem...
TIA and regards,
Joachim
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users