Re: [Gluster-users] Fwd: dht_is_subvol_filled messages on client

Xavier Hernandez <xhernandez@xxxxxxxxxx> · Thu, 5 May 2016 14:16:25 +0200

On 05/05/16 13:59, Kaushal M wrote:
On Thu, May 5, 2016 at 4:37 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
On 05/05/16 11:31, Kaushal M wrote:

On Thu, May 5, 2016 at 2:36 PM, David Gossage
<dgossage@xxxxxxxxxxxxxxxxxx> wrote:

On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban <cobanserkan@xxxxxxxxx>
wrote:

Hi,

You can find the output at the link below:
https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0

Thanks,
Serkan

Maybe not an issue, but in a "one of these things is not like the others"
way, I notice at a quick glance that only one of all the bricks seems to
be different:

Brick                : Brick 1.1.1.235:/bricks/20
TCP Port             : 49170
RDMA Port            : 0
Online               : Y
Pid                  : 26736
File System          : ext4
Device               : /dev/mapper/vol0-vol_root
Mount Options        : rw,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 86.1GB
Total Disk Space     : 96.0GB
Inode Count          : 6406144
Free Inodes          : 6381374

Every brick except this one seems to be 7TB and xfs.

Looks like the brick fs isn't mounted, and the root-fs is being used
instead. But that still leaves enough inodes free.
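
One way to confirm this kind of suspicion on the brick host is to compare
the device backing the brick directory with the device backing "/". The
sketch below is only an illustration, not a gluster tool: the helper names
shares_root_filesystem and inode_summary are made up for this example, and
/bricks/20 is simply the path from the status output above.

```python
import os


def shares_root_filesystem(path):
    """Return True if `path` lives on the same device as "/".

    For a brick that should sit on its own disk, True suggests the brick
    filesystem is not mounted and the root filesystem is being used.
    """
    return os.stat(path).st_dev == os.stat("/").st_dev


def inode_summary(path):
    """Collect the inode counters that statvfs reports for `path`."""
    st = os.statvfs(path)
    return {
        "is_mount": os.path.ismount(path),  # True if path is a mount point
        "f_files": st.f_files,              # total inodes
        "f_ffree": st.f_ffree,              # free inodes
    }


if __name__ == "__main__":
    brick = "/bricks/20"  # path taken from the thread; adjust as needed
    if os.path.exists(brick):
        print(brick, "on root fs:", shares_root_filesystem(brick))
        print(inode_summary(brick))
```

Run on the host of the odd brick, a result of "on root fs: True" would match
the ext4 /dev/mapper/vol0-vol_root device seen in the status output.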

What I suspect is that one of the cluster translators is mixing up
stats when aggregating from multiple bricks.
From the log snippet you gave in the first mail, it seems like the
disperse translator is possibly involved.

Currently ec takes the number of potential files in the subvolume (f_files)
as the maximum of all its subvolumes, but it takes the available count
(f_ffree) as the minimum of all its subvolumes.

This causes max to be ~781,000,000, but free will be ~6,300,000. This gives
~0.8% available, i.e. almost 100% full.

Given the circumstances I think it's the correct thing to do.
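
The arithmetic above can be reproduced with a small sketch. This is not the
ec source code, only a model of the rule as described in this mail (maximum
of f_files, minimum of f_ffree). The names InodeStats, aggregate_ec and
percent_free are invented for the example, and the free-inode count used for
the xfs bricks is an assumed value, since the thread does not give it.

```python
from dataclasses import dataclass


@dataclass
class InodeStats:
    f_files: int  # total inodes the filesystem can hold
    f_ffree: int  # inodes still free


def aggregate_ec(bricks):
    """Combine per-brick stats the way the mail describes:
    maximum of f_files, minimum of f_ffree."""
    return InodeStats(
        f_files=max(b.f_files for b in bricks),
        f_ffree=min(b.f_ffree for b in bricks),
    )


def percent_free(stats):
    """Free inodes as a percentage of total inodes."""
    if stats.f_files == 0:
        return 0.0
    return 100.0 * stats.f_ffree / stats.f_files


# 19 large xfs bricks (free count is an assumption) plus the one brick
# that reports the root filesystem, using the numbers from the status output.
xfs_brick = InodeStats(f_files=781_000_000, f_ffree=780_000_000)
root_brick = InodeStats(f_files=6_406_144, f_ffree=6_381_374)

agg = aggregate_ec([xfs_brick] * 19 + [root_brick])
print(agg, f"{percent_free(agg):.2f}% free")
```

With these inputs the aggregate pairs the large f_files of the xfs bricks
with the small f_ffree of the root filesystem, which is how a nearly empty
disperse set ends up looking almost full.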

Thanks for giving the reasoning Xavi.

But why is the number of potential files the maximum?
IIUC, a file (or parts of it) will be written to all subvolumes in the
disperse set.
So wouldn't the smallest subvolume limit the number of files that
could be possibly created?

I'm not very sure why this decision was taken. In theory ec only 
supports identical subvolumes because of the way it works. This means 
that all bricks should report the same maximum.

When this doesn't happen, I suppose that the motivation was that this
number should report the theoretical maximum number of files that the
volume can contain.

~kaushal

Xavi

BTW, how large is the volume you have? Those are a lot of bricks!

~kaushal

On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez <xhernandez@xxxxxxxxxx>
wrote:

Can you post the result of 'gluster volume status v0 detail' ?

On 05/05/16 06:49, Serkan Çoban wrote:

Hi, can anyone suggest something for this issue? df and du show no issue
for the bricks, yet one subvolume is not being used by gluster.

On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban <cobanserkan@xxxxxxxxx>
wrote:

Hi,

I changed cluster.min-free-inodes to "0" and remounted the volume on the
clients. The inode-full messages no longer appear in syslog, but I see
that the disperse-56 subvolume is still not being used.
Is there anything I can do to resolve this issue? Maybe I can destroy and
recreate the volume, but I am not sure it will fix this issue...
Maybe the disperse size 16+4 is too big; should I change it to 8+2?

On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban <cobanserkan@xxxxxxxxx>
wrote:

I also checked the df output; all 20 bricks look the same, like below:
/dev/sdu1 7.3T 34M 7.3T 1% /bricks/20

On Tue, May 3, 2016 at 1:40 PM, Raghavendra G
<raghavendra@xxxxxxxxxxx>
wrote:

On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban
<cobanserkan@xxxxxxxxx>
wrote:

1. What is the output of du -hs <back-end-export>? Please get this
information for each of the bricks that are part of disperse.

Sorry, I needed the df output of the filesystem containing the brick, not
du. Sorry about that.

There are 20 bricks in disperse-56 and the du -hs output is like:
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
1.8M /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20

I see that gluster is not writing to this disperse set. All other
disperse sets are filled with 13GB, but this one is empty. I see the
directory structure created, but no files in the directories.
How can I fix the issue? I will try to rebalance, but I don't think it
will write to this disperse set...

On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G
<raghavendra@xxxxxxxxxxx>
wrote:

On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban
<cobanserkan@xxxxxxxxx>
wrote:

Hi, I cannot get an answer from the user list, so I am asking the devel
list.

I am getting this message in the client logs:

[dht-diskusage.c:277:dht_is_subvol_filled] 0-v0-dht: inodes on subvolume
'v0-disperse-56' are at (100.00 %), consider adding more bricks.

My cluster is empty; there are only a couple of GB of files for testing.
Why does this message appear in syslog?

dht uses disk usage information from backend export.

1. What is the output of du -hs <back-end-export>? Please get this
information for each of the bricks that are part of disperse.
2. Once you get du information from each brick, the value seen by dht
will be based on how cluster/disperse aggregates du info (basically
the statfs fop).

The reason for 100% disk usage may be:
In case of 1, the backend fs might be shared by data other than the brick.
In case of 2, some issue with aggregation.
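
As a rough illustration of the kind of check behind the log message quoted
earlier, the sketch below marks a subvolume as filled when its free-inode
percentage drops below a threshold. It is a simplified model, not the code
in dht-diskusage.c: the function name and signature are hypothetical, the
5% default is an assumption about cluster.min-free-inodes, and treating an
f_files of 0 as "not filled" is a choice made only for this example.

```python
def is_subvol_filled(f_files, f_ffree, min_free_inodes_pct=5.0):
    """Return True when the free-inode percentage is below the threshold.

    f_files / f_ffree are the aggregated statfs inode counters for the
    subvolume; min_free_inodes_pct models cluster.min-free-inodes.
    """
    if f_files == 0:
        # No inode information available; treated as not filled here.
        return False
    free_pct = 100.0 * f_ffree / f_files
    return free_pct < min_free_inodes_pct


# Numbers of the same order as those reported for v0-disperse-56.
print(is_subvol_filled(781_000_000, 6_300_000))        # below a 5% threshold
print(is_subvol_filled(781_000_000, 6_300_000, 0.0))   # threshold set to 0
```

In this model a threshold of 0 can never be undercut, which is consistent
with the inode messages disappearing after cluster.min-free-inodes was set
to "0".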

Is it safe to ignore it?

dht will try not to have data files on the subvol in question
(v0-disperse-56). Hence the lookup cost will be two hops for files hashing
to disperse-56 (note that other fops like read/write/open still have the
cost of a single hop and don't suffer from this penalty). Other than that
there is no significant harm unless disperse-56 is really running out of
space.

regards,
Raghavendra

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

--
Raghavendra G


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
