remove_me files building up

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

We're running a cluster with two data nodes and one arbiter, and have sharding enabled.

We had an issue a while back where one of the server's crashed, we got the server back up and running and ensured that all healing entries cleared, and also increased the server spec (CPU/Mem) as this seemed to be the potential cause.

Since then however, we've seen some strange behaviour, whereby a lot of 'remove_me' files are building up under `/data/glusterfs/gv1/brick2/brick/.shard/.remove_me/` and `/data/glusterfs/gv1/brick3/brick/.shard/.remove_me/`. This is causing the arbiter to run out of space on brick2 and brick3, as the remove_me files are constantly increasing.

brick1 appears to be fine, the disk usage increases throughout the day and drops down in line with the trend of the brick on the data nodes. We see the disk usage increase and drop throughout the day on the data nodes for brick2 and brick3 as well, but while the arbiter follows the same trend of the disk usage increasing, it doesn't drop at any point.

This is the output of some gluster commands, occasional heal entries come and go:
root@uk3-prod-gfs-arb-01:~# gluster volume info gv1

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: d3d1fdec-7df9-4f71-b9fc-660d12c2a046
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: uk1-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Brick2: uk2-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Brick3: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick1/brick (arbiter)
Brick4: uk1-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Brick5: uk2-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Brick6: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick3/brick (arbiter)
Brick7: uk1-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Brick8: uk2-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Brick9: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick2/brick (arbiter)
Options Reconfigured:
cluster.entry-self-heal: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
performance.client-io-threads: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.lookup-optimize: off
performance.readdir-ahead: off
cluster.readdir-optimize: off
cluster.self-heal-daemon: enable
features.shard: enable
features.shard-block-size: 512MB
cluster.min-free-disk: 10%
cluster.use-anonymous-inode: yes

root@uk3-prod-gfs-arb-01:~# gluster peer status
Number of Peers: 2

Hostname: uk2-prod-gfs-01
Uuid: 2fdfa4a2-195d-4cc5-937c-f48466e76149
State: Peer in Cluster (Connected)

Hostname: uk1-prod-gfs-01
Uuid: 43ec93d1-ad83-4103-aea3-80ded0903d88
State: Peer in Cluster (Connected)

root@uk3-prod-gfs-arb-01:~# gluster volume heal gv1 info
Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
<gfid:5b57e1f6-3e3d-4334-a0db-b2560adae6d1>
Status: Connected
Number of entries: 1

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Status: Connected
Number of entries: 0

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick1/brick
Status: Connected
Number of entries: 0

Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Status: Connected
Number of entries: 0

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
<gfid:6ba9c472-9232-4b45-b12f-a1232d6f4627>
/.shard/.remove_me
<gfid:0f042518-248d-426a-93f4-cfaa92b6ef3e>
Status: Connected
Number of entries: 3

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick2/brick
<gfid:6ba9c472-9232-4b45-b12f-a1232d6f4627>
/.shard/.remove_me
<gfid:0f042518-248d-426a-93f4-cfaa92b6ef3e>
Status: Connected
Number of entries: 3

root@uk3-prod-gfs-arb-01:~# gluster volume get all cluster.op-version
Option                                   Value
------                                   -----
cluster.op-version                       100000

We're not sure if this is a potential bug or if something's corrupted that we don't have visibility of, so any pointers/suggestions about how to approach this would be appreciated. 

Thanks,
Liam




The contents of this email message and any attachments are intended solely for the addressee(s) and may contain confidential and/or privileged information and may be legally protected from disclosure.



________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

[Index of Archives]     [Gluster Development]     [Linux Filesytems Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux