Re: Rebalance failure wrt trashcan

Hi,

I tried to reproduce the situation on master by adding some bricks and initiating a rebalance operation (I created some empty files through the mount before adding the bricks), and I couldn't find any error in the volume status output or in the rebalance/brick logs.
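
For reference, the pre-rebalance setup was roughly the following (the mount point /mnt/vol and the file count are only illustrative, not the exact ones I used):

# mount -t glusterfs 10.70.43.4:/vol /mnt/vol
# for i in $(seq 1 10); do touch /mnt/vol/file$i; done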

[root@dhcp43-4 master]# gluster v create vol 10.70.43.4:/home/brick1 10.70.43.66:/home/brick2 force
volume create: vol: success: please start the volume to access data
[root@dhcp43-4 master]# gluster v start vol
volume start: vol: success
[root@dhcp43-4 master]# gluster v add-brick vol 10.70.43.66:/home/brick3 10.70.43.66:/home/brick4 force
volume add-brick: success
[root@dhcp43-4 master]# gluster v rebalance vol start
volume rebalance: vol: success: Rebalance on vol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: f4f86e5e-e042-424b-a155-687b88cd6d26

[root@dhcp43-4 master]# gluster v rebalance vol status
Node          Rebalanced-files   size     scanned   failures   skipped   status      run time in secs
-----------   ----------------   ------   -------   --------   -------   ---------   ----------------
localhost                    0   0Bytes         5          0         1   completed               0.00
10.70.43.66                  0   0Bytes         6          0         2   completed               0.00
volume rebalance: vol: success:
[root@dhcp43-4 master]# gluster v status vol
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.4:/home/brick1               49152     0          Y  6983
Brick 10.70.43.66:/home/brick2              49152     0          Y  12853
Brick 10.70.43.66:/home/brick3              49153     0          Y  12888
Brick 10.70.43.66:/home/brick4              49154     0          Y  12905
NFS Server on localhost                     2049      0          Y  7027
NFS Server on 10.70.43.66                   2049      0          Y  12923

Task Status of Volume vol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : f4f86e5e-e042-424b-a155-687b88cd6d26
Status               : completed

However, I could see the following in the rebalance logs.

[2015-05-14 11:40:14.474644] I [dht-layout.c:697:dht_layout_normalize] 0-vol-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0

[2015-05-14 11:40:14.485028] I [MSGID: 109036] [dht-common.c:6690:dht_log_new_layout_for_dir_selfheal] 0-vol-dht: Setting layout of /.trashcan with [Subvol_name: vol-client-0, Err: -1 , Start: 0 , Stop: 1073737911 , Hash: 1 ], [Subvol_name: vol-client-1, Err: -1 , Start: 1073737912 , Stop: 2147475823 , Hash: 1 ], [Subvol_name: vol-client-2, Err: -1 , Start: 2147475824 , Stop: 3221213735 , Hash: 1 ], [Subvol_name: vol-client-3, Err: -1 , Start: 3221213736 , Stop: 4294967295 , Hash: 1 ],

[2015-05-14 11:40:14.485958] I [dht-common.c:3539:dht_setxattr] 0-vol-dht: fixing the layout of /.trashcan

. . .

[2015-05-14 11:40:14.488222] I [dht-rebalance.c:2113:gf_defrag_process_dir] 0-vol-dht: migrate data called on /.trashcan

[2015-05-14 11:40:14.488966] I [dht-rebalance.c:2322:gf_defrag_process_dir] 0-vol-dht: Migration operation on dir /.trashcan took 0.00 secs

[2015-05-14 11:40:14.494033] I [dht-layout.c:697:dht_layout_normalize] 0-vol-dht: Found anomalies in /.trashcan/internal_op (gfid = 00000000-0000-0000-0000-000000000006). Holes=1 overlaps=0

[2015-05-14 11:40:14.495608] I [MSGID: 109036] [dht-common.c:6690:dht_log_new_layout_for_dir_selfheal] 0-vol-dht: Setting layout of /.trashcan/internal_op with [Subvol_name: vol-client-0, Err: -1 , Start: 2147475824 , Stop: 3221213735 , Hash: 1 ], [Subvol_name: vol-client-1, Err: -1 , Start: 3221213736 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: vol-client-2, Err: -1 , Start: 0 , Stop: 1073737911 , Hash: 1 ], [Subvol_name: vol-client-3, Err: -1 , Start: 1073737912 , Stop: 2147475823 , Hash: 1 ],

[2015-05-14 11:40:14.501198] I [dht-common.c:3539:dht_setxattr] 0-vol-dht: fixing the layout of /.trashcan/internal_op

. . .

[2015-05-14 11:40:14.508264] I [dht-rebalance.c:2113:gf_defrag_process_dir] 0-vol-dht: migrate data called on /.trashcan/internal_op

[2015-05-14 11:40:14.509493] I [dht-rebalance.c:2322:gf_defrag_process_dir] 0-vol-dht: Migration operation on dir /.trashcan/internal_op took 0.00 secs

[2015-05-14 11:40:14.513020] I [dht-common.c:3539:dht_setxattr] 0-vol-dht: fixing the layout of /.trashcan/internal_op

[2015-05-14 11:40:14.525227] I [dht-common.c:3539:dht_setxattr] 0-vol-dht: fixing the layout of /.trashcan

. . .

[2015-05-14 11:40:14.529157] I [dht-rebalance.c:2793:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
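
If anyone wants to double-check that the layout on /.trashcan was actually healed after the crawl, the DHT layout xattr can be read directly from the brick directories; a rough sketch, run on the node hosting the bricks (brick paths as above, output omitted):

# getfattr -n trusted.glusterfs.dht -e hex /home/brick3/.trashcan
# getfattr -n trusted.glusterfs.dht -e hex /home/brick4/.trashcan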


On 05/14/2015 04:20 PM, SATHEESARAN wrote:
On 05/14/2015 12:55 PM, Vijay Bellur wrote:
On 05/14/2015 09:00 AM, SATHEESARAN wrote:
Hi All,

I was using the glusterfs-3.7 beta2 build (glusterfs-3.7.0beta2-0.0.el6.x86_64).
I have seen a rebalance failure on one of the nodes.

[2015-05-14 12:17:03.695156] E [dht-rebalance.c:2368:gf_defrag_settle_hash] 0-vmstore-dht: fix layout on /.trashcan/internal_op failed
[2015-05-14 12:17:03.695636] E [MSGID: 109016] [dht-rebalance.c:2528:gf_defrag_fix_layout] 0-vmstore-dht: Fix layout failed for /.trashcan

Does it have any impact ?


I don't think there should be any impact due to this. Rebalance should continue fine without any problems. Do let us know if you observe otherwise.

-Vijay
I tested the same functionality and I don't find any impact as such, but 'gluster volume status <vol-name>' reports the rebalance as a FAILURE. Any tool (for example oVirt) consuming the output of 'gluster volume status <vol> --xml' would report the rebalance operation as a FAILURE.

[root@ ~]# gluster volume rebalance vmstore start
volume rebalance: vmstore: success: Rebalance on vmstore has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 68a12fc9-acd5-4f24-ba2d-bfc070ad5668

[root@~]# gluster volume rebalance vmstore status
Node          Rebalanced-files   size     scanned   failures   skipped   status      run time in secs
-----------   ----------------   ------   -------   --------   -------   ---------   ----------------
localhost                    0   0Bytes         2          0         0   completed               0.00
10.70.37.58                  0   0Bytes         0          3         0   failed                  0.00
volume rebalance: vmstore: success:

[root@~]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port Online  Pid
------------------------------------------------------------------------------

......

Task Status of Volume vmstore
------------------------------------------------------------------------------

Task                 : Rebalance
ID                   : 68a12fc9-acd5-4f24-ba2d-bfc070ad5668
Status               : failed

Snip from --xml tasks:
<tasks>
  <task>
    <type>Rebalance</type>
    <id>68a12fc9-acd5-4f24-ba2d-bfc070ad5668</id>
    <status>4</status>
    <statusStr>failed</statusStr>
  </task>
</tasks>
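
Any consumer (oVirt, monitoring scripts) effectively just reads statusStr (or the numeric status) out of that XML; a minimal sketch of such a check, assuming xmllint is available on the node:

# gluster volume status vmstore --xml | xmllint --xpath 'string(//task/statusStr)' -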

This is also the case with remove-brick with data migration.
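
For remove-brick, the flow that triggers data migration would be something like the following (the brick path here is just a placeholder):

# gluster volume remove-brick vmstore 10.70.37.58:/rhs/brick1 start
# gluster volume remove-brick vmstore 10.70.37.58:/rhs/brick1 status
# gluster volume remove-brick vmstore 10.70.37.58:/rhs/brick1 commit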

-- sas

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel