Re: Skipped files during rebalance


 





On 08/17/2015 01:58 AM, Christophe TREFOIS wrote:

Dear all,

 

I have successfully added a new node to our setup, and finally managed to get a successful fix-layout run as well with no errors.

 

Now, as per the documentation, I started a gluster volume rebalance live start task and I see many skipped files. 

The error log then contains entries like the following for each skipped file.

 

[2015-08-16 20:23:30.591161] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/004010008.flex lookup failed

[2015-08-16 20:23:30.768391] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/007005003.flex lookup failed

[2015-08-16 20:23:30.804811] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/006005009.flex lookup failed

[2015-08-16 20:23:30.805201] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/005006011.flex lookup failed

[2015-08-16 20:23:30.880037] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/005009012.flex lookup failed

[2015-08-16 20:23:31.038236] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/003008007.flex lookup failed

[2015-08-16 20:23:31.259762] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/004008006.flex lookup failed

[2015-08-16 20:23:31.333764] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/007008001.flex lookup failed

[2015-08-16 20:23:31.340190] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_05(2013-10-11_17-12-02)/006007004.flex lookup failed
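
To get a feel for how many files were skipped and why, entries like these can be tallied per failure reason. The following is only a minimal sketch, assuming the log format shown above with each entry on a single line; the name `tally_migrate_failures` is illustrative and not part of GlusterFS.

```python
import re
from collections import Counter

# Matches DHT "Migrate file failed" entries (MSGID 109023) and captures
# everything after the colon (path plus trailing failure reason).
MIGRATE_FAILED = re.compile(
    r"\[MSGID: 109023\].*?Migrate file failed:\s*(?P<rest>.*)$"
)

# Failure reasons seen in this thread; anything else is counted as "other".
KNOWN_REASONS = ("lookup failed", "failed to migrate data")


def tally_migrate_failures(lines):
    """Count 'Migrate file failed' entries per trailing failure reason."""
    counts = Counter()
    for line in lines:
        match = MIGRATE_FAILED.search(line.strip())
        if not match:
            continue  # not a migrate-failure entry
        rest = match.group("rest")
        reason = next((r for r in KNOWN_REASONS if rest.endswith(r)), "other")
        counts[reason] += 1
    return counts
```

Passing it the lines of the rebalance log returns a `Counter` keyed by reason, which makes it easy to see whether skipped files are dominated by lookup failures or by data migration failures.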

 

Update: one of the rebalance tasks now failed.

 

@Rafi, I got the same error as Friday except this time with data.


Packets carrying the ping request could have been waiting in the queue for the whole time-out period because of heavy traffic on the network. I have sent a patch for this. You can track its status here: http://review.gluster.org/11935
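
As a rough illustration of this explanation (not of the actual RPC code), consider a ping request that sits behind other pending data in a single first-in, first-out send queue. The 42-second figure is the timeout reported in the log below; the function names, the backlog, and the throughput numbers are made up for the example.

```python
PING_TIMEOUT_SECONDS = 42  # value reported by rpc_clnt_ping_timer_expired in the log


def ping_queue_delay(backlog_bytes, throughput_bytes_per_s):
    """Seconds a ping waits if everything queued ahead of it is sent first."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return backlog_bytes / throughput_bytes_per_s


def ping_would_time_out(backlog_bytes, throughput_bytes_per_s,
                        timeout_s=PING_TIMEOUT_SECONDS):
    """True if the queueing delay alone exceeds the ping timeout."""
    return ping_queue_delay(backlog_bytes, throughput_bytes_per_s) > timeout_s
```

With a hypothetical backlog of 5 GiB draining at 100 MiB/s, the ping would wait 51.2 seconds, longer than the timeout, so the client would disconnect even though the server is alive.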


 

[2015-08-16 20:24:34.533167] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-live-client-0: server 192.168.123.104:49164 has not responded in the last 42 seconds, disconnecting.

[2015-08-16 20:24:34.533614] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2015-08-16 20:23:51.305640 (xid=0x5dd4da)

[2015-08-16 20:24:34.533672] E [MSGID: 114031] [client-rpc-fops.c:1621:client3_3_inodelk_cbk] 0-live-client-0: remote operation failed [Transport endpoint is not connected]

[2015-08-16 20:24:34.534201] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2015-08-16 20:23:51.303938 (xid=0x5dd4d7)

[2015-08-16 20:24:34.534347] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-live-dht: Migrate file failed: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/007008007.flex: failed to migrate data

[2015-08-16 20:24:34.534413] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2015-08-16 20:23:51.303969 (xid=0x5dd4d8)

[2015-08-16 20:24:34.534579] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-live-dht: Migrate file failed: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/007009012.flex: failed to migrate data

[2015-08-16 20:24:34.534676] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2015-08-16 20:23:51.313548 (xid=0x5dd4db)

[2015-08-16 20:24:34.534745] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-live-dht: Migrate file failed: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/006008011.flex: failed to migrate data

[2015-08-16 20:24:34.535199] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2015-08-16 20:23:51.326369 (xid=0x5dd4dc)

[2015-08-16 20:24:34.535232] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-live-dht: Migrate file failed: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/005003001.flex: failed to migrate data

[2015-08-16 20:24:34.535984] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2015-08-16 20:23:51.326437 (xid=0x5dd4dd)

[2015-08-16 20:24:34.536069] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-live-dht: Migrate file failed: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/007010012.flex: failed to migrate data

[2015-08-16 20:24:34.536267] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-08-16 20:23:51.337240 (xid=0x5dd4de)

[2015-08-16 20:24:34.536339] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_08(2013-10-11_20-12-25)/002005012.flex lookup failed

[2015-08-16 20:24:34.536487] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-08-16 20:23:51.425254 (xid=0x5dd4df)

[2015-08-16 20:24:34.536685] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-08-16 20:23:51.738907 (xid=0x5dd4e0)

[2015-08-16 20:24:34.536891] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-08-16 20:23:51.805096 (xid=0x5dd4e1)

[2015-08-16 20:24:34.537316] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-08-16 20:23:51.805977 (xid=0x5dd4e2)

[2015-08-16 20:24:34.537735] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fa454de59e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fa454bb09be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fa454bb0ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fa454bb247c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fa454bb2c38] ))))) 0-live-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-16 20:23:52.530107 (xid=0x5dd4e3)

[2015-08-16 20:24:34.538475] E [MSGID: 114031] [client-rpc-fops.c:1621:client3_3_inodelk_cbk] 0-live-client-0: remote operation failed [Transport endpoint is not connected]

The message "E [MSGID: 114031] [client-rpc-fops.c:1621:client3_3_inodelk_cbk] 0-live-client-0: remote operation failed [Transport endpoint is not connected]" repeated 4 times between [2015-08-16 20:24:34.538475] and [2015-08-16 20:24:34.538535]

[2015-08-16 20:24:34.538584] E [MSGID: 109023] [dht-rebalance.c:1617:gf_defrag_migrate_single_file] 0-live-dht: Migrate file failed: 002004003.flex lookup failed

[2015-08-16 20:24:34.538904] E [MSGID: 109023] [dht-rebalance.c:1617:gf_defrag_migrate_single_file] 0-live-dht: Migrate file failed: 003009008.flex lookup failed

[2015-08-16 20:24:34.539724] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_08(2013-10-11_20-12-25)/005009006.flex lookup failed

[2015-08-16 20:24:34.539820] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_08(2013-10-11_20-12-25)

[2015-08-16 20:24:34.540031] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1

[2015-08-16 20:24:34.540691] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-live-client-0: remote operation failed. Path: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/002005008.flex [Transport endpoint is not connected]

[2015-08-16 20:24:34.541152] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-live-client-0: remote operation failed. Path: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/005004009.flex [Transport endpoint is not connected]

[2015-08-16 20:24:34.541331] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-live-client-0: remote operation failed. Path: /hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Meas_12(2013-10-12_00-12-55)/007005011.flex [Transport endpoint is not connected]

[2015-08-16 20:24:34.541486] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol

[2015-08-16 20:24:34.541572] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs

[2015-08-16 20:24:34.541639] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs
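
One way to read the "forced unwinding frame" entries is to compare each entry's own timestamp with its "called at" time, which shows how long the request had been outstanding when the connection was dropped. This is a minimal sketch, assuming the log format shown above with each entry on a single line; `pending_seconds` is a hypothetical helper, not a GlusterFS tool.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Captures the log timestamp, the operation name, the time the request was
# issued ("called at"), and the transaction id of an unwound frame.
UNWIND = re.compile(
    r"^\[(?P<logged>[\d\-]+ [\d:.]+)\].*?"
    r"forced unwinding frame type\((?P<type>[^)]*)\) "
    r"op\((?P<op>\w+)\(\d+\)\) "
    r"called at (?P<called>[\d\-]+ [\d:.]+) "
    r"\(xid=(?P<xid>0x[0-9a-f]+)\)"
)


def pending_seconds(line):
    """Return (op, xid, seconds outstanding) for an unwind entry, else None."""
    match = UNWIND.search(line.strip())
    if match is None:
        return None
    logged = datetime.strptime(match.group("logged"), TS_FORMAT)
    called = datetime.strptime(match.group("called"), TS_FORMAT)
    waited = (logged - called).total_seconds()
    return match.group("op"), match.group("xid"), waited
```

For the INODELK entry with xid=0x5dd4da this gives roughly 43.2 seconds, slightly more than the 42-second ping timeout, which is consistent with the requests having been stuck until the client disconnected.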

 

Any help would be greatly appreciated.

CCing the DHT team to give you a better idea of why the rebalance failed, and about the huge memory consumption of the rebalance process (200 GB RAM).

Regards
Rafi KC



 

Thanks,

 

--

Christophe

Dr Christophe Trefois, Dipl.-Ing.  
Technical Specialist / Post-Doc

UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
Campus Belval | House of Biomedicine  
6, avenue du Swing 
L-4367 Belvaux  

T: +352 46 66 44 6124 
F: +352 46 66 44 6949  
http://www.uni.lu/lcsb

----
This message is confidential and may contain privileged information. 
It is intended for the named recipient only. 
If you receive it in error please notify me and permanently delete the original message and any copies. 
----

  

 


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
