RSYNC files renaming issue and timeout errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hi All,

our users are experiencing the following two problems using our gluster storage based on a 12x(4+2) distributed dispersed volume:

1) sometimes, during RSYNC copies executions, they noticed this error message:

rsync: rename "/tier2/CSP/sp1/ftp/lweprsn/.cmcc_CMCC-CM2-v20160423_hindcast_S2005040100_atmos_day_surface_lweprsn_n1-n40.sha256.rD38kX" -> "cmcc_CMCC-CM2-v20160423_hindcast_S2005040100_atmos_day_surface_lweprsn_n1-n40.sha256": Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]

The files seem to be correctly copied but rsync returns this kind of error and stop the sync process.

2) if the number of the total rsync sessions against the gluster volume grows, it seems that gluster volume starts suffering some particular workload, directories tree navigation becomes very slow and a lot of timeout errors appear on 192.168.0.54 gluster server (named “s04")

I just checked switch, cables, ports and so on, but everything seems to be ok.
I also executed a lot of checks using iperf tool and the network seems to be ok.
I dont’ understand why I see only timeout errors related to s04 host (192.168.0.54). It is up and running…

It seems that rsync copy processes started by the gluster volume users cause some noise (on gluster client log file) during temporary files creation.

[2019-01-27 02:36:53.488942] E [socket.c:2376:socket_connect_finish] 0-tier2-client-57: connection to 192.168.0.54:49159 failed (Timeout della connessione); disconnecting socke
t
[2019-01-27 02:53:48.607832] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved
_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d
410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-57: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-01-2
7 02:53:10.568717 (xid=0x4aeefc8)
[2019-01-27 02:55:01.631912] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Timeout della connessione); disconnecting socke
t
[2019-01-27 02:56:05.643880] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Timeout della connessione); disconnecting socke
t
[2019-01-27 02:57:09.653961] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Timeout della connessione); disconnecting socke
t

[2019-01-27 03:50:56.951088] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved
_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d
410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2019-01
-27 03:39:48.938568 (xid=0x49e54c1)
[2019-01-27 03:50:56.951111] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-54: remote operation failed [Transport endpoint is not connected]
[2019-01-27 03:50:56.951471] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved
_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d
410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-01-2
7 03:35:31.328387 (xid=0x49e54bf)
[2019-01-27 03:50:56.958357] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved
_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d
410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(IPC(47)) called at 2019-01-27 
03:37:49.652707 (xid=0x49e54c0)
[2019-01-27 03:50:56.966845] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour
ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste]
[2019-01-27 03:50:56.967086] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:56.967224] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:56.992129] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour
ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste]
[2019-01-27 03:50:56.995954] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour
ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste]
[2019-01-27 03:50:56.996126] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:56.996751] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:57.001301] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hour
ly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [Il file esiste]
[2019-01-27 03:50:57.006375] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 04:02:09.269929] E [socket.c:2376:socket_connect_finish] 0-tier2-client-57: connection to 192.168.0.54:49159 failed (Timeout della connessione); disconnecting socke
t

I focused my attention on the errors occurred on gluster client during 25th of January 2019.
So, I collected the following log files:

- tier2.old.log.gz, (the client log file covering the period)
- tier2.today.log.gz (the updated client log file reporting similar error related to “renaming” and “timeout” issues)
- memory_usage.pdf, (weekly statistics for memory usage on 192.168.0.54 host from NAGIOS)
- network_usage.pdf, (weekly statistics for network usage on 192.168.0.54 host from NAGIOS)
- cpu_usage.pdf, (weekly statistics for network usage on 192.168.0.54 host from NAGIOS)
- gluster-mnt6-brick.log-20190127, brick 6 (on host 192.168.0.54) log file
- gluster-mnt7-brick.log-20190127, brick 7 (on host 192.168.0.54) log file

I detected the involved bricks greping on “gluster volume status” command output:

[root@s04 bricks]# gluster volume status|grep 49158
Brick s02-stg:/gluster/mnt6/brick           49158     0          Y       4147 
Brick s03-stg:/gluster/mnt6/brick           49158     0          Y       4272 
Brick s01-stg:/gluster/mnt7/brick           49158     0          Y       3324 
Brick s04-stg:/gluster/mnt7/brick           49158     0          Y       3787 
Brick s05-stg:/gluster/mnt7/brick           49158     0          Y       3131 
Brick s06-stg:/gluster/mnt7/brick           49158     0          Y       3254 

On gluster server s04 (192.168.0.54), in messages log file, I can see a lot of errors like the following one and nothig more

Jan 25 14:16:22 s04 kernel: traps: check_vol_utili[142140] general protection ip:7f8fcec7866d sp:7ffee3a86a30 error:0 in libglusterfs.so.0.0.1[7f8fcec25000+f7000]
Jan 25 14:16:22 s04 abrt-hook-ccpp: Process 142140 (python2.7) of user 0 killed by SIGSEGV - dumping core
Jan 25 14:16:22 s04 abrt-server: Generating core_backtrace
Jan 25 14:16:22 s04 abrt-server: Error: Unable to open './coredump': No such file or directory
Jan 25 14:17:00 s04 abrt-server: Duplicate: UUID
Jan 25 14:17:00 s04 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-11:18:20-4471
Jan 25 14:17:00 s04 abrt-server: Deleting problem directory ccpp-2019-01-25-14:16:22-142140 (dup of ccpp-2018-09-25-11:18:20-4471)
Jan 25 14:17:00 s04 dbus[1877]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jan 25 14:17:00 s04 dbus[1877]: [system] Successfully activated service 'org.freedesktop.problems'
Jan 25 14:17:00 s04 abrt-server: Generating core_backtrace
Jan 25 14:17:00 s04 abrt-server: Error: Unable to open './coredump': No such file or directory
Jan 25 14:17:00 s04 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-11:18:20-4471' via uReport: Event 'report_uReport' exited with 1
Jan 25 14:17:01 s04 systemd: Created slice User Slice of root.
Jan 25 14:17:01 s04 systemd: Starting User Slice of root.
Jan 25 14:17:01 s04 systemd: Started Session 140059 of user root.
Jan 25 14:17:01 s04 systemd: Starting Session 140059 of user root.
Jan 25 14:17:01 s04 systemd: Removed slice User Slice of root.
Jan 25 14:17:01 s04 systemd: Stopping User Slice of root.
Jan 25 14:18:01 s04 systemd: Created slice User Slice of root.
Jan 25 14:18:01 s04 systemd: Starting User Slice of root.
Jan 25 14:18:01 s04 systemd: Started Session 140060 of user root.
Jan 25 14:18:01 s04 systemd: Starting Session 140060 of user root.
Jan 25 14:18:01 s04 systemd: Removed slice User Slice of root.
Jan 25 14:18:01 s04 systemd: Stopping User Slice of root.
Jan 25 14:18:35 s04 kernel: traps: check_vol_utili[142501] general protection ip:7f119fd7066d sp:7ffc399aa6f0 error:0 in libglusterfs.so.0.0.1[7f119fd1d000+f7000]
Jan 25 14:18:35 s04 abrt-hook-ccpp: Process 142501 (python2.7) of user 0 killed by SIGSEGV - dumping core
Jan 25 14:18:35 s04 abrt-server: Generating core_backtrace
Jan 25 14:18:35 s04 abrt-server: Error: Unable to open './coredump': No such file or directory

No errors in /var/log/messages log file related to the client machine.

Anyone of us could help me to solve this issue?
Thank you very much in advance,
Mauro

Attachment: tier2.old.log.gz
Description: GNU Zip compressed data


Attachment: tier2.today.log.gz
Description: GNU Zip compressed data


Attachment: cpu_usage.pdf
Description: Adobe PDF document


Attachment: memory_usage.pdf
Description: Adobe PDF document


Attachment: network_usage.pdf
Description: Adobe PDF document


Attachment: gluster-mnt6-brick.log-20190127.gz
Description: GNU Zip compressed data


Attachment: gluster-mnt7-brick.log-20190127.gz
Description: GNU Zip compressed data






_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

[Index of Archives]     [Gluster Development]     [Linux Filesytems Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux