Re: Odd "Transport endpoint is not connected" when trying to gunzip a file

To be honest, I have no clue.
I would try restarting the gluster brick process (or even stopping and starting the volume, if that's an option) and rebooting the client.
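
For example (a rough sketch; I'm assuming the volume is named data-volume, as your client logs suggest, and <brick-PID> is a placeholder):

  # find the PID of the affected brick process on mseas-data3
  gluster volume status data-volume
  # kill just that brick and let gluster respawn it
  kill <brick-PID>
  gluster volume start data-volume force
  # or, if downtime is an option, bounce the whole volume
  gluster volume stop data-volume
  gluster volume start data-volume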

If that doesn't help, you will have to plan an upgrade of your TSP (trusted storage pool) to something newer (v9 or even v10).

Best Regards,
Strahil Nikolov 

On Wed, Jun 22, 2022 at 0:49, Pat Haley
<phaley@xxxxxxx> wrote:


Hi Strahil

I have tried a couple of tests of gunzipping the file, with top running on both the client (mseas) and the brick server (mseas-data3), and with iotop running on the client (mseas). I was not able to install iotop on the brick server yet (the external line is down); I'll repeat the test once I fix that problem.

I can now get one of two error messages when gunzip fails:

  • gzip: /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz: File descriptor in bad state
    • a new error message
  • gzip: /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz: Transport endpoint is not connected
    • the original error message

What I observed while waiting for gunzip to fail:

  • top
    • no significant load (usually less than 0.1) on either machine
    • zero I/O wait on either machine
  • iotop (only running on the client)
    • nothing related to gluster showing up in the display at all

Below I include what I found in the log files corresponding to these tests, along with the gluster-related dmesg output on the brick server (nothing showed up in dmesg on the client).

Please let me know what I should try next.

Thanks

Pat


------------------------------------------
mseas-data3: dmesg | grep glust
------------------------------------------
many repeats of the following pair of lines:

glusterfsd: page allocation failure. order:1, mode:0x20
Pid: 14245, comm: glusterfsd Not tainted 2.6.32-754.2.1.el6.x86_64 #1
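
In case it's useful: my understanding is that an order:1 failure means the kernel could not find two contiguous free pages for an atomic allocation (mode:0x20), so next time I can also capture the allocator state on the brick server, e.g.

  # free blocks per zone by order; low counts at order >= 1 suggest memory fragmentation
  cat /proc/buddyinfo
  # the reserve the kernel keeps for atomic allocations
  sysctl vm.min_free_kbytes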

------------------------------------------
mseas:messages
------------------------------------------
Jun 21 17:04:35 mseas gdata[155485]: [2022-06-21 21:04:35.638810] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.

Jun 21 17:21:04 mseas gdata[155485]: [2022-06-21 21:21:04.786083] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.

------------------------------------------
mseas:gdata.log
------------------------------------------
[2022-06-21 21:04:35.638810] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.
[2022-06-21 21:04:35.639261] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2022-06-21 21:03:29.735807 (xid=0xc05d54)
[2022-06-21 21:04:35.639494] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-06-21 21:03:53.633472 (xid=0xc05d55)


[2022-06-21 21:21:04.786083] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.
[2022-06-21 21:21:04.786732] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GlusterFS 3.3) op(READ(12)) called at 2022-06-21 21:19:52.634383 (xid=0xc05e31)
[2022-06-21 21:21:04.787172] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x172)[0x7f84886a0202] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1c2)[0x7f848846c3e2] (--> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f848846c4de] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f848846dd2a] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f848846e538] ))))) 0-data-volume-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-06-21 21:20:22.780023 (xid=0xc05e32)

------------------------------------------
mseas-data3: bricks/export-sda-brick3.log
------------------------------------------
[2022-06-21 21:03:54.489638] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting connection from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31
[2022-06-21 21:03:54.489752] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-21 21:03:54.489817] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down connection mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-31
[2022-06-21 21:04:04.506544] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted client from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32 (version: 3.7.11)


[2022-06-21 21:20:23.625096] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting connection from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32
[2022-06-21 21:20:23.625189] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-21 21:20:23.625255] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down connection mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-32
[2022-06-21 21:20:23.641462] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted client from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-33 (version: 3.7.11)

On 6/17/22 2:18 AM, Strahil Nikolov wrote:
Check the load with top & iotop.
In particular, check the I/O wait in top.

Did you check dmesg for any clues?
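
Something along these lines, while you reproduce the gunzip failure (standard tools, nothing gluster-specific):

  # watch the 'wa' (I/O wait) field in the summary header
  top -d 5
  # show only the processes actually doing I/O
  iotop -o -d 5
  # look for gluster- or disk-related kernel messages
  dmesg | tail -50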

Best Regards,
Strahil Nikolov

On Thu, Jun 16, 2022 at 22:59, Pat Haley
<phaley@xxxxxxx> wrote:


Hi Strahil,

I poked around our logs, and found this on the front-end (from the day & time of the last time we had the issue)


Jun 15 10:51:17 mseas gdata[155485]: [2022-06-15 14:51:17.263858] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-volume-client-2: server 172.16.1.113:49153 has not responded in the last 42 seconds, disconnecting.


This would indicate that our problem is related to the one in the thread you linked. For us, however, I believe we can reproduce the issue at will (i.e. simply by trying to gunzip the same file). Unfortunately I have to go to a meeting now, but if you have some specific tests you'd like me to try, I can run them when I get back.
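
For reference, the reproducer is simply (the same file as in the gzip errors above):

  gunzip /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz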

Thanks

Pat



On 6/16/22 3:07 PM, Strahil Nikolov wrote:
Pat, 

Can you check the CPU and disk performance when the volume reports the issue?
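
For example, on the brick server (iostat and sar come with the sysstat package; the 5 is just a sampling interval in seconds):

  # extended per-device stats; sustained high %util or await points at the disk
  iostat -x 5
  # CPU utilization, including %iowait
  sar -u 5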


It seems that a similar issue was reported in https://lists.gluster.org/pipermail/gluster-users/2019-March/035944.html, but I don't see a clear solution there.
Take a look in the thread and check if it matches your symptoms.


Best Regards,
Strahil Nikolov

On Thu, Jun 16, 2022 at 18:14, Pat Haley
<phaley@xxxxxxx> wrote:


Hi Strahil,

I poked around again, and for brick 3 (where the file we were testing resides) I only found the same log entries that were at the bottom of my first email:


------------------------------------------
mseas-data3: bricks/export-sda-brick3.log
------------------------------------------
[2022-06-15 14:50:42.588143] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-data-volume-server: disconnecting connection from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28
[2022-06-15 14:50:42.588220] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC
[2022-06-15 14:50:42.588259] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/dri_calypso/PE/2019/Apr09/Ens3R200deg001/pe_out.nc.gz
[2022-06-15 14:50:42.588288] I [MSGID: 101055] [client_t.c:420:gf_client_unref] 0-data-volume-server: Shutting down connection mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-28
[2022-06-15 14:50:53.605215] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-data-volume-server: accepted client from mseas.mit.edu-155483-2022/05/13-03:24:14:618694-data-volume-client-2-0-29 (version: 3.7.11)
[2022-06-15 14:50:42.588247] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-data-volume-server: fd cleanup on /projects/posydon/Acoustics_ASA/MSEAS-ParEq-DO/Save/2D/Test_Cases/RI/DO_NAPE_JASA_Paper/Uncertain_Pekeris_Waveguide_DO_MC

Thanks

Pat


On 6/15/22 6:47 PM, Strahil Nikolov wrote:
I agree. It will be very hard to debug.

Anything in the brick logs?

It probably goes without saying that EL6 is dead and Gluster v3 is so old that it's worth considering a migration to a newer setup.

Best Regards,
Strahil Nikolov

On Wed, Jun 15, 2022 at 22:51, Yaniv Kaul
-- 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
