Dear Ceph gurus...
Just wanted to report something that might be worth enhancing...
or maybe I am not following the right debugging procedure.
1. I am working with 0.92.2 and I am testing the cluster in several
disaster scenarios.
2. I have 32 OSDs distributed across 4 servers, i.e. 8 OSDs per
server.
3. I deliberately unmounted the filesystem of osd.4 but left the
daemon running, just to understand how the system would react (a
rough reproduction sketch follows after item d. below). This is
what happened:
a. While there was no I/O, the system did not realize that the
osd.4 filesystem was not mounted, and 'ceph -s' continued to
report HEALTH_OK for the cluster status.
b. When I started to impose some heavy I/O, the system began to
complain about slow requests. Curiously, osd.4 never appeared in
the logs.
# ceph -s
cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
health HEALTH_WARN
170 requests are blocked > 32 sec
monmap e1: 3 mons at
{rccephmon1=192.231.127.8:6789/0,rccephmon2=192.231.127.34:6789/0,rccephmon3=192.231.127.26:6789/0}
election epoch 24, quorum 0,1,2
rccephmon1,rccephmon3,rccephmon2
mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1
up:standby-replay
osdmap e1179: 32 osds: 32 up, 32 in
pgmap v907325: 2176 pgs, 2 pools, 4928 GB data, 1843
kobjects
14823 GB used, 74228 GB / 89051 GB avail
2174 active+clean
2 active+clean+replay
# ceph -w
(...)
2015-08-19 17:44:55.161731 osd.1 [WRN] 88 slow requests, 8
included below; oldest blocked for > 3156.325716 secs
2015-08-19 17:44:55.161940 osd.1 [WRN] slow request 1920.533342
seconds old, received at 2015-08-19 17:12:54.628258:
osd_op(client.44544.1:2266980 1000000022a.00006aec [write
524288~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
currently waiting for replay end
2015-08-19 17:44:55.161950 osd.1 [WRN] slow request 1920.511098
seconds old, received at 2015-08-19 17:12:54.650502:
osd_op(client.44544.1:2266988 1000000022a.00006aec [write
1048576~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
currently waiting for replay end
2015-08-19 17:44:55.161957 osd.1 [WRN] slow request 1920.510451
seconds old, received at 2015-08-19 17:12:54.651149:
osd_op(client.44544.1:2266996 1000000022a.00006aec [write
1572864~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
currently waiting for replay end
2015-08-19 17:44:55.161963 osd.1 [WRN] slow request 1920.488589
seconds old, received at 2015-08-19 17:12:54.673011:
osd_op(client.44544.1:2267004 1000000022a.00006aec [write
2097152~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
currently waiting for replay end
2015-08-19 17:44:55.161970 osd.1 [WRN] slow request 1920.482785
seconds old, received at 2015-08-19 17:12:54.678815:
osd_op(client.44544.1:2267012 1000000022a.00006aec [write
2621440~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
currently waiting for replay end
(...)
# grep "slow requests" /tmp/osd_failed.txt | awk '{print $3}' |
sort | uniq
osd.1
osd.11
osd.17
osd.23
osd.24
osd.26
osd.27
osd.31
osd.7
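(Thinking out loud: I guess one could take one of the blocked
objects from the 'ceph -w' output above and map it back to its PG
and acting set. The osd_op lines only show the pool id (5), so the
pool name has to be looked up first. I am not sure this would have
singled out osd.4, though.)
# ceph osd lspools
(to find the name of pool id 5)
# ceph osd map <name of pool 5> 1000000022a.00006aec
(prints the PG id plus the up and acting OSD sets for that object)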
c. None of the standard 'ceph osd' commands indicated that the
problematic OSD was osd.4. Only by looking at ceph-osd.4.log did we
find write error messages:
2015-08-19 16:52:17.552512 7f6f69973700 0 --
10.100.1.167:6809/23763 >> 10.100.1.169:6800/28352
pipe(0x175ca000 sd=169 :6809 s=0 pgs=0 cs=0 l=0
c=0x1f038000).accept connect_seq 180 vs existing 179 state
standby
2015-08-19 16:52:17.566701 7f6f89d2a700 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
or directory
2015-08-19 16:52:17.567230 7f6f89d2a700 0
filestore(/var/lib/ceph/osd/ceph-4) write couldn't open
5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such
file or directory
2015-08-19 16:52:17.567332 7f6f89d2a700 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
or directory
2015-08-19 16:52:17.567365 7f6f89d2a700 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
or directory
2015-08-19 16:52:17.570206 7f6f8a72b700 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
or directory
2015-08-19 16:52:17.570233 7f6f8a72b700 0
filestore(/var/lib/ceph/osd/ceph-4) write couldn't open
5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such
file or directory
2015-08-19 16:52:17.570260 7f6f8a72b700 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
or directory
2015-08-19 16:52:17.570603 7f6f8a72b700 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
or directory
2015-08-19 16:52:17.572468 7f6f89d2a700 -1
filestore(/var/lib/ceph/osd/ceph-4) could not find
e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
or directory
2015-08-19 16:52:17.572497 7f6f89d2a700 0
filestore(/var/lib/ceph/osd/ceph-4) write couldn't open
5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such
file or directory
d. Restarting osd.4 remounted the osd.4 filesystem, but the slow
requests remained. I had to restart all the OSDs on the same
server as osd.4. Once that was done, Ceph realized there were
degraded PGs and started recovery I/O.
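For reference (point 3 above), a rough sketch of the unmount I am
talking about; the exact commands are illustrative, but I believe a
lazy unmount is needed since the running OSD keeps its data
directory busy:
# umount -l /var/lib/ceph/osd/ceph-4
# mount | grep ceph-4
(returns nothing, yet 'ceph osd tree' keeps showing osd.4 as up
and in)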
If I am doing my debugging correctly, I see some potential problems:
- Ceph was not capable of detecting the missing or problematic
filesystem until a client tried to access it for I/O.
- The source of the problem was not obvious from the warning
messages Ceph was throwing. In this particular situation, if I were
running hundreds of OSDs and did not know in advance which of them
was failing, it would be difficult to identify the problematic one.
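(The closest things I can think of are the per-OSD latency counters
and the detailed health output, but I am not sure they would have
pointed at osd.4 here, since the OSDs reporting the slow requests
were never osd.4 itself:)
# ceph osd perf
(fs_commit_latency / fs_apply_latency per OSD)
# ceph health detail
(which, I think, breaks the blocked requests down per reporting OSD)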
Maybe there is a more efficient way to perform this kind of
debugging? If not, couldn't an additional check show that the daemon
cannot properly access its filesystem? I can imagine situations
where, for example, the filesystem is mounted read-only.
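Something along the following lines, run on each OSD host, is the
kind of check I have in mind (the paths assume the default
/var/lib/ceph/osd/ceph-$id layout, and '.probe' is just an arbitrary
file name), although ideally the OSD daemon itself would do it:
# for d in /var/lib/ceph/osd/ceph-*; do
>     mountpoint -q "$d" || echo "$d: not a mountpoint"
>     touch "$d/.probe" && rm "$d/.probe" || echo "$d: not writable"
> done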
TIA
Goncalo
--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW 2006
T: +61 2 93511937 |