Re: RGW hung, 2 OSDs using 100% CPU

In the interest of removing variables, I removed all snapshots on all pools, then restarted all ceph daemons at the same time.  This brought up osd.8 as well.
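
(For reference, for pool-level snapshots the removal is roughly the following, run per pool; the pool and snapshot names below are placeholders:
rados -p <pool> lssnap
rados -p <pool> rmsnap <snapname>
)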

The cluster started recovering.  Now osd.4 and osd.13 are doing the same thing: stuck at 100% CPU with no disk I/O.


Any suggestions for how I can see what the hung OSDs are doing?  The logs don't look interesting.  Is there a higher log level I can use?
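
The only ideas I have so far are bumping the debug level on the live daemon and dumping its in-flight ops over the admin socket, roughly (osd.4 here is just an example):
ceph tell osd.4 injectargs '--debug-osd 20 --debug-ms 1'
ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok dump_ops_in_flight
I haven't confirmed yet whether the higher debug level actually shows anything useful for this.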


I'm trying to use strace on osd.4:
strace -tt -f -ff -o ./ceph-osd.4.strace -x /usr/bin/ceph-osd --cluster=ceph -i 4 -f

So far, strace is running, and the process isn't hung.  After I ran this, the cluster finally finished backfilling the last of the PGs (all on osd.4).
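
Next time one of them spins at 100% CPU, I'll probably attach to the already-running daemon instead of restarting it under strace, along the lines of:
strace -tt -f -ff -o ./ceph-osd.4.strace -p <pid of the osd.4 process>
or grab thread backtraces with gdb:
gdb -batch -ex 'thread apply all bt' -p <pid>
That's just a plan; I haven't needed it since the restart.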

Since the cluster is healthy again, I killed the strace and started the daemon normally (start ceph-osd id=4).  Things seem fine now.  I'm going to let it scrub and deep scrub overnight.  I'll restart radosgw-agent tomorrow.
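
If I don't want to wait for the scrub scheduler, I believe scrubs can also be kicked off by hand, e.g.:
ceph osd scrub 4
ceph osd deep-scrub 4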











Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis@xxxxxxxxxxxxxxxxxx


On 3/27/14 10:44, Craig Lewis wrote:


The osd.8 log shows it doing some deep scrubbing here. Perhaps that is
what caused your earlier issues with CPU usage?
When I first noticed the CPU usage, I checked iotop and iostat.  Both said there was no disk activity on any OSD.


At 14:17:25, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.
regions list hung, and I killed it.  At 14:18:15, I stopped ceph-osd id=8.
At 14:18:45, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.  It
returned successfully.
At 14:19:10, I stopped ceph-osd id=4.

Since you've got the noout flag set, when osd.8 goes down any objects
for which osd.8 is the primary will not be readable. Ceph reads from
primaries, and the noout flag prevents another OSD from being selected
(which would happen if osd.8 were marked out), so these objects (which
apparently include some needed for regions list or regionmap get) are
inaccessible.

Josh


Taking osd.8 down (regardless of the noout flag) was the only way to get things to respond.  I have not set nodown, just noout.
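
(For completeness, the flag state is easy to confirm with:
ceph osd dump | grep flags
and noout is toggled with ceph osd set noout / ceph osd unset noout.)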



When I got in this morning, I had 4 more flapping OSDs: osd.4, osd.12, osd.13, and osd.6.  All 4 daemons were using 100% CPU and doing no disk I/O.

osd.1 and osd.14 are the only ones currently using disk I/O.


There are 3 PGs being deep scrubbed:
root@ceph1c:/var/log/radosgw-agent# ceph pg dump | grep deep
dumped all in format plain
pg_stat    objects    mip    degr    unf    bytes    log    disklog    state    state_stamp    v    reported    up    acting    last_scrub    scrub_stamp    last_deep_scrub    deep_scrub_stamp
11.774    8682    0    0    0    7614655060    3001    3001    active+clean+scrubbing+deep    2014-03-27 10:20:30.598032    8381'5180514    8521:6520833    [13,4]    [13,4]    7894'5176984    2014-03-20 04:41:48.762996    7894'5176984    2014-03-20 04:41:48.762996
11.698    8587    0    0    0    7723737171    3001    3001    active+clean+scrubbing+deep    2014-03-27 10:16:31.292487    8383'483312    8521:618864    [14,1]    [14,1]    7894'479783    2014-03-20 03:53:18.024015    7894'479783    2014-03-20 03:53:18.024015
11.d8    8743    0    0    0    7570365909    3409    3409    active+clean+scrubbing+deep    2014-03-27 10:15:39.558121    8396'1753407    8521:2417672    [12,6]    [12,6]    7894'1459230    2014-03-20 02:40:22.123236    7894'1459230    2014-03-20 02:40:22.123236
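
To double-check which OSDs those PGs map to, e.g.:
ceph pg map 11.774
should report the same up/acting sets ([13,4]) as the dump above.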


These PGs are on the 6 OSDs mentioned.  osd.1 and osd.14 are not using 100% CPU and are doing disk I/O.  osd.12, osd.6, osd.4, and osd.13 are using 100% CPU and 0 kB/s of disk I/O.  Here's iostat on ceph0c, which contains osd.1 (/dev/sdd), osd.4 (/dev/sde), and osd.6 (/dev/sdh):
root@ceph0c:/var/log/ceph# iostat -p sdd,sde,sdh 1
Linux 3.5.0-46-generic (ceph0c)     03/27/2014     _x86_64_    (8 CPU)
<snip>
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.64    0.00    5.52    4.42    0.00   57.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             113.00       900.00         0.00        900          0
sdd1            113.00       900.00         0.00        900          0
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.90    0.00    4.41    2.82    0.00   62.87

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             181.00      1332.00         0.00       1332          0
sdd1            181.00      1332.00         0.00       1332          0
sde              22.00         8.00       328.00          8        328
sde1             18.00         8.00       328.00          8        328
sdh              18.00         4.00       228.00          4        228
sdh1             15.00         4.00       228.00          4        228

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30.21    0.00    4.26    1.71    0.00   63.82

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             180.00      1044.00       200.00       1044        200
sdd1            177.00      1044.00       200.00       1044        200
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0


So it's not quite zero disk activity, but it's pretty close.  The disks continue to show 0 kB_read and 0 kB_wrtn for the next 60 seconds.  That's much lower than I would expect from OSDs executing a deep scrub.
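
To see where that CPU is actually going, my next step is to look at the busy threads in the spinning daemons, something like:
top -H -p <pid of a spinning ceph-osd>
or, if a profiler is handy, perf top -p <pid>.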


I restarted the 4 flapping OSDs.  They recovered, then started flapping again within 5 minutes.  I shut all of the ceph daemons down and rebooted all nodes at the same time.  The OSDs returned to 100% CPU usage very soon after boot.
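
If deep scrub really is the trigger, I may temporarily disable scrubbing to confirm it, i.e.:
ceph osd set noscrub
ceph osd set nodeep-scrub
and unset them once things settle down.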





I was going to ask if I should zap osd.8 and re-add it to the cluster.   I don't think that's possible now.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

