> The osd.8 log shows it doing some deep scrubbing here. Perhaps that is

When I first noticed the CPU usage, I checked iotop and iostat. Both said there was no disk activity on any OSD.
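(For reference, the checks were along these lines; the exact flags below are approximate, not copied from my shell history:)

iotop -o -d 5     # only show threads actually doing I/O, 5-second refresh (flags approximate)
iostat -x 5 3     # extended per-device stats, 5-second samples (flags approximate)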
Taking osd.8 down (regardless of the noout flag) was the only way to get things to respond. I have not set nodown, just noout.

When I got in this morning, I had 4 more flapping OSDs: osd.4, osd.12, osd.13, and osd.6. All 4 daemons were using 100% CPU and doing no disk I/O. osd.1 and osd.14 are the only ones currently using disk I/O.

There are 3 PGs being deep-scrubbed:

root@ceph1c:/var/log/radosgw-agent# ceph pg dump | grep deep
dumped all in format plain
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
11.774 8682 0 0 0 7614655060 3001 3001 active+clean+scrubbing+deep 2014-03-27 10:20:30.598032 8381'5180514 8521:6520833 [13,4] [13,4] 7894'5176984 2014-03-20 04:41:48.762996 7894'5176984 2014-03-20 04:41:48.762996
11.698 8587 0 0 0 7723737171 3001 3001 active+clean+scrubbing+deep 2014-03-27 10:16:31.292487 8383'483312 8521:618864 [14,1] [14,1] 7894'479783 2014-03-20 03:53:18.024015 7894'479783 2014-03-20 03:53:18.024015
11.d8 8743 0 0 0 7570365909 3409 3409 active+clean+scrubbing+deep 2014-03-27 10:15:39.558121 8396'1753407 8521:2417672 [12,6] [12,6] 7894'1459230 2014-03-20 02:40:22.123236 7894'1459230 2014-03-20 02:40:22.123236

These PGs are on the 6 OSDs mentioned. osd.1 and osd.14 are not using 100% CPU, and they are using disk I/O. osd.12, osd.6, osd.4, and osd.13 are using 100% CPU and 0 kB/s of disk I/O.

Here's iostat on ceph0c, which contains osd.1 (/dev/sdd), osd.4 (/dev/sde), and osd.6 (/dev/sdh):

root@ceph0c:/var/log/ceph# iostat -p sdd,sde,sdh 1
Linux 3.5.0-46-generic (ceph0c)     03/27/2014     _x86_64_     (8 CPU)

<snip>

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.64    0.00    5.52    4.42    0.00   57.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             113.00       900.00         0.00        900          0
sdd1            113.00       900.00         0.00        900          0
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.90    0.00    4.41    2.82    0.00   62.87

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             181.00      1332.00         0.00       1332          0
sdd1            181.00      1332.00         0.00       1332          0
sde              22.00         8.00       328.00          8        328
sde1             18.00         8.00       328.00          8        328
sdh              18.00         4.00       228.00          4        228
sdh1             15.00         4.00       228.00          4        228

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30.21    0.00    4.26    1.71    0.00   63.82

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             180.00      1044.00       200.00       1044        200
sdd1            177.00      1044.00       200.00       1044        200
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0

So it's not zero disk activity, but it's pretty close. The disks continue to show 0 kB_read and 0 kB_wrtn for the next 60 seconds. That's much lower than I would expect from OSDs executing a deep scrub.

I restarted the 4 flapping OSDs. They recovered, then started flapping again within 5 minutes. I shut all of the ceph daemons down and rebooted all nodes at the same time. The OSDs return to 100% CPU usage very soon after boot.

I was going to ask if I should zap osd.8 and re-add it to the cluster. I don't think that's possible now.
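(In case it's useful: what I had in mind for zapping and re-adding an OSD was roughly the standard removal/re-creation sequence; the device path below is just a placeholder, not my actual disk:)

ceph osd out 8
# stop the osd.8 daemon on its host first, then remove it from the cluster:
ceph osd crush remove osd.8
ceph auth del osd.8
ceph osd rm 8
# wipe and re-prepare the disk (example device path):
ceph-disk zap /dev/sdX
ceph-disk prepare /dev/sdX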