Re: Ceph OSD get blocked and start to make inconsistent pg from time to time

I'm going to assume you're already dealing with your scrub errors and have a game plan for those, since you didn't mention them in your question at all.
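If you still need to get a handle on those, the usual starting point is something along these lines, where <pgid> is just a placeholder for one of your inconsistent PGs; only run the repair once you understand why the PG is inconsistent:

    ceph health detail                                        # shows which PGs are inconsistent
    rados list-inconsistent-obj <pgid> --format=json-pretty   # needs a recent scrub of that PG
    ceph pg repair <pgid>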

One thing I'm always leery of when I see blocked requests is that the PGs might be splitting subfolders.  It is pretty much a guarantee if you're also seeing OSDs marking themselves down for short periods of time.  You can check for this by searching the OSD logs for "wrongly marked me down" (example commands below the link).  Here is a previous ML thread with some information about this.  It also goes into RBD snapshots possibly causing some problems.

https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg36923.html
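For reference, this is roughly how I'd check; the log path is the default Ceph log location and osd.0 is just an example id, so adjust both to your setup:

    grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log
    # if it is subfolder splitting, these are the filestore knobs involved
    ceph daemon osd.0 config get filestore_split_multiple
    ceph daemon osd.0 config get filestore_merge_threshold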

On Fri, Sep 29, 2017 at 5:23 AM Gonzalo Aguilar Delgado <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:

Hi,

I discovered that my cluster starts to make slow requests and all disk activity gets blocked.

This happens about once a day, and the Ceph OSD goes to 100% CPU. In the ceph health output I see something like:


2017-09-29 10:49:01.227257  [INF] pgmap v67494428: 764 pgs: 1 active+recovery_wait+degraded+inconsistent, 8 active+recovery_wait+degraded, 749 active+clean, 6 active+clean+inconsistent; 557 GB data, 1116 GB used, 3063 GB / 4179 GB avail; 1205 B/s wr, 0 op/s; 11/287452 objects degraded (0.004%)
2017-09-29 10:49:03.517776  [INF] pgmap v67494429: 764 pgs: 1 active+recovery_wait+degraded+inconsistent, 8 active+recovery_wait+degraded, 749 active+clean, 6 active+clean+inconsistent; 557 GB data, 1116 GB used, 3063 GB / 4179 GB avail; 53614 B/s wr, 1 op/s; 11/287452 objects degraded (0.004%)
2017-09-29 10:48:56.557639  [WRN] 21 slow requests, 1 included below; oldest blocked for > 143.328257 secs
2017-09-29 10:48:56.557644  [WRN] slow request 60.839383 seconds old, received at 2017-09-29 10:47:55.718212: osd_op(client.7189300.0:949334 9.f3418fe rb.0.21dc.238e1f29.000000007262 [set-alloc-hint object_size 4194304 write_size 4194304,write 3211264~20480] snapc 0=[] ondisk+write+known_if_redirected e7435) currently waiting for subops from 0
2017-09-29 10:49:03.558966  [WRN] 21 slow requests, 1 included below; oldest blocked for > 150.329585 secs
2017-09-29 10:49:03.558972  [WRN] slow request 120.752027 seconds old, received at 2017-09-29 10:47:02.806894: osd_op(client.7189300.0:949307 9.583569cb rb.0.21dc.238e1f29.000000006a66 [set-alloc-hint object_size 4194304 write_size 4194304,write 2465792~20480] snapc 0=[] ondisk+write+known_if_redirected e7435) currently waiting for subops from 0
2017-09-29 10:49:04.651490  [INF] pgmap v67494430: 764 pgs: 1 active+recovery_wait+degraded+inconsistent, 8 active+recovery_wait+degraded, 749 active+clean, 6 active+clean+inconsistent; 557 GB data, 1116 GB used, 3063 GB / 4179 GB avail; 53620 B/s wr, 1 op/s; 11/287452 objects degraded (0.004%)
2017-09-29 10:49:07.066544  [INF] pgmap v67494431: 764 pgs: 1 active+recovery_wait+degraded+inconsistent, 8 active+recovery_wait+degraded, 749 active+clean, 6 active+clean+inconsistent; 557 GB data, 1116 GB used, 3063 GB / 4179 GB avail; 11/287452 objects degraded (0.004%)
^Croot@gray-compute:~# ceph -w
    cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
     health HEALTH_ERR
            9 pgs degraded
            7 pgs inconsistent
            9 pgs recovery_wait
            37 requests are blocked > 32 sec
            recovery 11/287452 objects degraded (0.004%)
            42 scrub errors
            too many PGs per OSD (305 > max 300)
     monmap e12: 2 mons at {blue-compute=172.16.0.119:6789/0,red-compute=172.16.0.100:6789/0}
            election epoch 4966, quorum 0,1 red-compute,blue-compute
      fsmap e891: 1/1/1 up {0=blue-compute=up:active}
     osdmap e7435: 5 osds: 5 up, 5 in
            flags require_jewel_osds
      pgmap v67494432: 764 pgs, 6 pools, 557 GB data, 140 kobjects
            1116 GB used, 3063 GB / 4179 GB avail
            11/287452 objects degraded (0.004%)
                 749 active+clean
                   8 active+recovery_wait+degraded
                   6 active+clean+inconsistent
                   1 active+recovery_wait+degraded+inconsistent

2017-09-29 10:49:07.066544  [INF] pgmap v67494431: 764 pgs: 1 active+recovery_wait+degraded+inconsistent, 8 active+recovery_wait+degraded, 749 active+clean, 6 active+clean+inconsistent; 557 GB data, 1116 GB used, 3063 GB / 4179 GB avail; 11/287452 objects degraded (0.004%)
2017-09-29 10:49:08.301813  [INF] pgmap v67494432: 764 pgs: 1 active+recovery_wait+degraded+inconsistent, 8 active+recovery_wait+degraded, 749 active+clean, 6 active+clean+inconsistent; 557 GB data, 1116 GB used, 3063 GB / 4179 GB avail; 11/287452 objects degraded (0.004%)



The only solution I found is to go to the machine where that OSD lives and reboot it. This is what I see in the logs:


2017-09-29 10:50:19.800910 7f2a7eb128c0 20 -- 172.16.0.119:6802/5863 wait: stopped accepter thread
2017-09-29 10:50:19.800917 7f2a7eb128c0 20 -- 172.16.0.119:6802/5863 wait: stopping reaper thread
2017-09-29 10:50:19.800937 7f2a74f92700 10 -- 172.16.0.119:6802/5863 reaper_entry done
2017-09-29 10:50:19.801008 7f2a7eb128c0 20 -- 172.16.0.119:6802/5863 wait: stopped reaper thread
2017-09-29 10:50:19.801015 7f2a7eb128c0 10 -- 172.16.0.119:6802/5863 wait: closing pipes
2017-09-29 10:50:19.801017 7f2a7eb128c0 10 -- 172.16.0.119:6802/5863 reaper
2017-09-29 10:50:19.801019 7f2a7eb128c0 10 -- 172.16.0.119:6802/5863 reaper done
2017-09-29 10:50:19.801020 7f2a7eb128c0 10 -- 172.16.0.119:6802/5863 wait: waiting for pipes  to close
2017-09-29 10:50:19.801022 7f2a7eb128c0 10 -- 172.16.0.119:6802/5863 wait: done.
2017-09-29 10:50:19.801024 7f2a7eb128c0  1 -- 172.16.0.119:6802/5863 shutdown complete.
2017-09-29 10:50:19.801026 7f2a7eb128c0 10 -- :/5863 wait: waiting for dispatch queue
2017-09-29 10:50:19.801040 7f2a7eb128c0 10 -- :/5863 wait: dispatch queue is stopped
2017-09-29 10:50:19.801042 7f2a7eb128c0 20 -- :/5863 wait: stopping reaper thread
2017-09-29 10:50:19.801056 7f2a74791700 10 -- :/5863 reaper_entry done
2017-09-29 10:50:19.801088 7f2a7eb128c0 20 -- :/5863 wait: stopped reaper thread
2017-09-29 10:50:19.801094 7f2a7eb128c0 10 -- :/5863 wait: closing pipes
2017-09-29 10:50:19.801095 7f2a7eb128c0 10 -- :/5863 reaper
2017-09-29 10:50:19.801097 7f2a7eb128c0 10 -- :/5863 reaper done
2017-09-29 10:50:19.801098 7f2a7eb128c0 10 -- :/5863 wait: waiting for pipes  to close
2017-09-29 10:50:19.801099 7f2a7eb128c0 10 -- :/5863 wait: done.
2017-09-29 10:50:19.801100 7f2a7eb128c0  1 -- :/5863 shutdown complete.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
