deep-scrub taking a long time (possible leveldb corruption?)

Hi

We have a cluster of 4 physical nodes running Jewel; our app talks S3 to the cluster and, no doubt, uses the S3 bucket index heavily. We've had several big outages in the past that appear to have been caused by a deep-scrub on one of the PGs in the S3 index pool. Generally it starts with a deep-scrub on one such PG, followed by lots of slow requests blocking and accumulating, which eventually takes the whole cluster down. In an event like this we have to set the noup/nodown/noout flags so that the OSDs don't suicide during the deep-scrub.
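For reference, the flags were set and later cleared with the usual commands, roughly:

ceph osd set noup
ceph osd set nodown
ceph osd set noout
# ...and once the deep-scrub had finished and the slow requests drained:
ceph osd unset noup
ceph osd unset nodown
ceph osd unset noout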

In a recent outage, the deep-scrub of one PG took 2 hours to finish. After it finished, I happened to try listing all the omap keys of the objects in that PG and found that listing the keys of one particular object could cause the same outage described above. That suggests to me that the index object was corrupted, but I can't find anything in the logs. Interestingly (to me), 2 days later that index object seems to have fixed itself: listing its omap keys is quick and easy, and deep-scrubbing the same PG only takes 3 seconds.

The deep-scrub that took 2 hours to finish:
xxxx.log-20170730.gz:2017-07-29 12:14:10.476325 osd.2 x.x.x.x:6800/78482 217 : cluster [INF] 11.11 deep-scrub starts
xxxx.log-20170730.gz:2017-07-29 14:05:12.108523 osd.2 x.x.x.203:6800/78482 1795 : cluster [INF] 11.11 deep-scrub ok

The command I used to list all omap keys:
rados -p .rgw.buckets.index listomapkeys .dir.c82cdc62-7926-440d-8085-4e7879ef8155.26048.647 | wc -l
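In case it helps anyone reproduce this, a rough sketch for timing listomapkeys against every index object that maps to PG 11.11 (it assumes the ceph osd map output reports the PG ID in parentheses):

for obj in $(rados -p .rgw.buckets.index ls); do
    if ceph osd map .rgw.buckets.index "$obj" | grep -q '(11\.11)'; then
        printf '%s: ' "$obj"
        ( time rados -p .rgw.buckets.index listomapkeys "$obj" | wc -l ) 2>&1
    fi
done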

Most recent deep-scrub kicked off manually:
2017-07-31 09:54:37.997911 7f78bc333700  0 log_channel(cluster) log [INF] : 11.11 deep-scrub starts
2017-07-31 09:54:40.539494 7f78bc333700  0 log_channel(cluster) log [INF] : 11.11 deep-scrub ok
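For completeness, the manual scrub was kicked off with the standard command:

ceph pg deep-scrub 11.11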

Setting debug_leveldb to 20/5 didn't log any useful information for the event, sorry, but a perf record shows that most (83%) of the time was spent in LevelDB operations (a screenshot or the perf file can be supplied if anybody is interested; it's over the 150KB list size limit).
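Roughly how the debug level was raised and the profile captured (a sketch; the OSD ID, the PID lookup and the sampling duration are placeholders, and the ps/awk line assumes the systemd-style --id argument):

ceph tell osd.2 injectargs '--debug-leveldb 20/5'
osd_pid=$(ps -C ceph-osd -o pid=,args= | awk '/--id 2 /{print $1}')
perf record -g -p "$osd_pid" -- sleep 120
perf report --stdio | head -50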

I wonder if anybody has come across a similar issue before, or can explain what happened to the index object to make it unusable before but usable 2 days later? My guess is that a LevelDB compaction is one thing that might have fixed the index object. By the way, the problematic index object above has ~30k keys; the biggest index object in our cluster holds about 300k keys.
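If it was indeed a compaction that fixed it, one way to test that theory, plus a way to find the biggest index objects (a sketch; I'm assuming the leveldb_compact_on_mount option applies to the FileStore omap LevelDB on Jewel, corrections welcome):

# ceph.conf, [osd] section: compact the omap store when the OSD starts
leveldb_compact_on_mount = true

# count omap keys per index object to find the biggest bucket indexes
for obj in $(rados -p .rgw.buckets.index ls); do
    echo "$(rados -p .rgw.buckets.index listomapkeys "$obj" | wc -l) $obj"
done | sort -rn | head -20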

Regards

Stanley

--

Stanley Zhang | Senior Operations Engineer
Telephone: +64 9 302 0515 Fax: +64 9 302 0518
Mobile: +64 22 318 3664 Freephone: 0800 SMX SMX (769 769)
SMX Limited: Level 15, 19 Victoria Street West, Auckland, New Zealand
Web: http://smxemail.com
SMX | Cloud Email Hosting & Security

This email has been filtered by SMX. For more information visit smxemail.com.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
