Hi! I did what you wrote, but my MGRs started to crash again:

root@adminnode:~# ceph -s
  cluster:
    id:     086d9f80-6249-4594-92d0-e31b6aaaaa9c
    health: HEALTH_WARN
            no active mgr
            105498/6277782 objects misplaced (1.680%)

  services:
    mon: 3 daemons, quorum mon01,mon02,mon03
    mgr: no daemons active
    osd: 44 osds: 43 up, 43 in

  data:
    pools:   4 pools, 1616 pgs
    objects: 1.88M objects, 7.07TiB
    usage:   13.2TiB used, 16.7TiB / 29.9TiB avail
    pgs:     105498/6277782 objects misplaced (1.680%)
             1606 active+clean
             8    active+remapped+backfill_wait
             2    active+remapped+backfilling

  io:
    client:   5.51MiB/s rd, 3.38MiB/s wr, 33op/s rd, 317op/s wr
    recovery: 60.3MiB/s, 15objects/s

MGR log on mon01:

   -13> 2019-01-04 14:05:04.432186 7fec56a93700  4 mgr ms_dispatch active mgrdigest v1
   -12> 2019-01-04 14:05:04.432194 7fec56a93700  4 mgr ms_dispatch mgrdigest v1
   -11> 2019-01-04 14:05:04.822041 7fec434e1700  4 mgr[balancer] Optimize plan auto_2019-01-04_14:05:04
   -10> 2019-01-04 14:05:04.822170 7fec434e1700  4 mgr get_config get_configkey: mgr/balancer/mode
    -9> 2019-01-04 14:05:04.822231 7fec434e1700  4 mgr get_config get_configkey: mgr/balancer/max_misplaced
    -8> 2019-01-04 14:05:04.822268 7fec434e1700  4 ceph_config_get max_misplaced not found
    -7> 2019-01-04 14:05:04.822444 7fec434e1700  4 mgr[balancer] Mode upmap, max misplaced 0.050000
    -6> 2019-01-04 14:05:04.822849 7fec434e1700  4 mgr[balancer] do_upmap
    -5> 2019-01-04 14:05:04.822923 7fec434e1700  4 mgr get_config get_configkey: mgr/balancer/upmap_max_iterations
    -4> 2019-01-04 14:05:04.822964 7fec434e1700  4 ceph_config_get upmap_max_iterations not found
    -3> 2019-01-04 14:05:04.823013 7fec434e1700  4 mgr get_config get_configkey: mgr/balancer/upmap_max_deviation
    -2> 2019-01-04 14:05:04.823048 7fec434e1700  4 ceph_config_get upmap_max_deviation not found
    -1> 2019-01-04 14:05:04.823265 7fec434e1700  4 mgr[balancer] pools ['rbd_vms_hdd', 'rbd_vms_ssd', 'rbd_vms_ssd_01', 'rbd_vms_ssd_01_ec']
     0> 2019-01-04 14:05:04.836124 7fec434e1700 -1 /build/ceph-12.2.8/src/osd/OSDMap.cc: In function 'int OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set<long int>&, OSDMap::Incremental*)' thread 7fec434e1700 time 2019-01-04 14:05:04.832885
/build/ceph-12.2.8/src/osd/OSDMap.cc: 4102: FAILED assert(target > 0)

 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558c3c0bb572]
 2: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long, std::less<long>, std::allocator<long> > const&, OSDMap::Incremental*)+0x2801) [0x558c3c1c0ee1]
 3: (()+0x2f3020) [0x558c3bf5d020]
 4: (PyEval_EvalFrameEx()+0x8a51) [0x7fec5e832971]
 5: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
 6: (PyEval_EvalFrameEx()+0x6ffd) [0x7fec5e830f1d]
 7: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
 8: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
 9: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
 10: (()+0x13e370) [0x7fec5e8be370]
 11: (PyObject_Call()+0x43) [0x7fec5e891273]
 12: (()+0x1853ac) [0x7fec5e9053ac]
 13: (PyObject_Call()+0x43) [0x7fec5e891273]
 14: (PyObject_CallMethod()+0xf4) [0x7fec5e892444]
 15: (PyModuleRunner::serve()+0x5c) [0x558c3bf5a18c]
 16: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x558c3bf5a998]
 17: (()+0x76ba) [0x7fec5d74c6ba]
 18: (clone()+0x6d) [0x7fec5c7b841d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mgr.mon01.ceph01.srvfarm.net.log
--- end dump of recent events ---

2019-01-04 14:05:05.032479 7fec434e1700 -1 *** Caught signal (Aborted) **
 in thread 7fec434e1700 thread_name:balancer

 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
 1: (()+0x4105b4) [0x558c3c07a5b4]
 2: (()+0x11390) [0x7fec5d756390]
 3: (gsignal()+0x38) [0x7fec5c6e6428]
 4: (abort()+0x16a) [0x7fec5c6e802a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x558c3c0bb6fe]
 6: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long, std::less<long>, std::allocator<long> > const&, OSDMap::Incremental*)+0x2801) [0x558c3c1c0ee1]
 7: (()+0x2f3020) [0x558c3bf5d020]
 8: (PyEval_EvalFrameEx()+0x8a51) [0x7fec5e832971]
 9: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
 10: (PyEval_EvalFrameEx()+0x6ffd) [0x7fec5e830f1d]
 11: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
 12: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
 13: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
 14: (()+0x13e370) [0x7fec5e8be370]
 15: (PyObject_Call()+0x43) [0x7fec5e891273]
 16: (()+0x1853ac) [0x7fec5e9053ac]
 17: (PyObject_Call()+0x43) [0x7fec5e891273]
 18: (PyObject_CallMethod()+0xf4) [0x7fec5e892444]
 19: (PyModuleRunner::serve()+0x5c) [0x558c3bf5a18c]
 20: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x558c3bf5a998]
 21: (()+0x76ba) [0x7fec5d74c6ba]
 22: (clone()+0x6d) [0x7fec5c7b841d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[The subsequent "dump of recent events" repeats the backtrace and logging levels above verbatim and is omitted here.]

Kevin
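For anyone hitting the same loop: one way to get an active mgr back is to keep the balancer module from running its next optimization pass, since the trace above shows the abort happening in the balancer thread. A minimal sketch using the standard Luminous CLI; the systemd target name assumes a stock packaged install:

# The balancer module is what calls OSDMap::calc_pg_upmaps, so take it out
# of the mgr until the assert is understood. "mgr module disable" is handled
# by the monitors, so it works even while no mgr is active.
ceph mgr module disable balancer

# Restart the crashed manager daemons (run on each mgr host).
systemctl restart ceph-mgr.target

# Confirm an active mgr is back and inspect what the balancer had stored.
ceph -s
ceph config-key dump | grep mgr/balancer

# Later, once a mgr is stable, the module can be re-enabled with automatic
# balancing switched off explicitly while experimenting:
ceph mgr module enable balancer
ceph balancer off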
On Wed, 2 Jan 2019 at 17:35, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
>
> On a medium-sized cluster with device classes, I am experiencing a
> problem with the SSD pool:
>
> root@adminnode:~# ceph osd df | grep ssd
> ID CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL   %USE  VAR  PGS
>  2   ssd 0.43700  1.00000 447GiB 254GiB  193GiB 56.77 1.28  50
>  3   ssd 0.43700  1.00000 447GiB 208GiB  240GiB 46.41 1.04  58
>  4   ssd 0.43700  1.00000 447GiB 266GiB  181GiB 59.44 1.34  55
> 30   ssd 0.43660  1.00000 447GiB 222GiB  225GiB 49.68 1.12  49
>  6   ssd 0.43700  1.00000 447GiB 238GiB  209GiB 53.28 1.20  59
>  7   ssd 0.43700  1.00000 447GiB 228GiB  220GiB 50.88 1.14  56
>  8   ssd 0.43700  1.00000 447GiB 269GiB  178GiB 60.16 1.35  57
> 31   ssd 0.43660  1.00000 447GiB 231GiB  217GiB 51.58 1.16  56
> 34   ssd 0.43660  1.00000 447GiB 186GiB  261GiB 41.65 0.94  49
> 36   ssd 0.87329  1.00000 894GiB 364GiB  530GiB 40.68 0.92  91
> 37   ssd 0.87329  1.00000 894GiB 321GiB  573GiB 35.95 0.81  78
> 42   ssd 0.87329  1.00000 894GiB 375GiB  519GiB 41.91 0.94  92
> 43   ssd 0.87329  1.00000 894GiB 438GiB  456GiB 49.00 1.10  92
> 13   ssd 0.43700  1.00000 447GiB 249GiB  198GiB 55.78 1.25  72
> 14   ssd 0.43700  1.00000 447GiB 290GiB  158GiB 64.76 1.46  71
> 15   ssd 0.43700  1.00000 447GiB 368GiB 78.6GiB 82.41 1.85  78  <----
> 16   ssd 0.43700  1.00000 447GiB 253GiB  194GiB 56.66 1.27  70
> 19   ssd 0.43700  1.00000 447GiB 269GiB  178GiB 60.21 1.35  70
> 20   ssd 0.43700  1.00000 447GiB 312GiB  135GiB 69.81 1.57  77
> 21   ssd 0.43700  1.00000 447GiB 312GiB  135GiB 69.77 1.57  77
> 22   ssd 0.43700  1.00000 447GiB 269GiB  178GiB 60.10 1.35  67
> 38   ssd 0.43660  1.00000 447GiB 153GiB  295GiB 34.11 0.77  46
> 39   ssd 0.43660  1.00000 447GiB 127GiB  320GiB 28.37 0.64  38
> 40   ssd 0.87329  1.00000 894GiB 386GiB  508GiB 43.17 0.97  97
> 41   ssd 0.87329  1.00000 894GiB 375GiB  520GiB 41.88 0.94 113
>
> This leaves just 1.2TB of free space (only a few GB away from the pool's
> NEAR_FULL threshold).
> Currently, the balancer plugin is off because it immediately crashed
> the MGR in the past (on 12.2.5).
> Since then I upgraded to 12.2.8 but did not re-enable the balancer.
> [I am unable to find the bug tracker ID.]
>
> Would the balancer plugin correct this situation?
> What happens if all MGRs die like they did on 12.2.5 because of the plugin?
> Will the balancer take data from the most-unbalanced OSDs first?
> Otherwise an OSD may fill up beyond FULL, which would cause the whole
> pool to freeze (because the smallest OSD is taken into account for the
> free-space calculation).
> This would be the worst case, as over 100 VMs would freeze, causing a
> lot of trouble. This is also the reason I did not try to enable the
> balancer again.
>
> Please read this [1], all about the balancer with upmap mode.
>
> It is stable from 12.2.8 onwards with upmap mode.
>
> k
>
> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-December/032002.html
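Regarding re-enabling upmap mode afterwards: the same optimization the balancer runs can be tried offline first against a copy of the osdmap, so a map that trips the assert aborts a command-line tool instead of every mgr. A rough sketch, assuming an osdmaptool build that has the --upmap options described in the Luminous upmap documentation; the pool name below is just one of the pools from the mgr log above, substitute your own:

# upmap entries require all clients to speak Luminous or newer.
ceph features
ceph osd set-require-min-compat-client luminous

# Grab the current osdmap and run the upmap optimizer offline. This exercises
# the same OSDMap::calc_pg_upmaps path the balancer crashed in, so a
# problematic map shows up here rather than taking down ceph-mgr.
ceph osd getmap -o om
osdmaptool om --upmap upmaps.sh --upmap-pool rbd_vms_ssd_01

# Only if that completes cleanly, point the balancer at upmap mode again.
ceph balancer mode upmap
ceph balancer on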