Re: Usage of devices in SSD pool varies very much

PS: Could be http://tracker.ceph.com/issues/36361
There is one HDD OSD that is out (it will not be replaced, because the
SSD pool will get the images and the HDD pool will be deleted).
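
If that tracker issue is the cause (calc_pg_upmaps hitting "FAILED
assert(target > 0)" while an out OSD is still around), a rough
workaround sketch until the OSD is removed could look like the
following. It assumes a Luminous cluster; the config-key name and the
OSD id are assumptions, so please verify them first:

# confirm which OSD is out (an out OSD shows REWEIGHT 0)
ceph osd tree

# keep the balancer from running; if no mgr stays alive long enough for
# "ceph balancer off", its state can also be inspected in the
# config-key store (key name assumed to be mgr/balancer/active)
ceph balancer off
ceph config-key list | grep balancer
ceph config-key rm mgr/balancer/active

# once the HDD pool is drained and deleted, remove the out OSD for good
ceph osd purge <id> --yes-i-really-mean-it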

Kevin

On Fri, 4 Jan 2019 at 19:46, Kevin Olbrich <ko@xxxxxxx> wrote:
>
> Hi!
>
> I did what you wrote but my MGRs started to crash again:
> root@adminnode:~# ceph -s
>   cluster:
>     id:     086d9f80-6249-4594-92d0-e31b6aaaaa9c
>     health: HEALTH_WARN
>             no active mgr
>             105498/6277782 objects misplaced (1.680%)
>
>   services:
>     mon: 3 daemons, quorum mon01,mon02,mon03
>     mgr: no daemons active
>     osd: 44 osds: 43 up, 43 in
>
>   data:
>     pools:   4 pools, 1616 pgs
>     objects: 1.88M objects, 7.07TiB
>     usage:   13.2TiB used, 16.7TiB / 29.9TiB avail
>     pgs:     105498/6277782 objects misplaced (1.680%)
>              1606 active+clean
>              8    active+remapped+backfill_wait
>              2    active+remapped+backfilling
>
>   io:
>     client:   5.51MiB/s rd, 3.38MiB/s wr, 33op/s rd, 317op/s wr
>     recovery: 60.3MiB/s, 15objects/s
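>
> With no active mgr, client I/O keeps flowing (the mgr is not in the
> data path), so the immediate impact is mostly missing statistics until
> a daemon comes back. A minimal recovery sketch, assuming a systemd
> deployment with the mgr instances on the mons (the daemon names are
> assumptions):
>
> systemctl restart ceph-mgr@mon01    # repeat for mon02/mon03, or
> systemctl restart ceph-mgr.target   # restart every mgr on this host
> ceph -s                             # confirm a mgr became active again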
>
>
> MON 1 log:
>    -13> 2019-01-04 14:05:04.432186 7fec56a93700  4 mgr ms_dispatch
> active mgrdigest v1
>    -12> 2019-01-04 14:05:04.432194 7fec56a93700  4 mgr ms_dispatch mgrdigest v1
>    -11> 2019-01-04 14:05:04.822041 7fec434e1700  4 mgr[balancer]
> Optimize plan auto_2019-01-04_14:05:04
>    -10> 2019-01-04 14:05:04.822170 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/mode
>     -9> 2019-01-04 14:05:04.822231 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/max_misplaced
>     -8> 2019-01-04 14:05:04.822268 7fec434e1700  4 ceph_config_get
> max_misplaced not found
>     -7> 2019-01-04 14:05:04.822444 7fec434e1700  4 mgr[balancer] Mode
> upmap, max misplaced 0.050000
>     -6> 2019-01-04 14:05:04.822849 7fec434e1700  4 mgr[balancer] do_upmap
>     -5> 2019-01-04 14:05:04.822923 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/upmap_max_iterations
>     -4> 2019-01-04 14:05:04.822964 7fec434e1700  4 ceph_config_get
> upmap_max_iterations not found
>     -3> 2019-01-04 14:05:04.823013 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/upmap_max_deviation
>     -2> 2019-01-04 14:05:04.823048 7fec434e1700  4 ceph_config_get
> upmap_max_deviation not found
>     -1> 2019-01-04 14:05:04.823265 7fec434e1700  4 mgr[balancer] pools
> ['rbd_vms_hdd', 'rbd_vms_ssd', 'rbd_vms_ssd_01', 'rbd_vms_ssd_01_ec']
>      0> 2019-01-04 14:05:04.836124 7fec434e1700 -1
> /build/ceph-12.2.8/src/osd/OSDMap.cc: In function 'int
> OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set<long
> int>&, OSDMap::Incremental*)' thread 7fec434e1700 time 2019-01-04
> 14:05:04.832885
> /build/ceph-12.2.8/src/osd/OSDMap.cc: 4102: FAILED assert(target > 0)
>
>  ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x558c3c0bb572]
>  2: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long,
> std::less<long>, std::allocator<long> > const&,
> OSDMap::Incremental*)+0x2801) [0x558c3c1c0ee1]
>  3: (()+0x2f3020) [0x558c3bf5d020]
>  4: (PyEval_EvalFrameEx()+0x8a51) [0x7fec5e832971]
>  5: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  6: (PyEval_EvalFrameEx()+0x6ffd) [0x7fec5e830f1d]
>  7: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  8: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  9: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  10: (()+0x13e370) [0x7fec5e8be370]
>  11: (PyObject_Call()+0x43) [0x7fec5e891273]
>  12: (()+0x1853ac) [0x7fec5e9053ac]
>  13: (PyObject_Call()+0x43) [0x7fec5e891273]
>  14: (PyObject_CallMethod()+0xf4) [0x7fec5e892444]
>  15: (PyModuleRunner::serve()+0x5c) [0x558c3bf5a18c]
>  16: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x558c3bf5a998]
>  17: (()+0x76ba) [0x7fec5d74c6ba]
>  18: (clone()+0x6d) [0x7fec5c7b841d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 1 reserver
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    4/ 5 memdb
>    1/ 5 kinetic
>    1/ 5 fuse
>    1/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-mgr.mon01.ceph01.srvfarm.net.log
> --- end dump of recent events ---
> 2019-01-04 14:05:05.032479 7fec434e1700 -1 *** Caught signal (Aborted) **
>  in thread 7fec434e1700 thread_name:balancer
>
>  ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)
>  1: (()+0x4105b4) [0x558c3c07a5b4]
>  2: (()+0x11390) [0x7fec5d756390]
>  3: (gsignal()+0x38) [0x7fec5c6e6428]
>  4: (abort()+0x16a) [0x7fec5c6e802a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x28e) [0x558c3c0bb6fe]
>  6: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long,
> std::less<long>, std::allocator<long> > const&,
> OSDMap::Incremental*)+0x2801) [0x558c3c1c0ee1]
>  7: (()+0x2f3020) [0x558c3bf5d020]
>  8: (PyEval_EvalFrameEx()+0x8a51) [0x7fec5e832971]
>  9: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  10: (PyEval_EvalFrameEx()+0x6ffd) [0x7fec5e830f1d]
>  11: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  12: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  13: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  14: (()+0x13e370) [0x7fec5e8be370]
>  15: (PyObject_Call()+0x43) [0x7fec5e891273]
>  16: (()+0x1853ac) [0x7fec5e9053ac]
>  17: (PyObject_Call()+0x43) [0x7fec5e891273]
>  18: (PyObject_CallMethod()+0xf4) [0x7fec5e892444]
>  19: (PyModuleRunner::serve()+0x5c) [0x558c3bf5a18c]
>  20: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x558c3bf5a998]
>  21: (()+0x76ba) [0x7fec5d74c6ba]
>  22: (clone()+0x6d) [0x7fec5c7b841d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- begin dump of recent events ---
>      0> 2019-01-04 14:05:05.032479 7fec434e1700 -1 *** Caught signal
> (Aborted) **
>  in thread 7fec434e1700 thread_name:balancer
>
>  ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)
>  1: (()+0x4105b4) [0x558c3c07a5b4]
>  2: (()+0x11390) [0x7fec5d756390]
>  3: (gsignal()+0x38) [0x7fec5c6e6428]
>  4: (abort()+0x16a) [0x7fec5c6e802a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x28e) [0x558c3c0bb6fe]
>  6: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long,
> std::less<long>, std::allocator<long> > const&,
> OSDMap::Incremental*)+0x2801) [0x558c3c1c0ee1]
>  7: (()+0x2f3020) [0x558c3bf5d020]
>  8: (PyEval_EvalFrameEx()+0x8a51) [0x7fec5e832971]
>  9: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  10: (PyEval_EvalFrameEx()+0x6ffd) [0x7fec5e830f1d]
>  11: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  12: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  13: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  14: (()+0x13e370) [0x7fec5e8be370]
>  15: (PyObject_Call()+0x43) [0x7fec5e891273]
>  16: (()+0x1853ac) [0x7fec5e9053ac]
>  17: (PyObject_Call()+0x43) [0x7fec5e891273]
>  18: (PyObject_CallMethod()+0xf4) [0x7fec5e892444]
>  19: (PyModuleRunner::serve()+0x5c) [0x558c3bf5a18c]
>  20: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x558c3bf5a998]
>  21: (()+0x76ba) [0x7fec5d74c6ba]
>  22: (clone()+0x6d) [0x7fec5c7b841d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 1 reserver
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    4/ 5 memdb
>    1/ 5 kinetic
>    1/ 5 fuse
>    1/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-mgr.mon01.ceph01.srvfarm.net.log
> --- end dump of recent events ---
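>
> In case it helps to narrow this down for the tracker: the same code
> path can usually be exercised offline against a copy of the osdmap,
> without involving the live mgr. A sketch, assuming the local
> osdmaptool build already supports upmap:
>
> ceph osd getmap -o /tmp/osdmap                  # grab the current osdmap
> osdmaptool /tmp/osdmap --upmap /tmp/upmaps.txt  # runs calc_pg_upmaps offline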
>
>
>
> Kevin
>
>
> On Wed, 2 Jan 2019 at 17:35, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
> >
> > On a medium-sized cluster with device classes, I am experiencing a
> > problem with the SSD pool:
> >
> > root@adminnode:~# ceph osd df | grep ssd
> > ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
> >  2   ssd 0.43700  1.00000  447GiB  254GiB  193GiB 56.77 1.28  50
> >  3   ssd 0.43700  1.00000  447GiB  208GiB  240GiB 46.41 1.04  58
> >  4   ssd 0.43700  1.00000  447GiB  266GiB  181GiB 59.44 1.34  55
> > 30   ssd 0.43660  1.00000  447GiB  222GiB  225GiB 49.68 1.12  49
> >  6   ssd 0.43700  1.00000  447GiB  238GiB  209GiB 53.28 1.20  59
> >  7   ssd 0.43700  1.00000  447GiB  228GiB  220GiB 50.88 1.14  56
> >  8   ssd 0.43700  1.00000  447GiB  269GiB  178GiB 60.16 1.35  57
> > 31   ssd 0.43660  1.00000  447GiB  231GiB  217GiB 51.58 1.16  56
> > 34   ssd 0.43660  1.00000  447GiB  186GiB  261GiB 41.65 0.94  49
> > 36   ssd 0.87329  1.00000  894GiB  364GiB  530GiB 40.68 0.92  91
> > 37   ssd 0.87329  1.00000  894GiB  321GiB  573GiB 35.95 0.81  78
> > 42   ssd 0.87329  1.00000  894GiB  375GiB  519GiB 41.91 0.94  92
> > 43   ssd 0.87329  1.00000  894GiB  438GiB  456GiB 49.00 1.10  92
> > 13   ssd 0.43700  1.00000  447GiB  249GiB  198GiB 55.78 1.25  72
> > 14   ssd 0.43700  1.00000  447GiB  290GiB  158GiB 64.76 1.46  71
> > 15   ssd 0.43700  1.00000  447GiB  368GiB 78.6GiB 82.41 1.85  78 <----
> > 16   ssd 0.43700  1.00000  447GiB  253GiB  194GiB 56.66 1.27  70
> > 19   ssd 0.43700  1.00000  447GiB  269GiB  178GiB 60.21 1.35  70
> > 20   ssd 0.43700  1.00000  447GiB  312GiB  135GiB 69.81 1.57  77
> > 21   ssd 0.43700  1.00000  447GiB  312GiB  135GiB 69.77 1.57  77
> > 22   ssd 0.43700  1.00000  447GiB  269GiB  178GiB 60.10 1.35  67
> > 38   ssd 0.43660  1.00000  447GiB  153GiB  295GiB 34.11 0.77  46
> > 39   ssd 0.43660  1.00000  447GiB  127GiB  320GiB 28.37 0.64  38
> > 40   ssd 0.87329  1.00000  894GiB  386GiB  508GiB 43.17 0.97  97
> > 41   ssd 0.87329  1.00000  894GiB  375GiB  520GiB 41.88 0.94 113
> >
> > This leaves just 1.2 TB of free space (only a few GB away from the
> > pool going NEAR_FULL).
> > Currently, the balancer plugin is off because it immediately crashed
> > the MGR in the past (on 12.2.5).
> > Since then, I have upgraded to 12.2.8 but have not re-enabled the
> > balancer. [I am unable to find the bug tracker ID.]
> >
> > Would the balancer plugin correct this situation?
> > What happens if all MGRs die because of the plugin, like they did on 12.2.5?
> > Will the balancer move data off the most-unbalanced OSDs first?
> > Otherwise an OSD may fill up past the full ratio, which would freeze
> > the whole pool (because the pool's free space is calculated from the
> > OSD that will fill up first).
> > That would be the worst case, as over 100 VMs would freeze, causing a
> > lot of trouble. This is also the reason I have not tried to enable the
> > balancer again.
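> >
> > For reference, the remaining headroom can be watched with plain
> > Luminous commands while the balancer stays off (the sort column below
> > matches the "ceph osd df" layout above):
> >
> > ceph df detail                      # MAX AVAIL per pool, limited by the OSD that fills first
> > ceph osd df | grep ssd | sort -nk8  # SSD OSDs ordered by %USE
> > ceph osd dump | grep ratio          # nearfull/backfillfull/full thresholds in effect
> >
> > As a stop-gap, slightly lowering the reweight of the fullest OSD (for
> > example "ceph osd reweight 15 0.9") moves some PGs off it, at the cost
> > of extra data movement.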
> >
> > Please read this [1]; it is all about the balancer with upmap mode.
> >
> > The balancer with upmap mode has been stable since 12.2.8.
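> >
> > In short, enabling it looks like this (a sketch; every client must
> > already speak Luminous before the min-compat step, so check the
> > feature level first):
> >
> > ceph features                                    # client/daemon feature levels
> > ceph osd set-require-min-compat-client luminous  # required for pg-upmap entries
> > ceph balancer mode upmap
> > ceph balancer on
> > ceph balancer status                             # "ceph balancer eval" scores the distribution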
> >
> >
> >
> > k
> >
> > [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-December/032002.html
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


