Re: jewel 10.2.11 EC pool: out an osd, its PGs remap to the osds in the same host

Thanks so much.
My ceph osd df tree output is here: https://gist.github.com/hnuzhoulin/e83140168eb403f4712273e3bb925a1c

As that output shows, and following up on David's reply:
when I out osd.132 (in the EC pool), its PGs remap only to the other OSDs on its host cld-osd12-56-sata, as if the out did not change the host's weight.
But when I out osd.10, which is in a replicated pool, its PGs remap across its media bucket site1-rack1-ssd rather than staying on its host cld-osd1-56-ssd, as if the out did change the host's weight.
So the out command behaves differently for firstn and indep rules, am I right?
If that is right, we need to reserve more free space on each disk used by an indep (EC) pool. (A quick check of the host weights is sketched below.)
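
For reference, a rough sketch of how I check whether the host weight actually changes (osd.132 and cld-osd12-56-sata are from my test above; the weights are restored afterwards):

ceph osd tree | grep -E 'cld-osd12-56-sata|osd\.132'   # record host WEIGHT and osd REWEIGHT
ceph osd out 132
ceph osd tree | grep -E 'cld-osd12-56-sata|osd\.132'   # does the host WEIGHT change?
ceph osd in 132                                        # undo the test
ceph osd crush reweight osd.132 0
ceph osd tree | grep -E 'cld-osd12-56-sata|osd\.132'   # compare with the out case
ceph osd crush reweight osd.132 1.0                    # restore the original 1.000 CRUSH weight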

When I instead run ceph osd crush reweight osd.132 0.0, its PGs remap across its media bucket (the whole sata rack), just like the out does in the firstn case, so here the host's weight did change.
PG diffs: https://gist.github.com/hnuzhoulin/aab164975b4e3d31bbecbc5c8b2f1fef
From that diff output I can see the difference between the two strategies for selecting OSDs in a CRUSH hierarchy,
but I still cannot see why the out command acts differently for firstn and indep. (A sketch of how I produce these diffs follows.)
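
For completeness, the PG-mapping diffs above are produced with roughly the following (a sketch; the column positions of ceph pg dump pgs_brief may differ between releases, and the sleep is only to let peering settle):

ceph pg dump pgs_brief 2>/dev/null | awk '{print $1" "$3}' | sort > /tmp/pgmap.before
ceph osd out 132                       # or: ceph osd crush reweight osd.132 0.0
sleep 60
ceph pg dump pgs_brief 2>/dev/null | awk '{print $1" "$3}' | sort > /tmp/pgmap.after
diff /tmp/pgmap.before /tmp/pgmap.after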

David Turner <drakonstein@xxxxxxxxx> wrote on Sat, Feb 16, 2019 at 1:22 AM:
>
> I'm leaving the response on the CRUSH rule for Gregory, but you have another problem you're running into that is causing more of this data to stay on this node than you intend.  While you `out` the OSD it is still contributing to the Host's weight.  So the host is still set to receive that amount of data and distribute it among the disks inside of it.  This is the default behavior (even if you `destroy` the OSD) to minimize the data movement for losing the disk and again for adding it back into the cluster after you replace the device.  If you are really strapped for space, though, then you might consider fully purging the OSD which will reduce the Host weight to what the other OSDs are.  However if you do have a problem in your CRUSH rule, then doing this won't change anything for you.
>
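(For my own notes: jewel has no "ceph osd purge", so I assume the full removal David mentions is the manual sequence below, where the crush remove step is what actually lowers the host weight; osd.132 is just my example.)

/etc/init.d/ceph stop osd.132
ceph osd out 132
ceph osd crush remove osd.132
ceph auth del osd.132
ceph osd rm 132
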
> On Thu, Feb 14, 2019 at 11:15 PM hnuzhoulin2 <hnuzhoulin2@xxxxxxxxx> wrote:
>>
>> Thanks. I read your reply at https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg48717.html
>> So using indep causes less data remapping when an OSD fails:
>> using firstn: 1, 2, 3, 4, 5 -> 1, 2, 4, 5, 6  (60% data remap)
>> using indep:  1, 2, 3, 4, 5 -> 1, 2, 6, 4, 5  (25% data remap)
>>
>> Am I right?
>> If so, what is recommended when a disk fails and the total available space on the remaining disks in the machine is not enough (the failed disk cannot be replaced immediately)? Or should I just reserve more free space in the EC case?
>>
>> On 02/14/2019 02:49, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>
>> Your CRUSH rule for EC pools is forcing that behavior with the line
>>
>> step chooseleaf indep 1 type ctnr
>>
>> If you want different behavior, you’ll need a different crush rule.
>>
>> On Tue, Feb 12, 2019 at 5:18 PM hnuzhoulin2 <hnuzhoulin2@xxxxxxxxx> wrote:
>>>
>>> Hi, cephers
>>>
>>>
>>> I am building a Ceph EC cluster. When a disk fails, I mark its OSD out, but all of its PGs remap to other OSDs in the same host, whereas I think they should remap to other hosts in the same rack.
>>> The test process is:
>>>
>>> ceph osd pool create .rgw.buckets.data 8192 8192 erasure ISA-4-2 site1_sata_erasure_ruleset 400000000
>>> ceph osd df tree|awk '{print $1" "$2" "$3" "$9" "$10}'> /tmp/1
>>> /etc/init.d/ceph stop osd.2
>>> ceph osd out 2
>>> ceph osd df tree|awk '{print $1" "$2" "$3" "$9" "$10}'> /tmp/2
>>> diff /tmp/1 /tmp/2 -y --suppress-common-lines
>>>
>>> 0 1.00000 1.00000 118 osd.0       | 0 1.00000 1.00000 126 osd.0
>>> 1 1.00000 1.00000 123 osd.1       | 1 1.00000 1.00000 139 osd.1
>>> 2 1.00000 1.00000 122 osd.2       | 2 1.00000 0 0 osd.2
>>> 3 1.00000 1.00000 113 osd.3       | 3 1.00000 1.00000 131 osd.3
>>> 4 1.00000 1.00000 122 osd.4       | 4 1.00000 1.00000 136 osd.4
>>> 5 1.00000 1.00000 112 osd.5       | 5 1.00000 1.00000 127 osd.5
>>> 6 1.00000 1.00000 114 osd.6       | 6 1.00000 1.00000 128 osd.6
>>> 7 1.00000 1.00000 124 osd.7       | 7 1.00000 1.00000 136 osd.7
>>> 8 1.00000 1.00000 95 osd.8       | 8 1.00000 1.00000 113 osd.8
>>> 9 1.00000 1.00000 112 osd.9       | 9 1.00000 1.00000 119 osd.9
>>> TOTAL 3073T 197G         | TOTAL 3065T 197G
>>> MIN/MAX VAR: 0.84/26.56         | MIN/MAX VAR: 0.84/26.52
>>>
>>>
>>> Some config info (detailed configs: https://gist.github.com/hnuzhoulin/575883dbbcb04dff448eea3b9384c125):
>>> jewel 10.2.11  filestore+rocksdb
>>>
>>> ceph osd erasure-code-profile get ISA-4-2
>>> k=4
>>> m=2
>>> plugin=isa
>>> ruleset-failure-domain=ctnr
>>> ruleset-root=site1-sata
>>> technique=reed_sol_van
>>>
>>> part of ceph.conf is:
>>>
>>> [global]
>>> fsid = 1CAB340D-E551-474F-B21A-399AC0F10900
>>> auth cluster required = cephx
>>> auth service required = cephx
>>> auth client required = cephx
>>> pid file = /home/ceph/var/run/$name.pid
>>> log file = /home/ceph/log/$cluster-$name.log
>>> mon osd nearfull ratio = 0.85
>>> mon osd full ratio = 0.95
>>> admin socket = /home/ceph/var/run/$cluster-$name.asok
>>> osd pool default size = 3
>>> osd pool default min size = 1
>>> osd objectstore = filestore
>>> filestore merge threshold = -10
>>>
>>> [mon]
>>> keyring = /home/ceph/var/lib/$type/$cluster-$id/keyring
>>> mon data = /home/ceph/var/lib/$type/$cluster-$id
>>> mon cluster log file = /home/ceph/log/$cluster.log
>>> [osd]
>>> keyring = /home/ceph/var/lib/$type/$cluster-$id/keyring
>>> osd data = /home/ceph/var/lib/$type/$cluster-$id
>>> osd journal = /home/ceph/var/lib/$type/$cluster-$id/journal
>>> osd journal size = 10000
>>> osd mkfs type = xfs
>>> osd mount options xfs = rw,noatime,nodiratime,inode64,logbsize=256k
>>> osd backfill full ratio = 0.92
>>> osd failsafe full ratio = 0.95
>>> osd failsafe nearfull ratio = 0.85
>>> osd max backfills = 1
>>> osd crush update on start = false
>>> osd op thread timeout = 60
>>> filestore split multiple = 8
>>> filestore max sync interval = 15
>>> filestore min sync interval = 5
>>> [osd.0]
>>> host = cld-osd1-56
>>> addr = XXXXX
>>> user = ceph
>>> devs = /disk/link/osd-0/data
>>> osd journal = /disk/link/osd-0/journal
>>> …….
>>> [osd.503]
>>> host = cld-osd42-56
>>> addr = 10.108.87.52
>>> user = ceph
>>> devs = /disk/link/osd-503/data
>>> osd journal = /disk/link/osd-503/journal
>>>
>>>
>>> crushmap is below:
>>>
>>> # begin crush map
>>> tunable choose_local_tries 0
>>> tunable choose_local_fallback_tries 0
>>> tunable choose_total_tries 50
>>> tunable chooseleaf_descend_once 1
>>> tunable chooseleaf_vary_r 1
>>> tunable straw_calc_version 1
>>> tunable allowed_bucket_algs 54
>>>
>>> # devices
>>> device 0 osd.0
>>> device 1 osd.1
>>> device 2 osd.2
>>> ...
>>> device 502 osd.502
>>> device 503 osd.503
>>>
>>> # types
>>> type 0 osd          # osd
>>> type 1 ctnr         # sata/ssd group by node, -101~1xx/-201~2xx
>>> type 2 media        # sata/ssd group by rack, -11~1x/-21~2x
>>> type 3 mediagroup   # sata/ssd group by site, -5/-6
>>> type 4 unit         # site, -2
>>> type 5 root         # root, -1
>>>
>>> # buckets
>>> ctnr cld-osd1-56-sata {
>>> id -101              # do not change unnecessarily
>>> # weight 10.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item osd.0 weight 1.000
>>> item osd.1 weight 1.000
>>> item osd.2 weight 1.000
>>> item osd.3 weight 1.000
>>> item osd.4 weight 1.000
>>> item osd.5 weight 1.000
>>> item osd.6 weight 1.000
>>> item osd.7 weight 1.000
>>> item osd.8 weight 1.000
>>> item osd.9 weight 1.000
>>> }
>>> ctnr cld-osd1-56-ssd {
>>> id -201              # do not change unnecessarily
>>> # weight 2.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item osd.10 weight 1.000
>>> item osd.11 weight 1.000
>>> }
>>> …..
>>> ctnr cld-osd41-56-sata {
>>> id -141              # do not change unnecessarily
>>> # weight 10.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item osd.480 weight 1.000
>>> item osd.481 weight 1.000
>>> item osd.482 weight 1.000
>>> item osd.483 weight 1.000
>>> item osd.484 weight 1.000
>>> item osd.485 weight 1.000
>>> item osd.486 weight 1.000
>>> item osd.487 weight 1.000
>>> item osd.488 weight 1.000
>>> item osd.489 weight 1.000
>>> }
>>> ctnr cld-osd41-56-ssd {
>>> id -241              # do not change unnecessarily
>>> # weight 2.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item osd.490 weight 1.000
>>> item osd.491 weight 1.000
>>> }
>>> ctnr cld-osd42-56-sata {
>>> id -142              # do not change unnecessarily
>>> # weight 10.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd29-56-sata weight 10.000
>>> item cld-osd30-56-sata weight 10.000
>>> item cld-osd31-56-sata weight 10.000
>>> item cld-osd32-56-sata weight 10.000
>>> item cld-osd33-56-sata weight 10.000
>>> item cld-osd34-56-sata weight 10.000
>>> item cld-osd35-56-sata weight 10.000
>>> }
>>>
>>>
>>> media site1-rack1-sata {
>>> id -11               # do not change unnecessarily
>>> # weight 70.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd1-56-sata weight 10.000
>>> item cld-osd2-56-sata weight 10.000
>>> item cld-osd3-56-sata weight 10.000
>>> item cld-osd4-56-sata weight 10.000
>>> item cld-osd5-56-sata weight 10.000
>>> item cld-osd6-56-sata weight 10.000
>>> item cld-osd7-56-sata weight 10.000
>>> }
>>> media site1-rack2-sata {
>>> id -12               # do not change unnecessarily
>>> # weight 70.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd8-56-sata weight 10.000
>>> item cld-osd9-56-sata weight 10.000
>>> item cld-osd10-56-sata weight 10.000
>>> item cld-osd11-56-sata weight 10.000
>>> item cld-osd12-56-sata weight 10.000
>>> item cld-osd13-56-sata weight 10.000
>>> item cld-osd14-56-sata weight 10.000
>>> }
>>> media site1-rack3-sata {
>>> id -13               # do not change unnecessarily
>>> # weight 70.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd15-56-sata weight 10.000
>>> item cld-osd16-56-sata weight 10.000
>>> item cld-osd17-56-sata weight 10.000
>>> item cld-osd18-56-sata weight 10.000
>>> item cld-osd19-56-sata weight 10.000
>>> item cld-osd20-56-sata weight 10.000
>>> item cld-osd21-56-sata weight 10.000
>>> }
>>> media site1-rack4-sata {
>>> id -14               # do not change unnecessarily
>>> # weight 70.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd22-56-sata weight 10.000
>>> item cld-osd23-56-sata weight 10.000
>>> item cld-osd24-56-sata weight 10.000
>>> item cld-osd25-56-sata weight 10.000
>>> item cld-osd26-56-sata weight 10.000
>>> item cld-osd27-56-sata weight 10.000
>>> item cld-osd28-56-sata weight 10.000
>>> }
>>> media site1-rack5-sata {
>>> id -15               # do not change unnecessarily
>>> # weight 70.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd29-56-sata weight 10.000
>>> item cld-osd30-56-sata weight 10.000
>>> item cld-osd31-56-sata weight 10.000
>>> item cld-osd32-56-sata weight 10.000
>>> item cld-osd33-56-sata weight 10.000
>>> item cld-osd34-56-sata weight 10.000
>>> item cld-osd35-56-sata weight 10.000
>>> }
>>> media site1-rack6-sata {
>>> id -16               # do not change unnecessarily
>>> # weight 70.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd36-56-sata weight 10.000
>>> item cld-osd37-56-sata weight 10.000
>>> item cld-osd38-56-sata weight 10.000
>>> item cld-osd39-56-sata weight 10.000
>>> item cld-osd40-56-sata weight 10.000
>>> item cld-osd41-56-sata weight 10.000
>>> item cld-osd42-56-sata weight 10.000
>>> }
>>>
>>> media site1-rack1-ssd {
>>> id -21               # do not change unnecessarily
>>> # weight 14.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd1-56-ssd weight 2.000
>>> item cld-osd2-56-ssd weight 2.000
>>> item cld-osd3-56-ssd weight 2.000
>>> item cld-osd4-56-ssd weight 2.000
>>> item cld-osd5-56-ssd weight 2.000
>>> item cld-osd6-56-ssd weight 2.000
>>> item cld-osd7-56-ssd weight 2.000
>>> item cld-osd8-56-ssd weight 2.000
>>> item cld-osd9-56-ssd weight 2.000
>>> item cld-osd10-56-ssd weight 2.000
>>> item cld-osd11-56-ssd weight 2.000
>>> item cld-osd12-56-ssd weight 2.000
>>> item cld-osd13-56-ssd weight 2.000
>>> item cld-osd14-56-ssd weight 2.000
>>> }
>>> media site1-rack2-ssd {
>>> id -22               # do not change unnecessarily
>>> # weight 14.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd15-56-ssd weight 2.000
>>> item cld-osd16-56-ssd weight 2.000
>>> item cld-osd17-56-ssd weight 2.000
>>> item cld-osd18-56-ssd weight 2.000
>>> item cld-osd19-56-ssd weight 2.000
>>> item cld-osd20-56-ssd weight 2.000
>>> item cld-osd21-56-ssd weight 2.000
>>> item cld-osd22-56-ssd weight 2.000
>>> item cld-osd23-56-ssd weight 2.000
>>> item cld-osd24-56-ssd weight 2.000
>>> item cld-osd25-56-ssd weight 2.000
>>> item cld-osd26-56-ssd weight 2.000
>>> item cld-osd27-56-ssd weight 2.000
>>> item cld-osd28-56-ssd weight 2.000
>>> }
>>> media site1-rack3-ssd {
>>> id -23               # do not change unnecessarily
>>> # weight 14.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item cld-osd29-56-ssd weight 2.000
>>> item cld-osd30-56-ssd weight 2.000
>>> item cld-osd31-56-ssd weight 2.000
>>> item cld-osd32-56-ssd weight 2.000
>>> item cld-osd33-56-ssd weight 2.000
>>> item cld-osd34-56-ssd weight 2.000
>>> item cld-osd35-56-ssd weight 2.000
>>> item cld-osd36-56-ssd weight 2.000
>>> item cld-osd37-56-ssd weight 2.000
>>> item cld-osd38-56-ssd weight 2.000
>>> item cld-osd39-56-ssd weight 2.000
>>> item cld-osd40-56-ssd weight 2.000
>>> item cld-osd41-56-ssd weight 2.000
>>> item cld-osd42-56-ssd weight 2.000
>>> }
>>> mediagroup site1-sata {
>>> id -5                # do not change unnecessarily
>>> # weight 420.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item site1-rack1-sata weight 70.000
>>> item site1-rack2-sata weight 70.000
>>> item site1-rack3-sata weight 70.000
>>> item site1-rack4-sata weight 70.000
>>> item site1-rack5-sata weight 70.000
>>> item site1-rack6-sata weight 70.000
>>> }
>>> mediagroup site1-ssd {
>>> id -6                # do not change unnecessarily
>>> # weight 84.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item site1-rack1-ssd weight 28.000
>>> item site1-rack2-ssd weight 28.000
>>> item site1-rack3-ssd weight 28.000
>>> }
>>>
>>> unit site1 {
>>> id -2                # do not change unnecessarily
>>> # weight 504.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item site1-sata weight 420.000
>>> item site1-ssd weight 84.000
>>> }
>>>
>>> root default {
>>> id -1                # do not change unnecessarily
>>> # weight 504.000
>>> alg straw2
>>> hash 0               # rjenkins1
>>> item site1 weight 504.000
>>> }
>>> # rules
>>> rule site1_sata_erasure_ruleset {
>>> ruleset 0
>>> type erasure
>>> min_size 3
>>> max_size 6
>>> step set_chooseleaf_tries 5
>>> step set_choose_tries 100
>>> step take site1-sata
>>> step choose indep 0 type media
>>> step chooseleaf indep 1 type ctnr
>>> step emit
>>> }
>>> rule site1_ssd_replicated_ruleset {
>>> ruleset 1
>>> type replicated
>>> min_size 1
>>> max_size 10
>>> step take site1-ssd
>>> step choose firstn 0 type media
>>> step chooseleaf firstn 1 type ctnr
>>> step emit
>>> }
>>> # end crush map
>>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



