Re: size of inc_osdmap vs osdmap


We investigated the issue and raised debug_mon to 20. During a small osdmap change we get many messages like the following for every PG of every pool (across the whole cluster):
2018-12-25 19:28:42.426776 7f075af7d700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
2018-12-25 19:28:42.426776 7f075a77c700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
2018-12-25 19:28:42.426777 7f075977a700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
2018-12-25 19:28:42.426779 7f075af7d700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 3.1000 [97,812,841]/[] -> [97,812,841]/[97,812,841], priming []
2018-12-25 19:28:42.426780 7f075a77c700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 3.0 [84,370,847]/[] -> [84,370,847]/[84,370,847], priming []
2018-12-25 19:28:42.426781 7f075977a700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 4.0 [404,857,11]/[] -> [404,857,11]/[404,857,11], priming []
although no pg_temps are created as a result (not a single backfill).
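
A minimal sketch for tallying these messages from a monitor log, assuming the lines keep exactly the format shown above. The function name tally_prime_pg_temp and the regular expression are illustrative and not part of Ceph; the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Assumed format, taken from the debug_mon=20 lines quoted above:
#   prime_pg_temp <pgid> [up]/[acting] -> [next_up]/[next_acting], priming [osds]
PRIME_RE = re.compile(
    r"prime_pg_temp (?P<pgid>\d+\.[0-9a-f]+) "
    r"\[(?P<up>[\d,]*)\]/\[(?P<acting>[\d,]*)\] -> "
    r"\[(?P<next_up>[\d,]*)\]/\[(?P<next_acting>[\d,]*)\], "
    r"priming \[(?P<priming>[\d,]*)\]"
)


def tally_prime_pg_temp(lines):
    """Count 'clear pg_temp' messages, evaluated PGs, and PGs actually primed."""
    counts = Counter(clear=0, evaluated=0, primed=0)
    for line in lines:
        if "clear pg_temp" in line:
            counts["clear"] += 1
            continue
        match = PRIME_RE.search(line)
        if match is None:
            continue  # unrelated log line
        counts["evaluated"] += 1
        # A non-empty "priming [...]" list means a pg_temp entry is proposed.
        if match.group("priming"):
            counts["primed"] += 1
    return counts


sample = [
    "2018-12-25 19:28:42.426776 7f075af7d700 20 mon.1@0(leader).osd e1373789 "
    "prime_pg_tempnext_up === next_acting now, clear pg_temp",
    "2018-12-25 19:28:42.426779 7f075af7d700 20 mon.1@0(leader).osd e1373789 "
    "prime_pg_temp 3.1000 [97,812,841]/[] -> [97,812,841]/[97,812,841], priming []",
    "2018-12-25 19:28:42.426780 7f075a77c700 20 mon.1@0(leader).osd e1373789 "
    "prime_pg_temp 3.0 [84,370,847]/[] -> [84,370,847]/[84,370,847], priming []",
]
counts = tally_prime_pg_temp(sample)
print(dict(counts))
```

Run over a full log (for example, by passing an open file object instead of the sample list), a large "evaluated" or "clear" count next to a "primed" count of zero would match what we see here: every PG is visited, but no pg_temp is actually primed.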


I accept that we may be mistaken.


On Wed, Dec 12, 2018 at 10:53 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
Hmm that does seem odd. How are you looking at those sizes?

On Wed, Dec 12, 2018 at 4:38 AM Sergey Dolgov <palza00@xxxxxxxxx> wrote:
Greg, for example, for our cluster of ~1000 OSDs:

size osdmap.1357881__0_F7FE779D__none = 363KB (crush_version 9860,
modified 2018-12-12 04:00:17.661731)
size osdmap.1357882__0_F7FE772D__none = 363KB
size osdmap.1357883__0_F7FE74FD__none = 363KB (crush_version 9861,
modified 2018-12-12 04:00:27.385702)
size inc_osdmap.1357882__0_B783A4EA__none = 1.2MB

The difference between epochs 1357881 and 1357883: the crush weight of one
OSD was increased by 0.01, so we get 5 new pg_temp entries in
osdmap.1357883, yet the inc_osdmap is still this large.

On Thu, Dec 6, 2018 at 06:20, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Wed, Dec 5, 2018 at 3:32 PM Sergey Dolgov <palza00@xxxxxxxxx> wrote:
>>
>> Hi guys
>>
>> I ran into strange behavior when changing the crushmap. When I change the
>> crush weight of an OSD, I sometimes get an incremental osdmap (1.2MB) whose
>> size is significantly bigger than the size of the full osdmap (0.4MB).
>
>
> This is probably because when CRUSH changes, the new primary OSDs for a PG will tend to set a "pg temp" value (in the OSDMap) that temporarily reassigns it to the old acting set, so the data can be accessed while the new OSDs get backfilled. Depending on the size of your cluster, the number of PGs on it, and the size of the CRUSH change, this can easily be larger than the rest of the map because it is data with size linear in the number of PGs affected, instead of being more normally proportional to the number of OSDs.
> -Greg
>
>>
>> I use Luminous 12.2.8. The cluster was installed long ago; I suppose
>> that initially it was Firefly.
>> How can I view the contents of an incremental osdmap, or can you give me
>> your opinion on this problem? I think the traffic spikes right after a
>> crushmap change are related to this behavior.
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Best regards, Sergey Dolgov


