Hi Igor

Many thanks, it worked!

ewceph1-osd001-prod:~ # egrep -a --color=always "min_alloc_size" /var/log/ceph/ceph-osd.0.log | tail -111
2022-02-10 18:12:53.918 7f3a1dd4bd00 10 bluestore(/var/lib/ceph/osd/ceph-0) _open_super_meta min_alloc_size 0x10000
2022-02-10 18:12:53.926 7f3a1dd4bd00 10 bluestore(/var/lib/ceph/osd/ceph-0) _set_alloc_sizes min_alloc_size 0x10000 order 16 max_alloc_size 0x0 prefer_deferred_size 0x8000 deferred_batch_ops 64

ewceph1-osd001-prod:~ # echo $((16#10000))
65536

So I get 64K for hdd and 16K for nvme. I will recreate the nvme OSDs with 4K to avoid any allocation overhead issue with EC 8+2.

Cheers
Francois

--

EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer@xxxxxxxxxxxx
web: http://www.everyware.ch

________________________________
From: Igor Fedotov <igor.fedotov@xxxxxxxx>
Sent: Thursday, February 10, 2022 6:06 PM
To: Scheurer François; Dan van der Ster
Cc: Ceph Users
Subject: Re: Re: osd true blocksize vs bluestore_min_alloc_size

Hi François,

You should set debug_bluestore = 10 instead, and then grep for "bluestore" or "min_alloc_size", not "bluefs". Here is how this is printed:

  dout(10) << __func__ << " min_alloc_size 0x" << std::hex << min_alloc_size
           << std::dec << " order " << (int)min_alloc_size_order
           << " max_alloc_size 0x" << std::hex << max_alloc_size
           << " prefer_deferred_size 0x" << prefer_deferred_size
           << std::dec << " deferred_batch_ops " << deferred_batch_ops
           << dendl;
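A minimal sketch of this procedure (the OSD id, config path, and log path follow the examples in this thread; note that min_alloc_size is only printed while the OSD mounts its store, so a restart is needed):

  # 1. enable BlueStore debug logging, e.g. in ceph.conf:
  #      [osd]
  #      debug_bluestore = 10
  # 2. restart the OSD so _open_super_meta runs and logs the value:
  systemctl restart ceph-osd@0
  # 3. extract the value from the startup log and decode the hex:
  grep -a "min_alloc_size 0x" /var/log/ceph/ceph-osd.0.log | tail -2
  echo $((16#10000))    # -> 65536, i.e. 64K (substitute the hex value you found)

This matches what François reports above: 0x10000 from _open_super_meta decodes to 65536 bytes, i.e. 64K.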
On 2/10/2022 7:39 PM, Scheurer François wrote:
> Dear Dan
>
> Thank you for your help.
>
> After putting debug_osd = 10/5 in ceph.conf under [osd], I still do not get min_alloc_size logged.
> Probably it is not logged on 14.2.5.
>
> But this comes up:
>
> ewceph1-osd001-prod:~ # egrep -a --color=always bluefs /var/log/ceph/ceph-osd.0.log | tail -111
> 2022-02-10 17:26:59.512 7f6026737d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
> 2022-02-10 17:26:59.512 7f6026737d00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
> 2022-02-10 17:26:59.512 7f6026737d00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
> 2022-02-10 17:27:00.896 7f6026737d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
> 2022-02-10 17:27:00.900 7f6026737d00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
> 2022-02-10 17:27:00.900 7f6026737d00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
> 2022-02-10 17:27:00.900 7f6026737d00 1 bluefs mount
> 2022-02-10 17:27:00.900 7f6026737d00 1 bluefs _init_alloc id 0 alloc_size 0x100000 size 0x80000000
> 2022-02-10 17:27:00.900 7f6026737d00 1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x1400000000
> 2022-02-10 17:27:00.900 7f6026737d00 1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0x746fc051000
> 2022-02-10 17:27:04.516 7f6026737d00 1 bluefs umount
> 2022-02-10 17:27:05.200 7f6026737d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
> 2022-02-10 17:27:05.200 7f6026737d00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
> 2022-02-10 17:27:05.200 7f6026737d00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
> 2022-02-10 17:27:05.200 7f6026737d00 1 bluefs mount
> 2022-02-10 17:27:05.200 7f6026737d00 1 bluefs _init_alloc id 0 alloc_size 0x100000 size 0x80000000
> 2022-02-10 17:27:05.200 7f6026737d00 1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x1400000000
> 2022-02-10 17:27:05.200 7f6026737d00 1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0x746fc051000
>
> So alloc_size for block is 1MiB.

These are alloc sizes for bluefs, not for user data. So bluefs data at the main device (id=2) uses a 64K allocation unit (0x10000). But it is the user-data allocation size (= min_alloc_size) which mostly matters for the main device, as bluefs uses this device only in case of data spillover (i.e. lack of free space at the DB volume).

And please do not confuse the allocation unit with the device block size. The latter is almost always 4K and determines the minimal block size read from or written to the disk, while the allocation unit (= min_alloc_size) determines the allocated/tracked block size, i.e. the minimal addressable block which BlueStore uses.
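This distinction is what drives the EC 8+2 concern mentioned at the top of the thread. A worked example as shell arithmetic (illustrative only, not a Ceph command; the 64 KiB object size is an assumed example):

  obj=65536                   # 64 KiB logical object
  chunk=$(( obj / 8 ))        # 8 KiB per data chunk with k=8
  for au in 65536 4096; do    # min_alloc_size: 64K (hdd default here) vs 4K
      per_shard=$(( (chunk + au - 1) / au * au ))  # each shard rounds up to the allocation unit
      total=$(( per_shard * 10 ))                  # k=8 data + m=2 coding shards
      echo "min_alloc_size=$au: $total bytes on disk for $obj logical bytes"
  done

With 64K units the 64 KiB object consumes 655360 bytes (10x the logical size); with 4K units it consumes 81920 bytes, i.e. only the 1.25x redundancy inherent to EC 8+2. This is why François plans to recreate the nvme OSDs with 4K.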
> Any other way to get min_alloc_size?
>
> Cheers
> Francois
>
> --
>
> EveryWare AG
> François Scheurer
> Senior Systems Engineer
> Zurlindenstrasse 52a
> CH-8003 Zürich
>
> tel: +41 44 466 60 00
> fax: +41 44 466 60 10
> mail: francois.scheurer@xxxxxxxxxxxx
> web: http://www.everyware.ch
>
> ________________________________
> From: Dan van der Ster <dvanders@xxxxxxxxx>
> Sent: Thursday, February 10, 2022 4:33 PM
> To: Scheurer François
> Cc: Ceph Users
> Subject: Re: osd true blocksize vs bluestore_min_alloc_size
>
> Hi,
>
> When an osd starts it should log the min_alloc_size at level 1, see
> https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L12260
>
> grep "min_alloc_size 0x" ceph-osd.*.log
>
> Cheers, Dan
>
> On Thu, Feb 10, 2022 at 3:50 PM Scheurer François
> <francois.scheurer@xxxxxxxxxxxx> wrote:
>> Hi everyone
>>
>> How can we display the true osd block size?
>>
>> I get 64K for a hdd osd:
>>
>> ceph daemon osd.0 config show | egrep --color=always "alloc_size|bdev_block_size"
>>     "bdev_block_size": "4096",
>>     "bluefs_alloc_size": "1048576",
>>     "bluefs_shared_alloc_size": "65536",
>>     "bluestore_extent_map_inline_shard_prealloc_size": "256",
>>     "bluestore_max_alloc_size": "0",
>>     "bluestore_min_alloc_size": "0",
>>     "bluestore_min_alloc_size_hdd": "65536",
>>     "bluestore_min_alloc_size_ssd": "16384",
>>
>> But it was explained that bluestore_min_alloc_size_hdd only affects newly created OSDs.
>> So to check the current block size I can check the osd metadata, and I find 4K:
>>
>> ceph osd metadata osd.0 | jq '.bluestore_bdev_block_size'
>> "bluestore_bdev_block_size": "4096",
>>
>> Checking an object's block size directly also shows 4K:
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 6.5s4 "cb1594b3-a782-49d0-a19f-68cd48870a63.95398870.14_DriveE/Predator/Doc/2021/03/01101038/1523111.pdf.zip" dump | jq '.stat'
>> {
>>   "size": 32768,
>>   "blksize": 4096,
>>   "blocks": 8,
>>   "nlink": 1
>> }
>>
>> So were these hdd OSDs created with 4K block size, without honoring bluestore_min_alloc_size_hdd?
>> The OSDs are running on nautilus 14.2.5 and were created on luminous.
>>
>> Newer nvme OSDs created on nautilus also show 4K, apparently without honoring bluestore_min_alloc_size_ssd (16K).
>>
>> This is confusing... Actually I would be happy with 4K, as it is recommended to avoid the over-allocation issue with EC pools.
>> But I would like to understand how to show the true block size of an existing osd...
>>
>> Many thanks for your help! ;-)
>>
>> Cheers
>> Francois Scheurer
>>
>> --
>>
>> EveryWare AG
>> François Scheurer
>> Senior Systems Engineer
>> Zurlindenstrasse 52a
>> CH-8003 Zürich
>>
>> tel: +41 44 466 60 00
>> fax: +41 44 466 60 10
>> mail: francois.scheurer@xxxxxxxxxxxx
>> web: http://www.everyware.ch
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
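For reference, the inspection commands discussed in this thread, collected into one minimal sketch (assuming osd.0 and the log path used above; on this release the authoritative per-OSD value only appears in the startup log, with debug_bluestore = 10):

  # device block size (almost always 4096; NOT the allocation unit):
  ceph osd metadata osd.0 | jq '.bluestore_bdev_block_size'
  # configured defaults, applied only when an OSD is created:
  ceph daemon osd.0 config show | grep min_alloc_size
  # the value actually baked into this OSD, from its startup log
  # (see the debug_bluestore sketch earlier in the thread):
  grep -a "min_alloc_size 0x" /var/log/ceph/ceph-osd.0.log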
--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx