Hi François,
You should set debug_bluestore = 10 instead.
Then grep for bluestore or min_alloc_size, not bluefs. Here is how
this is printed:
dout(10) << __func__ << " min_alloc_size 0x" << std::hex << min_alloc_size
         << std::dec << " order " << (int)min_alloc_size_order
         << " max_alloc_size 0x" << std::hex << max_alloc_size
         << " prefer_deferred_size 0x" << prefer_deferred_size
         << std::dec
         << " deferred_batch_ops " << deferred_batch_ops
         << dendl;
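For example (just a rough sketch, assuming osd.0 and the default log
path; the OSD id and systemd unit name may differ in your deployment):

# either put "debug_bluestore = 10" under [osd] in ceph.conf, or inject it:
ceph tell osd.0 injectargs '--debug_bluestore=10'
# the message is printed when the store is mounted, so restart the OSD:
systemctl restart ceph-osd@0
# then grep the startup log:
grep -a "min_alloc_size 0x" /var/log/ceph/ceph-osd.0.log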
On 2/10/2022 7:39 PM, Scheurer François wrote:
Dear Dan
Thank you for your help.
After putting debug_osd = 10/5 in ceph.conf under [osd], I still do not get min_alloc_size logged.
It is probably not logged on 14.2.5.
But this comes up:
ewceph1-osd001-prod:~ # egrep -a --color=always bluefs /var/log/ceph/ceph-osd.0.log | tail -111
2022-02-10 17:26:59.512 7f6026737d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
2022-02-10 17:26:59.512 7f6026737d00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
2022-02-10 17:26:59.512 7f6026737d00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
2022-02-10 17:27:00.896 7f6026737d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
2022-02-10 17:27:00.900 7f6026737d00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
2022-02-10 17:27:00.900 7f6026737d00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
2022-02-10 17:27:00.900 7f6026737d00 1 bluefs mount
2022-02-10 17:27:00.900 7f6026737d00 1 bluefs _init_alloc id 0 alloc_size 0x100000 size 0x80000000
2022-02-10 17:27:00.900 7f6026737d00 1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x1400000000
2022-02-10 17:27:00.900 7f6026737d00 1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0x746fc051000
2022-02-10 17:27:04.516 7f6026737d00 1 bluefs umount
2022-02-10 17:27:05.200 7f6026737d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
2022-02-10 17:27:05.200 7f6026737d00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
2022-02-10 17:27:05.200 7f6026737d00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
2022-02-10 17:27:05.200 7f6026737d00 1 bluefs mount
2022-02-10 17:27:05.200 7f6026737d00 1 bluefs _init_alloc id 0 alloc_size 0x100000 size 0x80000000
2022-02-10 17:27:05.200 7f6026737d00 1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x1400000000
2022-02-10 17:27:05.200 7f6026737d00 1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0x746fc051000
So alloc_size for block is 1MiB.
These are the allocation sizes for BlueFS, not for user data. So BlueFS
data on the main device (id=2) uses a 16K allocation unit. But it is the
user data allocation size (= min_alloc_size) which mostly matters for main
devices, as BlueFS uses this device only in case of data spillover (i.e.
lack of free space on the DB volume).
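If you want to double-check whether BlueFS has spilled over onto the
main device at all, the bluefs perf counters give a quick answer (a
sketch assuming osd.0; counter names as found on recent Nautilus builds):

ceph daemon osd.0 perf dump | jq '.bluefs | {db_used_bytes, slow_used_bytes}'
# slow_used_bytes > 0 means BlueFS is occupying space on the main (slow) device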
And please do not confuse the allocation unit with the device block size.
The latter is almost always 4K and determines the minimal block size
read from or written to the disk, while the allocation unit
(= min_alloc_size) determines the allocated/tracked block size, i.e. the
minimal addressable block which BlueStore uses.
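You can also see the practical effect of the allocation unit through the
bluestore perf counters, which compare the bytes allocated on disk with
the bytes actually stored (again just a sketch for osd.0; counter names
may vary slightly between releases):

ceph daemon osd.0 perf dump | jq '.bluestore | {bluestore_allocated, bluestore_stored}'
# with a 64K min_alloc_size and many small objects, allocated will be
# noticeably larger than stored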
Any other way to get min_alloc_size?
Cheers
Francois
--
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich
tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer@xxxxxxxxxxxx
web: http://www.everyware.ch
________________________________
From: Dan van der Ster <dvanders@xxxxxxxxx>
Sent: Thursday, February 10, 2022 4:33 PM
To: Scheurer François
Cc: Ceph Users
Subject: Re: osd true blocksize vs bluestore_min_alloc_size
Hi,
When an OSD starts it should log the min_alloc_size at level 1, see
https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L12260
grep "min_alloc_size 0x" ceph-osd.*.log
Cheers, Dan
On Thu, Feb 10, 2022 at 3:50 PM Scheurer François
<francois.scheurer@xxxxxxxxxxxx> wrote:
Hi everyone
How can we display the true OSD block size?
I get 64K for an HDD OSD:
ceph daemon osd.0 config show | egrep --color=always "alloc_size|bdev_block_size"
"bdev_block_size": "4096",
"bluefs_alloc_size": "1048576",
"bluefs_shared_alloc_size": "65536",
"bluestore_extent_map_inline_shard_prealloc_size": "256",
"bluestore_max_alloc_size": "0",
"bluestore_min_alloc_size": "0",
"bluestore_min_alloc_size_hdd": "65536",
"bluestore_min_alloc_size_ssd": "16384",
But it was explained that bluestore_min_alloc_size_hdd only affects newly created OSDs.
So to check the current block size I looked at the OSD metadata and found 4K:
ceph osd metadata osd.0 | jq '.bluestore_bdev_block_size'
"bluestore_bdev_block_size": "4096",
Checking an object's block size directly also shows 4K:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 6.5s4 "cb1594b3-a782-49d0-a19f-68cd48870a63.95398870.14_DriveE/Predator/Doc/2021/03/01101038/1523111.pdf.zip" dump | jq '.stat'
{
"size": 32768,
"blksize": 4096,
"blocks": 8,
"nlink": 1
}
So were these HDD OSDs created with a 4K block size without honoring bluestore_min_alloc_size_hdd?
The OSDs are running on Nautilus 14.2.5 and were created on Luminous.
Newer NVMe OSDs created on Nautilus were also created with 4K, without honoring bluestore_min_alloc_size_ssd (16K).
This is confusing... Actually I would be happy with 4K, as it is recommended to avoid over-allocation issues with EC pools.
But I would like to understand how to show the true block size of an existing OSD...
Many thanks for your help! ;-)
Cheers
Francois Scheurer
--
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich
tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer@xxxxxxxxxxxx
web: http://www.everyware.ch
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx