Re: osd true blocksize vs bluestore_min_alloc_size

Hi François,

you should set debug_bluestore = 10 instead.

Then grep for "bluestore" or "min_alloc_size", not "bluefs". Here is how it is printed:

 dout(10) << __func__ << " min_alloc_size 0x" << std::hex << min_alloc_size
           << std::dec << " order " << (int)min_alloc_size_order
           << " max_alloc_size 0x" << std::hex << max_alloc_size
           << " prefer_deferred_size 0x" << prefer_deferred_size
           << std::dec
           << " deferred_batch_ops " << deferred_batch_ops
           << dendl;
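
For example (a minimal sketch, assuming osd.0, the default log path, and a systemd deployment; the line is only printed when the OSD mounts the store, so a restart is needed after setting it in ceph.conf):

         # in ceph.conf, under [osd]:
         #   debug_bluestore = 10
         systemctl restart ceph-osd@0
         grep -a "min_alloc_size 0x" /var/log/ceph/ceph-osd.0.log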

On 2/10/2022 7:39 PM, Scheurer François wrote:
Dear Dan


Thank you for your help.

After putting debug_osd = 10/5 in ceph.conf under [osd], I still do not get min_alloc_size logged.

Probably it is not logged on 14.2.5.

But this comes up:

ewceph1-osd001-prod:~ # egrep -a --color=always bluefs /var/log/ceph/ceph-osd.0.log | tail -111
2022-02-10 17:26:59.512 7f6026737d00  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
2022-02-10 17:26:59.512 7f6026737d00  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
2022-02-10 17:26:59.512 7f6026737d00  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
2022-02-10 17:27:00.896 7f6026737d00  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
2022-02-10 17:27:00.900 7f6026737d00  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
2022-02-10 17:27:00.900 7f6026737d00  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
2022-02-10 17:27:00.900 7f6026737d00  1 bluefs mount
2022-02-10 17:27:00.900 7f6026737d00  1 bluefs _init_alloc id 0 alloc_size 0x100000 size 0x80000000
2022-02-10 17:27:00.900 7f6026737d00  1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x1400000000
2022-02-10 17:27:00.900 7f6026737d00  1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0x746fc051000
2022-02-10 17:27:04.516 7f6026737d00  1 bluefs umount
2022-02-10 17:27:05.200 7f6026737d00  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 80 GiB
2022-02-10 17:27:05.200 7f6026737d00  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 7.3 TiB
2022-02-10 17:27:05.200 7f6026737d00  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2 GiB
2022-02-10 17:27:05.200 7f6026737d00  1 bluefs mount
2022-02-10 17:27:05.200 7f6026737d00  1 bluefs _init_alloc id 0 alloc_size 0x100000 size 0x80000000
2022-02-10 17:27:05.200 7f6026737d00  1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x1400000000
2022-02-10 17:27:05.200 7f6026737d00  1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0x746fc051000

So alloc_size is 1 MiB for block.db and block.wal, and 64 KiB (0x10000) for the main block device.

These are alloc sizes for bluefs, not for user data. BlueFS data on the main device (id=2) uses a 64K allocation unit (0x10000). But it is the user-data allocation size (= min_alloc_size) that mostly matters for the main device, since bluefs only uses it in case of data spillover (i.e. lack of free space on the DB volume).


And please do not confuse the allocation unit with the device block size. The latter is almost always 4K and determines the minimal block size read from or written to the disk, while the allocation unit (= min_alloc_size) determines the allocated/tracked block size, i.e. the minimal addressable block that BlueStore uses.
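
To make that concrete, a hedged illustration with made-up numbers: BlueStore rounds allocated space up to a multiple of min_alloc_size, so a small object can consume much more raw space than its logical size.

         # alloc = ceil(size / min_alloc_size) * min_alloc_size
         size=5120; min_alloc=65536     # 5 KiB object, 64 KiB allocation unit
         echo $(( (size + min_alloc - 1) / min_alloc * min_alloc ))   # -> 65536
         # with a 4 KiB allocation unit the same object would consume only 8192 bytes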


Is there any other way to get min_alloc_size?
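
One more option, assuming the OSD can be stopped briefly: BlueStore keeps min_alloc_size in its superblock (RocksDB prefix "S"), so ceph-kvstore-tool can read it offline. A sketch for osd.0 with the default data path; the value comes back as a raw little-endian hex dump:

         systemctl stop ceph-osd@0
         ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 get S min_alloc_size
         systemctl start ceph-osd@0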



Cheers

Francois

--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer@xxxxxxxxxxxx
web: http://www.everyware.ch


________________________________
From: Dan van der Ster <dvanders@xxxxxxxxx>
Sent: Thursday, February 10, 2022 4:33 PM
To: Scheurer François
Cc: Ceph Users
Subject: Re:  osd true blocksize vs bluestore_min_alloc_size

Hi,

When an osd starts, it should log the min_alloc_size at level 1; see
https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L12260

grep "min_alloc_size 0x" ceph-osd.*.log

Cheers, Dan


On Thu, Feb 10, 2022 at 3:50 PM Scheurer François
<francois.scheurer@xxxxxxxxxxxx> wrote:
Hi everyone


How can we display the true osd block size?


I get 64K for a hdd osd:

         ceph daemon osd.0 config show | egrep --color=always "alloc_size|bdev_block_size"
             "bdev_block_size": "4096",
             "bluefs_alloc_size": "1048576",
             "bluefs_shared_alloc_size": "65536",
             "bluestore_extent_map_inline_shard_prealloc_size": "256",
             "bluestore_max_alloc_size": "0",
             "bluestore_min_alloc_size": "0",
             "bluestore_min_alloc_size_hdd": "65536",
             "bluestore_min_alloc_size_ssd": "16384",

But it was explained that bluestore_min_alloc_size_hdd only affects newly created osd's (bluestore_min_alloc_size = 0 just means "pick the hdd or ssd default at mkfs time"; the value is frozen into the OSD when it is created).
So to check the current block size I can check the osd metadata, and I find 4K:
         ceph osd metadata osd.0 | jq '.bluestore_bdev_block_size'

             "bluestore_bdev_block_size": "4096",

Checking an object block size directly also shows 4K:
         ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 6.5s4 "cb1594b3-a782-49d0-a19f-68cd48870a63.95398870.14_DriveE/Predator/Doc/2021/03/01101038/1523111.pdf.zip" dump | jq '.stat'
             {
               "size": 32768,
               "blksize": 4096,
               "blocks": 8,
               "nlink": 1
             }

So these hdd osd's were created with 4K block size without honoring bluestore_min_alloc_size_hdd?
The osd's are running on nautilus 14.2.5 and were created on luminous.

Newer nvme osd's created on nautilus were also created with 4K without honoring bluestore_min_alloc_size_ssd (16K).

This is confusing... Actually I would be happy with 4K, as it is recommended to avoid the over-allocation issue with EC pools.
But I would like to understand how to show the true block size of an existing osd...
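
For what it's worth, newer releases (Pacific and later, as far as I know; the key is probably absent on 14.2.5) also expose the effective value directly in the OSD metadata, e.g.:

         ceph osd metadata osd.0 | jq '.bluestore_min_alloc_size'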

Many thanks for your help! ;-)


Cheers
Francois Scheurer






--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer@xxxxxxxxxxxx
web: http://www.everyware.ch

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



