Re: bluefs _allocate unable to allocate on bdev 2

Hi Stefan and Igor,

We are testing a release upgrade from Ubuntu 20.04 to 22.04 together with the Ceph upgrade, so once the cluster is on Quincy, is it safe to set this value to 4k in the config db without rebuilding hundreds of OSDs?

As I understand Igor here (https://www.spinics.net/lists/ceph-users/msg81389.html), in his case all OSDs should be rebuilt.
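
For clarity, the change I have in mind is roughly the following (assuming the option being discussed is bluefs_shared_alloc_size; please correct me if it is a different one):

ceph config set osd bluefs_shared_alloc_size 4096

i.e. just a config-db change plus OSD restarts, rather than redeploying every OSD.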

Istvan

________________________________
From: Stefan Kooman <stefan@xxxxxx>
Sent: Thursday, September 12, 2024 7:30:15 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; igor.fedotov@xxxxxxxx <igor.fedotov@xxxxxxxx>
Cc: Ceph Users <ceph-users@xxxxxxx>
Subject: Re:  Re: bluefs _allocate unable to allocate on bdev 2


On 12-09-2024 11:40, Szabo, Istvan (Agoda) wrote:
> Thank you, so Quincy should be OK, right?

Yes.

> The problem was a spillover, which is why we went from a separate RocksDB
> and WAL back to a non-separated setup with 4 OSDs per SSD.
> My OSDs are only 53% full; would it be possible to somehow increase the
> default 4% to 8% on an existing OSD?

If you still have space left on the device, this should be possible
[https://docs.ceph.com/en/latest/man/8/ceph-bluestore-tool/]:

ceph-bluestore-tool bluefs-bdev-expand --path <osd path>
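
Roughly, as a sketch only (assuming a non-containerized deployment, an OSD backed by an LVM logical volume that still has free space in its volume group, and OSD id 0 purely as an example; adjust names and sizes to your setup):

systemctl stop ceph-osd@0
lvextend -L +50G <vg>/<osd-0-lv>        # grow the LV backing the OSD (size is illustrative)
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
systemctl start ceph-osd@0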

Gr. Stefan
>
> ------------------------------------------------------------------------
> *From:* Stefan Kooman <stefan@xxxxxx>
> *Sent:* Thursday, September 12, 2024 3:54 PM
> *To:* Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>;
> igor.fedotov@xxxxxxxx <igor.fedotov@xxxxxxxx>
> *Cc:* Ceph Users <ceph-users@xxxxxxx>
> *Subject:* Re:  Re: bluefs _allocate unable to allocate on
> bdev 2
>
> On 12-09-2024 06:43, Szabo, Istvan (Agoda) wrote:
>  > Maybe we are running into this bug Igor?
>  > https://github.com/ceph/ceph/pull/48854
>
> That would be a solution for the bug you might be hitting (unable to
> allocate 64K aligned blocks for RocksDB).
>
> I would not be surprised if you are hitting this issue: a workload that
> stores a lot of small IO (4K allocations) fragments the OSD, and based on
> your earlier posts to this list I believe that is what you have. We found
> out (with help from Igor) that there is also a huge performance penalty
> when OSDs are heavily fragmented, as the RocksDB allocator will take a
> long time to find free blocks to use. Note that a disk can get (heavily)
> fragmented within days. Like you, we had single-disk OSDs with no separate
> WAL/DB. We re-provisioned all OSDs to create a separate DB on the same
> disk (a separate LVM volume). Make sure you make this volume big enough
> (check the current size of the RocksDB). You could create the OSD in the
> following way (omit --dmcrypt if you do not want encryption):
>
> ceph-volume lvm create --bluestore --dmcrypt --data osd.$id/osd.$id --block.db /dev/osd.$id/osd.$id_db
>
> The documentation states between 1-4% of disk size (when RocksDB is not
> using compression) [1].
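>
> As a rough illustration only (the volume group, LV names and sizes below
> are made up; adjust to your own layout), 4% of a 1 TB data volume would be
> about 40 GB for the DB, so the LVM side could look like:
>
> lvcreate -L 960G -n osd.0 vg-ssd
> lvcreate -L 40G -n osd.0_db vg-ssd
> ceph-volume lvm create --bluestore --data vg-ssd/osd.0 --block.db vg-ssd/osd.0_db
>
> Fragmentation of the existing OSDs can be checked via the admin socket,
> e.g. "ceph daemon osd.0 bluestore allocator score block", which reports a
> fragmentation rating for the main device.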
>
> Your other option is upgrading to a release that has 4K support for
> BlueFS. But note that it is not a default setting in Ceph releases as of
> yet (AFAIK), and might be a bit more risky (the side effects / drawbacks
> are as yet unknown).
>
> Gr. Stefan
>
> [1]:
> https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#sizing
>  > ________________________________
>  > From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
>  > Sent: Thursday, September 12, 2024 6:50 AM
>  > To: Ceph Users <ceph-users@xxxxxxx>
>  > Subject:  Re: bluefs _allocate unable to allocate on bdev 2
>  >
>  > This is the end of a manual compaction, and the OSD actually can't start even after compaction:
>  > Meta:
> https://gist.github.com/Badb0yBadb0y/f918b1e4f2d5966cefaf96d879c52a6e
>  > Log:
> https://gist.github.com/Badb0yBadb0y/054a0cefd4a56f0236b26479cc1a5290
>  > ________________________________
>  > From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
>  > Sent: Thursday, September 12, 2024 6:34 AM
>  > To: Ceph Users <ceph-users@xxxxxxx>
>  > Subject:  bluefs _allocate unable to allocate on bdev 2
>  >
>  > Hi,
>  >
>  > Since yesterday, on Ceph Octopus, multiple OSDs in the cluster have started to crash, and I can see this error in most of the logs:
>  >
>  > 2024-09-12T06:13:35.805+0700 7f98b8b27700  1 bluefs _allocate failed to allocate 0xf0732 on bdev 1, free 0x40000; fallback to bdev 2
>  > 2024-09-12T06:13:35.805+0700 7f98b8b27700  1 bluefs _allocate unable to allocate 0xf0732 on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
>  >
>  > The OSDs are only 57% full, so I don't think this should be a space issue.
>  >
>  > I'm using SSDs, so I don't have a separate WAL/RocksDB device.
>  >
>  > I'm running some compaction now, but I don't think that will help in the long run.
>  > What could be causing this issue, and how can I fix it?
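>  >
>  > For reference, the compaction I'm running is along these lines (OSD id 0 is just an example):
>  >
>  > ceph tell osd.0 compact
>  > ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact   # offline variant, with the OSD stopped
>  >
>  > and I'm also looking at "ceph daemon osd.0 bluefs stats" to see the BlueFS usage.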
>  >
>  > Ty
>  >



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


