Re: WAL/DB size

I saw above that the recommended size for the db partition is at least 4% of the data device, yet the recommendation here is 40GB partitions for 4TB drives. Isn't that closer to 1%?
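
To make the arithmetic concrete, here is a minimal sketch. It assumes decimal units (4TB = 4000GB) and uses the "at least 4%" figure quoted further down the thread; the function names are illustrative only and are not part of any Ceph tool.

```python
def db_ratio_percent(db_gb, data_gb):
    """Size of block.db as a percentage of the data device."""
    return 100.0 * db_gb / data_gb


def recommended_db_gb(data_gb, percent=4.0):
    """block.db size implied by a percentage-of-data guideline."""
    return data_gb * percent / 100.0


data_gb = 4000  # one 4TB drive, decimal units (assumption)
db_gb = 40      # one 40GB block.db partition, as suggested in the thread

print(db_ratio_percent(db_gb, data_gb))  # 1.0   (percent)
print(recommended_db_gb(data_gb))        # 160.0 (GB per OSD at 4%)
```

At 4%, five such OSDs would call for about 800GB of block.db space in total, which is more than the single 200GB SSD discussed in this thread can offer.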

On Fri, Sep 7, 2018 at 10:06 AM, Muhammad Junaid <junaid.fsd.pk@xxxxxxxxx> wrote:
Thanks very much. It is very clear now. Because we are only in the planning stage right now, could you tell me whether write speed will be fine in this new scenario if we use 7200rpm SAS 3-4TB drives for the OSDs? It will apparently be writing to the slower disks before the actual confirmation. (I understand there must be advantages to bluestore using direct partitions.) Regards.

Muhammad Junaid  

On Fri, Sep 7, 2018 at 6:39 PM Richard Hesketh <richard.hesketh@xxxxxxxxxxxx> wrote:
It can get confusing.

There will always be a WAL, and there will always be a metadata DB, for
a bluestore OSD. However, if a separate device is not specified for the
WAL, it is kept in the same device/partition as the DB; in the same way,
if a separate device is not specified for the DB, it is kept on the same
device as the actual data (an "all-in-one" OSD). Unless you have a
separate, even faster device for the WAL to go on, you shouldn't specify
it separately from the DB; just make one partition on your SSD per OSD,
and make them as large as will fit together on the SSD.
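
As a rough illustration of "one partition per OSD, as large as will fit", the sketch below splits an SSD evenly. The helper name is hypothetical, and real usable capacity will be somewhat lower than the nominal size once partition-table or LVM overhead is taken into account.

```python
def db_partition_size_gb(ssd_gb, num_osds):
    """Evenly split an SSD into one block.db partition per OSD (whole GB)."""
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    return ssd_gb // num_osds


# The setup from this thread: one 200GB SSD shared by 5 OSDs.
print(db_partition_size_gb(200, 5))  # 40
```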

Also, just to be clear, the WAL is not exactly a journal in the same way
that Filestore required a journal. Because Bluestore can provide write
atomicity without requiring a separate journal, data is *usually*
written directly to the long-term storage; writes are only journalled in
the WAL to be flushed/synced later if they're below a certain size (IIRC
32KB by default), to avoid the latency of excessive seeking on HDDs.
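
A toy model of that decision, purely for illustration: it is not BlueStore's actual code, the names are made up, and the 32KB threshold is only the from-memory default mentioned above.

```python
# Assumed threshold; the real value is configurable and may differ by release.
DEFERRED_THRESHOLD_BYTES = 32 * 1024


def write_path(size_bytes, threshold=DEFERRED_THRESHOLD_BYTES):
    """Return which path a write of the given size takes in this toy model."""
    if size_bytes < threshold:
        # Small write: journalled in the WAL first, flushed to the data device later.
        return "deferred-via-wal"
    # Large write: goes straight to the data device.
    return "direct-to-data"


print(write_path(4 * 1024))     # deferred-via-wal
print(write_path(1024 * 1024))  # direct-to-data
```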

Rich

On 07/09/18 14:23, Muhammad Junaid wrote:
> Thanks again, but sorry again too. I couldn't understand the following.
>
> 1. As per the docs, block.db is used only for bluestore (file system
> metadata info etc.). It has nothing to do with the actual data (for
> journaling), which will ultimately be written to the slower disks.
> 2. How will actual journaling work if there is no WAL (as you
> suggested)?
>
> Regards.
>
> On Fri, Sep 7, 2018 at 6:09 PM Alfredo Deza <adeza@xxxxxxxxxx
> <mailto:adeza@xxxxxxxxxx>> wrote:
>
>     On Fri, Sep 7, 2018 at 9:02 AM, Muhammad Junaid
>     <junaid.fsd.pk@xxxxxxxxx <mailto:junaid.fsd.pk@gmail.com>> wrote:
>     > Thanks Alfredo. Just to clarify: my configuration has 5 OSDs (7200 rpm
>     > SAS HDDs), which are slower than the 200G SSD. That's why I asked for a
>     > 10G WAL partition for each OSD on the SSD.
>     >
>     > Are you asking us to create 5 x 40GB partitions on the SSD just for block.db?
>
>     Yes.
>
>     You don't need a separate WAL defined. It only makes sense when you
>     have something *faster* than where block.db will live.
>
>     In your case 'data' will go in the slower spinning devices, 'block.db'
>     will go in the SSD, and there is no need for WAL. You would only benefit
>     from WAL if you had another device, like an NVMe, where 2GB partitions
>     (or LVs) could be created for block.wal.
>
>
>     >
>     > On Fri, Sep 7, 2018 at 5:36 PM Alfredo Deza <adeza@xxxxxxxxxx
>     <mailto:adeza@xxxxxxxxxx>> wrote:
>     >>
>     >> On Fri, Sep 7, 2018 at 8:27 AM, Muhammad Junaid
>     <junaid.fsd.pk@xxxxxxxxx <mailto:junaid.fsd.pk@gmail.com>>
>     >> wrote:
>     >> > Hi there
>     >> >
>     >> > Asking these questions as a newbie. They may have been asked many
>     >> > times before, but sorry, it is not yet clear to me.
>     >> >
>     >> > 1. Is the WAL device just like the journaling device used before
>     >> > bluestore? And does Ceph confirm the write to the client after
>     >> > writing to it (before the actual write to the primary device)?
>     >> >
>     >> > 2. Say we have 5 OSDs (4 TB SAS) and one 200GB SSD. Should we
>     >> > partition the SSD into 10 partitions? Should/can we set the WAL
>     >> > partition size for each OSD to 10GB? Or what min/max should we set
>     >> > for the WAL partition? And can we use the remaining 150GB as
>     >> > (30GB * 5) for 5 db partitions for all the OSDs?
>     >>
>     >> A WAL partition would only help if you have a device faster than the
>     >> SSD where the block.db would go.
>     >>
>     >> We recently updated our sizing recommendations for block.db to at least
>     >> 4% of the size of block (also referenced as the data device):
>     >>
>     >> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
>     >>
>     >> In your case, what you want is to create 5 logical volumes from your
>     >> 200GB at 40GB each, without a need for a WAL device.
>     >>
>     >>
>     >> >
>     >> > Thanks in advance. Regards.
>     >> >
>     >> > Muhammad Junaid
>     >> >
>     >> > _______________________________________________
>     >> > ceph-users mailing list
>     >> > ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxx.com>
>     >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>     >> >
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

