Re: Min alloc size according to media type

On 05/20/2016 05:08 AM, Sage Weil wrote:
On Fri, 20 May 2016, Ramesh Chander wrote:
Thanks Sage and Allen,

Unless we want to make bluestore smart enough to push object data on a
fast device (i.e., do ssd/hdd tiering internally), I'm not sure we
need per-device min_alloc_size.

If we make it per block device, it becomes even simpler: set it at device open time and read it whenever required.

In most places we already read the block size from bdev->get_block_size(), so this new value would go alongside it.

I think the g_conf->* parameters are read-only, so to work around that I need to store this value in the BlueStore structure or somewhere global.

Whatever you suggest is fine.

FWIW the github.com/liewegas/wip-bluestore-write branch already moves
block_size and min_alloc_size to BlueStore class members so that it's not
always pulling them out of g_conf and bdev.

Don't worry about legacy at all since bluestore has no users.  :)

That simplifies it, and I can simply remove it. Or do we still need to
keep the old parameter around and have it take precedence over the two new
ones? I mean, the old option applies to both media types, so we need to
break the tie between the new specific parameters and the old general one.

I think having bluestore_min_alloc_size, bluestore_min_alloc_size_hdd, and
bluestore_min_alloc_size_ssd still makes it easier to change for users.
It'll only go in one bit of code that updates the BlueStore min_alloc_size
member.
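
As a rough illustration of that "one bit of code", here is a minimal sketch of choosing the effective min_alloc_size from the three options. The struct, the function name, and the precedence rule (a nonzero general value overrides the media-specific ones) are assumptions for illustration, not the actual BlueStore code; the defaults are the ones proposed in this thread.

```cpp
#include <cstdint>

// Hypothetical stand-in for the three options discussed in this thread.
// Defaults follow the proposal: 0 (unset), 64k for HDD, 4k for SSD.
struct MinAllocConf {
  uint64_t min_alloc_size     = 0;          // general option; 0 means "not set"
  uint64_t min_alloc_size_hdd = 64 * 1024;  // rotational media
  uint64_t min_alloc_size_ssd = 4 * 1024;   // non-rotational media
};

// Pick the effective value once (e.g., at device open time) so it can be
// cached in a class member instead of being re-read from the config.
// Assumed precedence: an explicitly set general value wins.
inline uint64_t pick_min_alloc_size(const MinAllocConf& conf, bool rotational) {
  if (conf.min_alloc_size != 0) {
    return conf.min_alloc_size;
  }
  return rotational ? conf.min_alloc_size_hdd : conf.min_alloc_size_ssd;
}
```

With the defaults, pick_min_alloc_size(conf, true) yields 64k and pick_min_alloc_size(conf, false) yields 4k; setting the general option to any nonzero value makes both calls return that value.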

If we really want to go down this road, would it make sense to create storage class templates rather than global configuration parameters? Presumably you might want different compression, read ahead, or writeback caching depending on the device class as well.

Mark
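
To make the storage class template idea a bit more concrete, here is one possible shape for it. Everything in this sketch is hypothetical: the struct, its field names, and the non-min_alloc_size values are placeholders, and none of them are existing Ceph options. Only the 64k/4k sizes come from this thread.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-device-class profile that groups the settings mentioned
// above (min alloc size, compression, read ahead, writeback caching).
struct StorageClassProfile {
  uint64_t min_alloc_size;
  bool compression;
  uint64_t read_ahead_bytes;
  bool writeback_cache;
};

// Placeholder defaults; only the min_alloc_size values come from the thread,
// the other values are arbitrary and chosen just to fill in the struct.
inline const std::map<std::string, StorageClassProfile>& default_profiles() {
  static const std::map<std::string, StorageClassProfile> profiles = {
    {"hdd", {64 * 1024, true, 128 * 1024, true}},
    {"ssd", {4 * 1024, false, 0, false}},
  };
  return profiles;
}

// Look up the whole profile for a device instead of reading N global options.
inline const StorageClassProfile& profile_for(bool rotational) {
  return default_profiles().at(rotational ? "hdd" : "ssd");
}
```

The point of the sketch is only that a single lookup per device class would replace a growing set of *_hdd / *_ssd option pairs.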


Perhaps you can base this PR on the wip-bluestore-write branch.  It's
still getting rebased frequently but I think it's less than a
week away from being mergeable.

Currently it is transient everywhere, and so far I've been trying to
keep it that way.  However, we might want to change this: if we make
min_alloc_size fixed at mkfs time, we could possibly collapse down the
size of the allocation bitmap(s) by a factor of 16 on HDD (1 bit per
min_alloc_size instead of per block).  I'm not sure that it's worth it,
though... thoughts?

Collapsing the bitmap provides little DRAM savings and probably not much CPU time savings (though some additional (low risk) coding might be required to make this statement true), so I don't see much point in it.
Seems like extra complexity with little value.
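
For a sense of scale, the factor of 16 can be worked through as plain arithmetic. This is a sketch under stated assumptions: a 4k block size (implied by 64k / 16 but not stated in the thread), a hypothetical 4 TiB device, and a flat bitmap with one bit per unit; the real allocator's in-memory layout may differ.

```cpp
#include <cstdint>

// Bytes needed for a flat bitmap covering `capacity` bytes of storage with
// one bit per `unit` bytes. Assumes capacity is a multiple of unit.
constexpr uint64_t bitmap_bytes(uint64_t capacity, uint64_t unit) {
  return (capacity / unit + 7) / 8;
}

constexpr uint64_t TiB = 1ull << 40;
constexpr uint64_t MiB = 1ull << 20;

// Hypothetical 4 TiB HDD; the 4k block size is an assumption.
static_assert(bitmap_bytes(4 * TiB, 4 * 1024) == 128 * MiB,
              "one bit per 4k block");
static_assert(bitmap_bytes(4 * TiB, 64 * 1024) == 8 * MiB,
              "one bit per 64k min_alloc_size");
```

Under these assumptions the bitmap shrinks from 128 MiB to 8 MiB per 4 TiB device, which is the 16x reduction being weighed against the extra complexity.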

I think as long as our min alloc size does not drop below its previous
value, or our bitmap vector has one bit per minimum possible value of
min_alloc_size, we are good.  But as Allen said, we will not see
significant savings from this as far as cost (cpu and dram) is concerned.

Yeah, let's not worry about it then.

sage




-Ramesh Chander


-----Original Message-----
From: Allen Samuels
Sent: Friday, May 20, 2016 2:00 AM
To: Sage Weil; Ramesh Chander
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: Min alloc size according to media type

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
Sent: Thursday, May 19, 2016 12:20 PM
To: Ramesh Chander <Ramesh.Chander@xxxxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Min alloc size according to media type

On Thu, 19 May 2016, Ramesh Chander wrote:
Hi Sage,

I am doing changes in Bluestore related to minimum allocation size
according to ssd and hdd. This change involves:

1. There are three min alloc sizes now:
                a. min_alloc_size: old one, default changed to 0
                b. min_alloc_size_hdd: for rotational media, default 64k
                c. min_alloc_size_ssd: for ssd, default 4k.

2. Making changes in BlockDevice to maintain its own min_alloc_size.
This allows a different min_alloc_size for each device.

3. Making changes in allocator(stupid, bitmap) interfaces to take
min_alloc_size from the corresponding devices.
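
Items 2 and 3 above can be sketched as follows. SketchDevice and round_up_to_alloc_unit are hypothetical names for illustration; the real BlockDevice and allocator interfaces are different and richer. The sketch only shows an allocation length being rounded up to the unit reported by the device it lands on.

```cpp
#include <cstdint>

// Hypothetical minimal device that carries its own min_alloc_size next to
// its block size (the real BlockDevice class differs).
struct SketchDevice {
  uint64_t block_size;
  uint64_t min_alloc_size;
  uint64_t get_block_size() const { return block_size; }
  uint64_t get_min_alloc_size() const { return min_alloc_size; }
};

// Round a requested length up to the device's allocation unit.
// Assumes the device reports a nonzero min_alloc_size.
inline uint64_t round_up_to_alloc_unit(const SketchDevice& dev, uint64_t want) {
  const uint64_t unit = dev.get_min_alloc_size();
  return (want + unit - 1) / unit * unit;
}
```

For example, with the proposed defaults a 5000-byte request would occupy 8192 bytes on an SSD (4k unit) and 65536 bytes on an HDD (64k unit).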

This makes sense if some devices are hdd and some are ssd (e.g., main
vs db/wal), but in practice the only separation currently possible is
to have a separate device for WAL and for rocksdb, both of which are
managed by bluefs and not bluestore directly.  And bluefs currently
has a min_alloc_size of 1MB since all files are generally big (usually
4MB each), there are no random writes, etc.

Unless we want to make bluestore smart enough to push object data on a
fast device (i.e., do ssd/hdd tiering internally), I'm not sure we
need per-device min_alloc_size.

I think this is / will be valuable -- in the future. I don't see that this item significantly simplifies the future problem.



I have the following questions regarding this parameter and its use in
bluestore:

1. I assume this parameter is transient, so using different values
(say changing from 4k to 64k or vice versa) across reboots or different
ceph versions causes no problems?
Is it stored on disk anywhere, in metadata or in the freelist manager,
directly or indirectly? An on-disk presence could cause confusion with
the new options when existing users move to a build with this change.

Currently it is transient everywhere, and so far I've been trying to
keep it that way.  However, we might want to change this: if we make
min_alloc_size fixed at mkfs time, we could possibly collapse down the
size of the allocation bitmap(s) by a factor of 16 on HDD (1 bit per
min_alloc_size instead of per block).  I'm not sure that it's worth it,
though... thoughts?

Collapsing the bitmap provides little DRAM savings and probably not much CPU time savings (though some additional (low risk) coding might be required to make this statement true), so I don't see much point in it.
Seems like extra complexity with little value.


2. While figuring out the min_alloc_size for devices, I give
preference to the old config parameter so that existing configs
are not changed by this code change. Is this right, or is it
not required?

Don't worry about legacy at all since bluestore has no users.  :)

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
info at http://vger.kernel.org/majordomo-info.html

