RE: Min alloc size according to media type

> On 05/20/2016 05:08 AM, Sage Weil wrote:
> > On Fri, 20 May 2016, Ramesh Chander wrote:
> >> Thanks Sage and Allen,
> >>
> >>>> Unless we want to make bluestore smart enough to push object data
> >>>> on a fast device (i.e., do ssd/hdd tiering internally), I'm not
> >>>> sure we need per-device min_alloc_size.
> >>
> >> If we make it per block device, it becomes even simpler to set it at
> >> device open time and read it whenever required.
> >>
> >> In most of the places we are already reading the block size from
> >> bdev->get_block_size(); this new one goes along with it.
> >>
> >> I think the g_conf->* parameters are read-only; to work around that I need
> >> to set this info in the BlueStore structure or somewhere global.
> >>
> >> Whatever you suggest is fine.
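For illustration only, here is a minimal sketch of the per-device idea described above: min_alloc_size chosen once when the device is opened, based on whether the media is rotational. Every name below (the Conf stand-in, is_rotational(), the _hdd/_ssd option names) is an assumption for the sketch, not the actual BlueStore/BlockDevice code.

    #include <cstdint>
    #include <string>

    // Stand-in for the real config; names follow the options discussed here.
    struct Conf {
      uint64_t bluestore_min_alloc_size     = 0;      // 0 = "pick by media type"
      uint64_t bluestore_min_alloc_size_hdd = 65536;  // 64k
      uint64_t bluestore_min_alloc_size_ssd = 4096;   // 4k
    };
    static Conf g_conf;

    class BlockDevice {
      uint64_t block_size = 4096;
      uint64_t min_alloc_size = 0;   // decided once, at open time
    public:
      // stub; real code would probe something like the kernel's "rotational" flag
      bool is_rotational() const { return true; }
      uint64_t get_block_size() const { return block_size; }
      uint64_t get_min_alloc_size() const { return min_alloc_size; }

      int open(const std::string& /*path*/) {
        // choose the allocation granule per device, once, when it is opened
        min_alloc_size = is_rotational()
            ? g_conf.bluestore_min_alloc_size_hdd
            : g_conf.bluestore_min_alloc_size_ssd;
        return 0;
      }
    };
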
> >
> > FWIW the github.com/liewegas/wip-bluestore-write branch already moves
> > block_size and min_alloc_size to BlueStore class members so that it's
> > not always pulling them out of g_conf and bdev.
> >
> >>>> Don't worry about legacy at all since bluestore has no users.  :)
> >>
> >> That simplifies it, and I can simply remove it. Or do we still need to
> >> keep the old parameter around and make it take precedence over the two
> >> new ones? The old option applies to both media types, so we need to break
> >> the tie between the new specific parameters and the old general one.
> >
> > I think having bluestore_min_alloc_size, bluestore_min_alloc_size_hdd,
> > and bluestore_min_alloc_size_ssd still makes it easier for users to change.
> > The logic only goes in the one bit of code that updates the BlueStore
> > min_alloc_size member.
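Sketching that tie-breaking (reusing the hypothetical Conf and BlockDevice stand-ins from the sketch further up; choose_min_alloc_size() is a made-up helper name), the whole decision could live in one place:

    // Hypothetical helper: the old general option, when set, wins over the
    // per-media defaults; otherwise pick by the device's media type.
    uint64_t choose_min_alloc_size(const Conf& conf, const BlockDevice& bdev) {
      if (conf.bluestore_min_alloc_size)        // explicit override
        return conf.bluestore_min_alloc_size;
      return bdev.is_rotational()
          ? conf.bluestore_min_alloc_size_hdd   // default 64k
          : conf.bluestore_min_alloc_size_ssd;  // default 4k
    }

    // e.g. at mount/mkfs: min_alloc_size = choose_min_alloc_size(g_conf, *bdev);

With this shape, dropping or keeping the old general option later only touches this one helper.
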
> 
> If we really want to go down this road, would it make sense to create storage
> class templates rather than global configuration parameters?
> Presumably you might want different compression, read ahead, or writeback
> caching depending on the device class as well.
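Purely to illustrate the direction being suggested (none of these names exist in Ceph; this is a hypothetical sketch, not a description of any actual or planned configuration), a per-device-class template could bundle such settings in one place:

    #include <cstdint>
    #include <map>
    #include <string>

    // Hypothetical bundle of per-class policy; all fields are illustrative.
    struct DeviceClassProfile {
      uint64_t    min_alloc_size;
      uint64_t    readahead_bytes;
      bool        writeback_cache;
      std::string compression;       // e.g. "none", "snappy"
    };

    // One profile per storage class instead of one global option per knob.
    static const std::map<std::string, DeviceClassProfile> profiles = {
      { "hdd", { 64 * 1024, 128 * 1024, true,  "snappy" } },
      { "ssd", {  4 * 1024, 0,          false, "none"   } },
    };
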
> 

I believe that administratively, you want to do this on a per-pool basis rather than on a device class basis.

> Mark
> 
> >
> > Perhaps you can base this PR on the wip-bluestore-write branch.  It's
> > still getting rebased frequently, but I think it's less than a week
> > away from being mergeable.
> >
> >>>> Currently it is transient everywhere, and so far I've been trying
> >>>> to keep it that way.  However, we might want to change this: if we
> >>>> make min_alloc_size fixed at mkfs time, we could possibly collapse
> >>>> down the size of the allocation
> >>>> bitmap(s) by a factor of 16 on HDD (1 bit per min_alloc_size
> >>>> instead of per block).  I'm not sure that it's worth it, though... thoughts?
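To put rough numbers on that factor of 16 (back-of-the-envelope only, assuming the 4k block size and 64k HDD min_alloc_size discussed in this thread): a 4TB HDD has about a billion 4k blocks, so a one-bit-per-block bitmap is roughly 120MB, while one bit per 64k min_alloc_size unit is roughly 7.5MB.
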
> >>> Collapsing the bitmap provides little DRAM savings and probably not
> much CPU time savings (though some additional (low risk) coding might be
> required to make this statement true), so I don't see much point in it.
> >> Seems like extra complexity with little value.
> >>
> >> I think as long as our min_alloc_size does not shrink from its previous
> >> value, or our bitmap vector has a bit per minimum possible value of
> >> min_alloc_size, we are good.  But as Allen said, we will not have
> >> significant savings from this as far as cost (cpu and dram) is concerned.
> >
> > Yeah, let's not worry about it then.
> >
> > sage
> >
> >
> >
> >>
> >> -Ramesh Chander
> >>
> >>
> >> -----Original Message-----
> >> From: Allen Samuels
> >> Sent: Friday, May 20, 2016 2:00 AM
> >> To: Sage Weil; Ramesh Chander
> >> Cc: ceph-devel@xxxxxxxxxxxxxxx
> >> Subject: RE: Min alloc size according to media type
> >>
> >>> -----Original Message-----
> >>> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> >>> owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> >>> Sent: Thursday, May 19, 2016 12:20 PM
> >>> To: Ramesh Chander <Ramesh.Chander@xxxxxxxxxxx>
> >>> Cc: ceph-devel@xxxxxxxxxxxxxxx
> >>> Subject: Re: Min alloc size according to media type
> >>>
> >>> On Thu, 19 May 2016, Ramesh Chander wrote:
> >>>> Hi Sage,
> >>>>
> >>>> I am making changes in Bluestore related to the minimum allocation
> >>>> size according to ssd or hdd media. The change involves:
> >>>>
> >>>> 1. There are three min alloc sizes now:
> >>>>                 a. min_alloc_size: old one, default changed to 0
> >>>>                 b. min_alloc_size_hdd: for rotational media, default 64k
> >>>>                 c. min_alloc_size_ssd: for ssd, default 4k.
> >>>>
> >>>> 2. Making changes in BlockDevice to maintain its own min_alloc_size.
> >>>> This allows a different min_alloc_size for each device.
> >>>>
> >>>> 3. Making changes in the allocator (stupid, bitmap) interfaces to take
> >>>> min_alloc_size from the corresponding devices.
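As a rough sketch of item 3 above (hypothetical: the real stupid and bitmap allocator interfaces differ, and this reuses the BlockDevice stand-in sketched earlier), the allocator would ask its device for the granule instead of reading a single global value:

    #include <cstdint>
    #include <vector>

    struct Extent { uint64_t offset, length; };

    // Illustrative allocator that takes the allocation granule from the
    // device it manages rather than from a global min_alloc_size option.
    class DeviceAllocator {
      BlockDevice* bdev;   // device this allocator manages
    public:
      explicit DeviceAllocator(BlockDevice* b) : bdev(b) {}

      int64_t allocate(uint64_t want_len, std::vector<Extent>* out) {
        // assumes the device has been opened, so its granule is set (non-zero)
        uint64_t granule = bdev->get_min_alloc_size();
        // round the request up to this device's allocation granule
        uint64_t len = (want_len + granule - 1) / granule * granule;
        // ... search free space in multiples of 'granule', append extents to *out ...
        (void)out;
        return static_cast<int64_t>(len);
      }
    };
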
> >>>
> >>> This makes sense if some devices are hdd and some are ssd (e.g.,
> >>> main vs db/wal), but in practice the only separation currently
> >>> possible is to have a separate device for WAL and for rocksdb, both
> >>> of which are managed by bluefs and not bluestore directly.  And
> >>> bluefs currently has a min_alloc_size of 1MB since all files are
> >>> generally big (usually 4MB each), there are no random writes, etc.
> >>>
> >>> Unless we want to make bluestore smart enough to push object data on
> >>> a fast device (i.e., do ssd/hdd tiering internally), I'm not sure we
> >>> need per-device min_alloc_size.
> >>
> >> I think this is / will be valuable -- in the future. I don't see that this item
> significantly simplifies the future problem.
> >>
> >>
> >>>
> >>>> I have the following questions regarding this parameter and its use
> >>>> in bluestore:
> >>>>
> >>>> 1. I assume this parameter is transient, so changing its value (say
> >>>> from 4k to 64k or vice versa) across reboots or different ceph
> >>>> versions has no ill effect?
> >>>>                 Is it on disk anywhere, in the metadata or in the freelist
> >>>>                 manager, directly or indirectly? An on-disk presence could
> >>>>                 cause confusion with the new options when existing users
> >>>>                 move to a build with this change.
> >>>
> >>> Currently it is transient everywhere, and so far I've been trying to
> >>> keep it that way.  However, we might want to change this: if we make
> >>> min_alloc_size fixed at mkfs time, we could possibly collapse down
> >>> the size of the allocation
> >>> bitmap(s) by a factor of 16 on HDD (1 bit per min_alloc_size instead
> >>> of per block).  I'm not sure that it's worth it, though... thoughts?
> >>
> >> Collapsing the bitmap provides little DRAM savings and probably not much
> CPU time savings (though some additional (low risk) coding might be
> required to make this statement true), so I don't see much point in it.
> >> Seems like extra complexity with little value.
> >>
> >>>
> >>>> 2. While figuring out the min_alloc_size for devices, I give
> >>>> preference to the old config parameter so that existing configs
> >>>>                 are not changed by this code change. Is this right,
> >>>> or is it not required?
> >>>
> >>> Don't worry about legacy at all since bluestore has no users.  :)
> >>>
> >>> sage
> >>
> >>


