RE: Min alloc size according to media type

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> Sent: Thursday, May 19, 2016 12:20 PM
> To: Ramesh Chander <Ramesh.Chander@xxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: Min alloc size according to media type
> 
> On Thu, 19 May 2016, Ramesh Chander wrote:
> > Hi Sage,
> >
> > I am making changes in Bluestore so that the minimum allocation size
> > depends on the media type (ssd or hdd). This change involves:
> >
> > 1. There are three min alloc sizes now:
> >    a. min_alloc_size: old one, default changed to 0
> >    b. min_alloc_size_hdd: for rotational media, default 64k
> >    c. min_alloc_size_ssd: for ssd, default 4k
> >
> > 2. Making changes in BlockDevice so that it maintains its own
> > min_alloc_size. This allows a different min_alloc_size for each device.
> >
> > 3. Making changes in the allocator (stupid, bitmap) interfaces to take
> > min_alloc_size from the corresponding devices.
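To make the three options above concrete, here is a minimal sketch of how an effective value could be picked per device. It is not the actual Bluestore code: the class and function names are hypothetical, and it assumes that the new default of 0 for the old option means "not set", so that a non-zero old value takes precedence over the media-specific ones.

```python
from dataclasses import dataclass


@dataclass
class AllocConfig:
    """Hypothetical container for the three options listed above."""
    min_alloc_size: int = 0              # old option; 0 is assumed to mean "not set"
    min_alloc_size_hdd: int = 64 * 1024  # default for rotational media
    min_alloc_size_ssd: int = 4 * 1024   # default for ssd


def effective_min_alloc_size(cfg: AllocConfig, rotational: bool) -> int:
    """Pick the allocation unit for one device.

    Assumption: a non-zero old option overrides the media-specific values.
    """
    if cfg.min_alloc_size:
        return cfg.min_alloc_size
    return cfg.min_alloc_size_hdd if rotational else cfg.min_alloc_size_ssd
```

With the defaults, a rotational device would get 64k and an ssd 4k; setting the old option to a non-zero value would apply that value to both.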
> 
> This makes sense if some devices are hdd and some are ssd (e.g., main vs
> db/wal), but in practice the only separation currently possible is to have a
> separate device for WAL and for rocksdb, both of which are managed by
> bluefs and not bluestore directly.  And bluefs currently has a min_alloc_size
> of 1MB since all files are generally big (usually 4MB each), there are no
> random writes, etc.
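As a rough illustration of why a large allocation unit suits big files and hurts small ones, the sketch below rounds a length up to a whole number of allocation units. The function name is hypothetical and this is plain arithmetic, not how bluefs or bluestore actually account for space.

```python
KiB = 1024
MiB = 1024 * KiB


def allocated_size(length: int, min_alloc_size: int) -> int:
    """Space consumed when `length` bytes are stored in whole allocation units."""
    if length <= 0:
        return 0
    units = -(-length // min_alloc_size)  # ceiling division
    return units * min_alloc_size


# A 4MB file with a 1MB allocation unit wastes nothing.
waste_big_file = allocated_size(4 * MiB, 1 * MiB) - 4 * MiB

# A hypothetical 5000-byte object wastes far more at 64k than at 4k.
waste_small_64k = allocated_size(5000, 64 * KiB) - 5000
waste_small_4k = allocated_size(5000, 4 * KiB) - 5000
```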
> 
> Unless we want to make bluestore smart enough to push object data on a
> fast device (i.e., do ssd/hdd tiering internally), I'm not sure we need per-
> device min_alloc_size.

I think this is / will be valuable -- in the future. I don't see that this item significantly simplifies the future problem.

> 
> > I have the following questions about this parameter and its use in
> > bluestore:
> >
> > 1. I assume this parameter is transient, so changing its value (say
> > from 4k to 64k or vice versa) across reboots or different ceph
> > versions has no ill effect?
> > Is it stored on disk anywhere, in metadata or in the freelist manager,
> > directly or indirectly? An on-disk presence could cause confusion
> > when existing users move to a build with this change and its new
> > options.
> 
> Currently it is transient everywhere, and so far I've been trying to keep it that
> way.  However, we might want to change this: if we make min_alloc_size
> fixed at mkfs time, we could possibly collapse down the size of the allocation
> bitmap(s) by a factor of 16 on HDD (1 bit per min_alloc_size instead of per
> block).  I'm not sure that it's worth it, though... thoughts?

Collapsing the bitmap provides little DRAM savings and probably not much CPU time savings (though some additional, low-risk coding might be required to make that true), so I don't see much point in it. It seems like extra complexity with little value.
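For reference, the factor of 16 quoted above can be reproduced with simple arithmetic. The sketch below assumes a 4k block size (which is what makes 64k / 4k = 16) and a hypothetical 4 TiB device; it only counts raw bits and says nothing about how the allocator or freelist actually store them.

```python
def bitmap_bytes(device_bytes: int, bytes_per_bit: int) -> int:
    """Raw size of a bitmap that tracks one bit per `bytes_per_bit` of device."""
    bits = -(-device_bytes // bytes_per_bit)  # ceiling division
    return -(-bits // 8)


BLOCK_SIZE = 4 * 1024          # assumed block size
MIN_ALLOC_HDD = 64 * 1024      # proposed hdd default
DEVICE_BYTES = 4 * 1024 ** 4   # hypothetical 4 TiB device

per_block = bitmap_bytes(DEVICE_BYTES, BLOCK_SIZE)
per_alloc_unit = bitmap_bytes(DEVICE_BYTES, MIN_ALLOC_HDD)
ratio = per_block // per_alloc_unit
```

Under these assumptions the raw bitmap shrinks from 128 MiB to 8 MiB for that device.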

> 
> > 2. While figuring out the min_alloc_size for each device, I give
> > preference to the old config parameter so that existing configs
> > are not changed by this code change. Is this right, or is it
> > not required?
> 
> Don't worry about legacy at all since bluestore has no users.  :)
> 
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
> http://vger.kernel.org/majordomo-info.html