Hi,

On 16.08.2017 at 19:31, Henrik Korkuc wrote:
> On 17-08-16 19:40, John Spray wrote:
>> On Wed, Aug 16, 2017 at 3:27 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
> Maybe you can suggest some recommendations on how to scale Ceph for
> billions of objects? More PGs per OSD, more OSDs, more pools? Somewhere
> on the list it was mentioned that OSDs need to keep the object list in
> memory - is that still valid for BlueStore?

We started using CephFS in 2014 and scaled to 4 billion small files in a
separate pool plus 500 million in a second pool - "only" 225 TB of data.
Unfortunately every object creates another object in the data pool, so
(together with a replication factor of 2, which is a real pain in the a*)
we're now at about 16 billion inodes distributed over 136 spinning disks.

XFS performed very badly with such a huge number of files, so we switched
all OSDs to ext4 one by one, which helped a lot (but keep an eye on your
total number of inodes; see the sketch at the end of this mail).

I'm quite sure we made many configuration mistakes (replication of 2; too
few PGs in the beginning) and had to learn a lot the hard way while
keeping the site up and running.

As our disks are filling up and expanding our storage would require a
rebalance that takes several months(!), we decided to leave the Ceph
train and migrate to a more filesystem-like setup. We don't really need
object stores, and it seems CephFS can't manage such a huge number of
files (or we're unable to optimize it for that use case). We will give
GlusterFS with RAID6 underneath and NFS a try - more "basic" and
hopefully more robust.

--
Kind regards
Michael Metz-Martini
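
PS: Since "keep an eye on your total number of inodes" is easy to forget,
here is a minimal sketch (Python, not something we actually run in this
form) of how one could watch inode usage on ext4-backed OSDs. The
/var/lib/ceph/osd/ceph-* mount-point pattern is an assumption - adjust it
to wherever your OSD data directories live.

    import glob
    import os

    # Walk the OSD data directories and report inode usage per filesystem,
    # so an ext4 OSD running out of inodes gets noticed before it happens.
    # The glob pattern below is an assumed default layout, not a given.
    for mount in sorted(glob.glob("/var/lib/ceph/osd/ceph-*")):
        st = os.statvfs(mount)
        total = st.f_files            # total inodes on this filesystem
        free = st.f_ffree             # inodes still free
        used = total - free
        pct = 100.0 * used / total if total else 0.0
        print("%s: %d of %d inodes used (%.1f%%)" % (mount, used, total, pct))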