Re: CephFS configuration for millions of small files


 



Please keep the discussion on the mailing list.

With 11 nodes I'd probably go for 8+2 or 7+3, depending on the exact requirements.
The problem with +1 is that you either accept writes when you cannot guarantee redundancy, or you have downtime whenever a single OSD is down.
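To make the tradeoff concrete, here is a minimal sketch. It assumes the usual Ceph default for erasure-coded pools of min_size = k + 1 (I/O stops once fewer than k+1 shards are available); if you have tuned min_size, the numbers differ:

```python
# Sketch: why m=1 forces a choice between availability and safety.
# Assumption: default min_size = k + 1 for EC pools.
def ec_availability(k, m):
    size = k + m          # total shards stored per object
    min_size = k + 1      # default min_size for EC pools
    # OSDs that can be down while the pool still accepts I/O:
    down_while_writable = size - min_size
    return min_size, down_while_writable

print(ec_availability(10, 1))  # (11, 0): any single OSD down stops I/O
print(ec_availability(8, 2))   # (9, 1):  survives one down OSD
print(ec_availability(7, 3))   # (8, 2):  survives two down OSDs
```

Lowering min_size to k would keep the 10+1 pool writable with one OSD down, but then writes land with no redundancy left, which is exactly the tradeoff described above.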
Yes, you can reduce the min alloc size (16 kB on SSDs by default), but the metadata overhead will be the main problem here.
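For reference, the relevant BlueStore options look roughly like this (values are the defaults of that era; note that min_alloc_size is fixed when the OSD is created, so changing it only affects newly provisioned OSDs):

```ini
[osd]
# BlueStore minimum allocation unit, set at OSD mkfs time.
# Changing these requires re-provisioning the OSDs.
bluestore_min_alloc_size_ssd = 16384   ; 16 kB default on SSDs
bluestore_min_alloc_size_hdd = 65536   ; 64 kB default on HDDs
```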



Paul

2018-07-31 21:58 GMT+02:00 Anton Aleksandrov <anton@xxxxxxxxxxxxxx>:

Hello Paul,

I am sorry for writing to you directly, but as I am being pushed to use Ceph due to lack of space, I still have worries about it. Could you please elaborate on the points you mentioned in your reply?

Why is a 10+1 configuration bad? Say we have 12 OSD nodes; with 10+1 we have one "extra". Would it be better, with the same 12 nodes, to have a 9+1, 8+1, or even 5+1 configuration? The reason for 10+1 is to save as much space as possible, but maybe we don't see other possible bottlenecks and pitfalls.

And regarding the minimal allocation size: how bad an idea is it to lower it? After your message we are reviewing the possibility of storing the small files in another way (they are thumbnails, so they can be generated on the fly). But in general, is lowering it a bad idea or not?

I would be very grateful for your reply.

Anton.


On 30.07.2018 17:55, Paul Emmerich wrote:
10+1 is a bad idea for obvious reasons (not enough coding chunks, you will be offline if even one server is offline).

The real problem is that your 20 kB files will be split up into 2 kB chunks, and the metadata overhead and the BlueStore min alloc size will eat up your disk space.
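A back-of-the-envelope sketch of this, assuming the BlueStore default min_alloc_size of that era for HDDs (64 kB) and ignoring the additional per-object metadata:

```python
# Rough space amplification for a small file in a k+m EC pool:
# each of the k data chunks (and m coding chunks) is rounded up
# to BlueStore's min_alloc_size on its OSD.
import math

def ec_raw_usage(file_size, k, m, min_alloc_size):
    """Raw bytes consumed on disk for one file in a k+m EC pool."""
    chunk = math.ceil(file_size / k)                             # per-OSD chunk
    alloc = math.ceil(chunk / min_alloc_size) * min_alloc_size   # rounded up
    return alloc * (k + m)                                       # all shards

file_size = 20 * 1024                                  # a 20 kB thumbnail
usage_hdd = ec_raw_usage(file_size, 10, 1, 64 * 1024)  # 10+1 on HDDs
print(usage_hdd // 1024)        # 704 kB raw for a 20 kB file
print(usage_hdd / file_size)    # ~35x space amplification
```

With the 16 kB SSD default the same file would still consume 11 × 16 kB = 176 kB raw, roughly 9x the logical size, which is consistent with the disproportionate space loss described below.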


Paul

2018-07-30 13:44 GMT+02:00 Anton Aleksandrov <anton@xxxxxxxxxxxxxx>:
Hello community,

I am building my first cluster for a project that hosts millions of small (from 20 kB) and big (up to 10 MB) files. Right now we are moving from local 16 TB RAID storage to a cluster of 12 small machines. We are planning to have 11 OSD nodes, use an erasure-coded pool (10+1), and one host for the MDS.

In my local tests I see that the available space decreases disproportionately to the amount of data copied into the cluster. With a clean cluster I have, for example, 100 GB of available space, but after copying 40 GB in, the available space decreases by about 5-10%. Is that normal?

Is there any setting that specifies the cluster's minimal object size?

I also wonder whether having so many small files (currently at least about 50,000,000 files) could have a negative impact, and where our bottleneck would be. As we don't have money for SSDs, we will have the WAL/DB on a separate plain HDD.

Also, would it help to put the CephFS metadata pool on separate disks, away from the data pool drives?

Regards,
Anton.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90




