Re: mds server(s) crashed

On Thu, Aug 13, 2015 at 5:12 AM, Bob Ababurko <bob@xxxxxxxxxxxx> wrote:
>> > I am actually looking for the most stable way to implement cephfs
>> > at this point.  My cephfs cluster contains millions of small files,
>> > so many inodes, if that needs to be taken into account.  Perhaps I
>> > should only be using one MDS node for stability at this point?  Is
>> > this the best way forward to get a handle on stability?  I'm also
>> > curious whether I should set my mds cache size to a number greater
>> > than the number of files I have in the cephfs cluster.  If you can
>> > give some key points on configuring cephfs for the best stability
>> > and, if possible, availability, that would be helpful to me.
>>
>> One active MDS is the most stable setup.  Adding a few standby MDSs
>> should not hurt stability.
>>
>> You can't set mds cache size to a number greater than the number of
>> files in the fs; that would require lots of memory.
>
> I'm not sure what amount of RAM you consider to be 'lots', but I would
> really like to understand a bit more about this.  Perhaps a rule of
> thumb?  Is there an advantage to more RAM and a large mds cache size?
> We plan on putting close to a billion small files in this pool via
> cephfs, so what should we be considering when sizing our MDS hosts, or
> changing in the MDS config?  Basically, what should we be doing, or
> not doing, when we have a cluster with this many files?  Thanks!

The advantages of a larger cache are:
 * We can allow clients to hold more in cache (anything in a client's
cache must also be in the MDS cache)
 * We are less likely to need to read from disk on a random metadata read
 * We are less likely to need to write back to disk again if a file
was modified (we can just journal + update in cache)
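
On the "rule of thumb" question: as a very rough sizing sketch (the
per-inode cost here is an assumption, not a spec; a couple of KB of MDS
memory per cached inode is a commonly cited ballpark, but measure it on
your own system, as it varies by version and workload):

    10,000,000 inodes in cache * ~2 KB/inode ~= ~20 GB of RAM

which is why caching anywhere near a billion inodes isn't realistic,
and why the cache size can't simply be set larger than the file count.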

None of the cache benefits above is particularly relevant if your
workload is a stream of a billion creates.  The reason we're hitting
the cache size limit in this case is the size of the directories: some
operations during restart of the MDS happen at a per-directory level
of granularity.

If you're ramping up to deploying a billion-file workload, it might be
worth doing some experiments on a smaller system with the same file
hierarchy structure.  You could experiment with enabling inline data,
tuning mds_bal_split_size (how large directories grow before getting
fragmented) and mds_cache_size, and see what effect these options have
on the rate of file creates you can sustain (a sketch of the relevant
settings follows).  For best results, also periodically kill an MDS
during a run, to check that the system recovers correctly (i.e. check
for bugs like the one you've just hit).
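
For concreteness, here is a minimal sketch of those knobs as they might
appear in ceph.conf on the MDS hosts.  The values are arbitrary starting
points for the experiment, not recommendations, and option names and
defaults can differ between releases, so check the docs for the version
you're running:

    [mds]
    mds_cache_size     = 1000000   ; inodes held in cache (default 100000)
    mds_bal_split_size = 10000     ; dirents before a directory fragments
    mds_bal_frag       = true      ; allow directory fragmentation

To simulate a failure mid-run, you can mark the active rank as failed so
that a standby takes over:

    ceph mds fail 0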

As for the most stable configuration, the "CephFS for early adopters"
page[1] is still current.  Enabling inline data and/or directory
fragmentation will put you in slightly riskier territory (aka less
comprehensively tested by us), but if you can check that the
filesystem is working correctly for your workload in a POC then that's
the most important measure of whether it's suitable for you to deploy.
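
For reference, on the releases current as of this writing, inline data
is switched on with:

    ceph mds set inline_data true --yes-i-really-mean-it

and directory fragmentation with the mds_bal_frag setting shown in the
ceph.conf sketch above.  Both of these have moved around between
versions, so verify against the documentation for your release.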

John

1. http://ceph.com/docs/master/cephfs/early-adopters/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


