On Thu, Aug 13, 2015 at 5:12 AM, Bob Ababurko <bob@xxxxxxxxxxxx> wrote:
>> > I am actually looking for the most stable way to implement cephfs at
>> > this point. My cephfs cluster contains millions of small files, so
>> > many inodes, if that needs to be taken into account. Perhaps I should
>> > only be using one MDS node for stability at this point? Is this the
>> > best way forward to get a handle on stability? I'm also curious
>> > whether I should set my mds cache size to a number greater than the
>> > number of files I have in the cephfs cluster? If you can give some
>> > key points on configuring cephfs for the best stability and, if
>> > possible, availability, that would be helpful to me.
>>
>> One active MDS is the most stable setup. Adding a few standby MDS
>> should not hurt stability.
>>
>> You can't set mds cache size to a number greater than the number of
>> files in the fs; it would require lots of memory.
>
> I'm not sure what amount of RAM you consider to be 'lots', but I would
> really like to understand a bit more about this. Perhaps a rule of
> thumb? Is there an advantage to more RAM and a larger mds cache size?
> We plan on putting close to a billion small files in this pool via
> cephfs, so what should we be considering when sizing our MDS hosts or
> changing the MDS config? Basically, what should we or should we not be
> doing when we have a cluster with this many files? Thanks!

The advantages of a larger cache are:
 * Clients can hold more in their caches (anything in a client's cache
   must also be in the MDS cache).
 * We are less likely to need to read from disk on a random metadata
   read.
 * We are less likely to need to write back to disk again when a file is
   modified (we can just journal the change and update it in cache).

None of these outcomes is particularly relevant if your workload is a
stream of a billion creates. The reason we're hitting the cache size
limit in this case is the size of the directories: some operations
during restart of the MDS happen at a per-directory level of
granularity.

If you're working up to deploying a billion-file workload, it might be
worth doing some experiments on a smaller system with the same file
hierarchy structure. You could experiment with enabling inline data,
tuning mds_bal_split_size (how large directories grow before getting
fragmented) and mds_cache_size, and see what effect these options have
on the rate of file creates you can sustain (there is a rough ceph.conf
sketch at the end of this mail). For best results, also periodically
kill an MDS during a run, to check that the system recovers correctly
(i.e. to catch bugs like the one you've just hit).

As for the most stable configuration, the "CephFS for early adopters"
page[1] is still current. Enabling inline data and/or directory
fragmentation will put you in slightly riskier territory (i.e. less
comprehensively tested by us), but if you can verify that the
filesystem works correctly for your workload in a proof of concept,
that is the most important measure of whether it's suitable for you to
deploy.

John

1. http://ceph.com/docs/master/cephfs/early-adopters/
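
P.S. To make the tuning suggestions above a bit more concrete, here is
the sort of ceph.conf fragment you could experiment with on a test
cluster. The numbers are purely illustrative, not recommendations: size
mds cache size against the RAM actually available on the MDS host and
measure, rather than trying to cover every file in the fs. Inline data
is enabled per-filesystem rather than in ceph.conf; the early adopters
page covers that.

    [mds]
        # How many inodes the MDS will try to hold in cache. Each cached
        # inode costs memory, so pick this based on MDS host RAM.
        mds cache size = 500000

        # Allow large directories to be fragmented into multiple objects.
        mds bal frag = true

        # How many entries a directory may reach before it is split into
        # fragments.
        mds bal split size = 10000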
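
And a minimal sketch of the failover check, to run alongside the create
workload on the test cluster. It assumes rank 0 is the single active MDS
and that at least one standby is configured; adjust the iteration count
and timings to suit your runs.

    #!/bin/sh
    # Periodically fail the active MDS while the create workload runs,
    # then wait for a standby to take over before doing it again.
    for i in 1 2 3 4 5; do
        sleep 600                 # let the workload run for a while
        ceph mds fail 0           # mark rank 0 failed; a standby should take over
        sleep 10
        until ceph mds stat | grep -q up:active; do
            sleep 5               # wait for the replacement to reach up:active
        done
        ceph -s                   # record cluster state after the failover
    done

If the creates stall indefinitely or the replacement MDS never reaches
up:active, that is exactly the kind of thing to report.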