On Fri, Nov 28, 2014 at 3:14 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 11/28/2014 01:04 PM, Florian Haas wrote:
>> Hi everyone,
>>
>> I'd like to come back to a discussion from 2012 (thread at
>> http://marc.info/?l=ceph-devel&m=134808745719233) to estimate the
>> expected MDS memory consumption from file metadata caching. I am certain
>> the following is full of untested assumptions, some of which are
>> probably inaccurate, so please shoot those down as needed.
>>
>> I did an entirely unscientific study of a real data set (my laptop, in
>> case you care to know) which currently holds about 70G worth of data in
>> a huge variety of file sizes and several file systems, and currently
>> lists about 944,000 inodes as being in use. So going purely by order of
>> magnitude and doing a wild approximation, I'll assume a ratio of 1
>> million files in 100G, or 10,000 files per gigabyte, which means an
>> average file size of about 100KB -- again, approximating and forgetting
>> about the difference between 10^3 and 2^10, and using a stupid
>> arithmetic mean rather than a median which would probably be much more
>> useful.
>>
>> If I were to assume that all those files were in CephFS, and they were
>> all somehow regularly in use (or at least one file in each directory),
>> then the Ceph MDS would have to keep the metadata of all those files in
>> cache. Suppose further that the stat struct for all those files is
>> anywhere between 1 and 2KB, and we go by an average of 1.5KB metadata
>> per file including some overhead, then that would mean the average
>> metadata per file is about 1.5% of the average file size. So for my 100G
>> of data, the MDS would use about 1.5G of RAM for caching.
>>
>> If you scale that up for a filestore of say a petabyte, that means all
>> your Ceph MDSs would consume a relatively whopping 15TB in total RAM for
>> metadata caching, again assuming that *all* the data is actually used by
>> clients.
>>
>
> Why do you assume that ALL MDSs keep ALL metadata in memory? Isn't the
> whole point of directory fragmentation that they all keep a bit of the
> inodes in memory to spread the load?

Directory subtree partitioning is considered neither stable nor supported,
which is why it's important to understand what a single active MDS will
hold.

>> Now of course it's entirely unrealistic that in a production system data
>> is actually ever used across the board, but are the above considerations
>> "close enough" for a rule-of-thumb approximation of MDS memory
>> footprint? As in,
>>
>> Total MDS RAM = (Total used storage) * (fraction of data in regular use)
>>                 * 0.015
>>
>> If CephFS users could use a rule of thumb like that, it would help them
>> answer questions like "given a filesystem of size X, will a single MDS
>> be enough to hold my metadata caches if Y is the maximum amount of
>> memory I can afford for budget Z".
>>
>> All thoughts and comments much appreciated. Thank you!
>>
>> Cheers,
>> Florian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
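For concreteness, the rule of thumb discussed in the thread can be sketched as a tiny Python helper. The 100 KB average file size and 1.5 KB of metadata per inode are the thread's untested assumptions, not measured CephFS values, and the function name is purely illustrative:

```python
GiB = 1024 ** 3

def mds_cache_estimate(total_bytes, active_fraction=1.0,
                       avg_file_bytes=100 * 1024,
                       metadata_bytes_per_file=1536):
    """Back-of-envelope MDS cache size: number of files times
    per-inode metadata, scaled by the fraction of files in
    regular use. All defaults are the thread's assumptions."""
    n_files = total_bytes / avg_file_bytes
    return n_files * metadata_bytes_per_file * active_fraction

# 100 GiB of data at ~100 KiB/file and ~1.5 KiB metadata/inode:
print(mds_cache_estimate(100 * GiB) / GiB)        # 1.5 (GiB of cache)
# 1 PiB with every file in use, matching the ~15 TB figure above:
print(mds_cache_estimate(1024 ** 5) / 1024 ** 4)  # 15.36 (TiB of cache)
```

Note that `metadata_bytes_per_file / avg_file_bytes` is exactly the 0.015 factor in the formula quoted above, so the helper and the one-line rule of thumb are the same estimate.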