On Fri, Sep 22, 2017 at 2:49 PM, Danny Al-Gaaf <danny.al-gaaf@xxxxxxxxx> wrote:
> On 22.09.2017 at 22:59, Gregory Farnum wrote:
> [..]
>> This is super cool! Is there anything written down that explains this
>> for Ceph developers who aren't familiar with the workings of Dovecot?
>> I've got some questions from going through it, but they may be very
>> dumb.
>>
>> *) Why are indexes going on CephFS? Is this just about wanting a local
>> cache, or about the existing Dovecot implementations, or something
>> else? Almost seems like you could just store the whole thing in a
>> CephFS filesystem if that's safe. ;)
>
> This is, if everything works as expected, only an intermediate step. One
> idea (https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/#/status-3)
> is to use omap to store the index/metadata.
>
> We chose a step-by-step approach, and since we are currently not sure
> whether omap would work performance-wise, we use CephFS for now (which
> also requires no changes to Dovecot). Our current focus is on developing
> the first version of librmb, but the code to use omap is already there.
> It still needs integration, testing, and performance tuning to verify
> that it meets our requirements.
>
>> *) It looks like each email is getting its own object in RADOS, and I
>> assume those are small messages, which leads me to
>
> The mail distribution looks like this:
> https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/#/mailplatform-mails-dist
>
> Yes, the majority of the mails are under 500 KB, but most objects are
> around 50 KB. There are not that many very small objects.

Ah, that slide makes more sense with that context; I was paging through
it in bed last night and thought it was about the number of emails per
user or something weird. So those mail objects are definitely bigger than
I expected; interesting.

>> *) Is it really cost-acceptable to not use EC pools on email data?
>
> We will use EC pools for the mail objects and replication for CephFS.
>
> But even without EC there would be a cost benefit compared to the
> current system. We will save a large number of IOPS in the new platform
> since the (NFS) POSIX layer is removed from the IO path (at least for
> the mail objects). And we expect that with Ceph and commodity hardware
> we can compete with a traditional enterprise NAS/NFS anyway.
>
>> *) Isn't per-object metadata overhead a big cost compared to the
>> actual stored data?
>
> I assume not. The metadata/index is small compared to the size of the
> mails (currently around 10% on NFS, I would say). In the classic
> NFS-based Dovecot the number of index/cache/metadata files is an issue
> anyway. With 6.7 billion mails we have 1.2 billion index/cache/metadata
> files
> (https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/#/mailplatform-mails-nums).

I was unclear; I meant the RADOS metadata cost of storing an object. I
haven't quantified that in a while, but it was big enough to make 4 KB
objects pretty expensive, which I was incorrectly assuming would be the
case for most emails. EC pools have the same issue: if you erasure-code
a 40 KB object into 5+3, you pay the metadata overhead for each 8 KB
(40 KB / 5) of data, but again that matters less in practice than my
initial assumptions suggested.

This is super cool!
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
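
To make the omap idea Danny mentions above concrete, here is a minimal
sketch using the librados C++ API: per-mailbox index entries stored as
omap key/value pairs on a single index object. The pool name, object
name, key layout, and value encoding are hypothetical illustrations,
not librmb's actual format.

    #include <rados/librados.hpp>
    #include <map>
    #include <string>

    int main() {
      // Error handling omitted for brevity in this sketch.
      librados::Rados cluster;
      cluster.init(nullptr);                          // default client id
      cluster.conf_read_file("/etc/ceph/ceph.conf");
      cluster.connect();

      librados::IoCtx io;
      cluster.ioctx_create("mail_index", io);         // hypothetical pool

      // One RADOS object per mailbox; each mail's index entry is an
      // omap key/value pair, so index reads and updates are key/value
      // operations that never touch the mail objects themselves.
      librados::ObjectWriteOperation op;
      std::map<std::string, librados::bufferlist> entries;
      librados::bufferlist val;
      val.append(std::string("uid=4711 flags=\\Seen size=51200"));
      entries["mail/4711"] = val;                     // hypothetical key
      op.omap_set(entries);
      io.operate("user42.mailbox.INBOX", &op);        // hypothetical oid

      cluster.shutdown();
      return 0;
    }

Compile with -lrados. The appeal of omap here is exactly Danny's IOPS
point: index access becomes key/value operations on one object, with no
POSIX layer in the path.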
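Similarly, the one-object-per-mail write path discussed in the thread
might look roughly like this; again only a sketch, with the xattr name
and object naming purely illustrative and no claim to match librmb's
real on-disk layout.

    #include <rados/librados.hpp>
    #include <string>

    // Store one email as one RADOS object, keeping a piece of immutable
    // message metadata in an xattr next to the data (a sketch, not
    // librmb's actual format).
    int store_mail(librados::IoCtx& io, const std::string& oid,
                   const std::string& rfc822, const std::string& recv_time) {
      librados::bufferlist body;
      body.append(rfc822);
      int r = io.write_full(oid, body);   // whole mail in a single write
      if (r < 0)
        return r;

      librados::bufferlist ts;
      ts.append(recv_time);
      return io.setxattr(oid, "mail.received", ts);  // hypothetical name
    }

With a ~50 KB average object, a single write_full per mail keeps the
write path to one round trip plus the xattr update.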