Hi Bron,

thanks a lot for this detailed description of your setup! I have added a few questions inline below ...
--On 23. September 2014 21:32:04 +1000 Bron Gondwana <brong@xxxxxxxxxxx> wrote:
On Tue, Sep 23, 2014, at 06:58 PM, Sebastian Hagedorn wrote:

Hi, as I mentioned a few days ago, we're considering metapartitions on SSD drives in order to optimize IMAP search performance. We have yet to run a full analysis on how much storage that would require, but a first guesstimate points towards about 20% of the net mail data for all the cyrus.* files when using SQUAT. Bron mentioned support for Xapian in 2.5, so I took a look at the branch and noticed that there isn't only support for Xapian, but actually a choice of SQUAT, Xapian and Sphinx. Eventually I'd like to learn the pros and cons of the various choices, but right now I have mainly one concern: will index files be larger with Xapian or Sphinx? Will they also be stored on the metapartitions? My concern is that we might run out of space on those metapartitions if we choose a different indexer ... what's the operational experience regarding that at Fastmail?

So Sphinx was just too IO intensive, we had to ditch it entirely, but we didn't kill the code. It's probably stale, though - I wouldn't use it without doing a ton of testing.
OK.
10% is a reasonable estimate for search. We run a 3Tb search partition for 20Tb of email storage, and it's nowhere near full. Here's one with 20 slots, 18 of which are in use:

/dev/mapper/md2    2.7T  988G  1.8T  36% /mnt/i32d2search
/dev/mapper/sdb1   917G  573G  298G  66% /mnt/i32d2t01
/dev/mapper/sdb2   917G  576G  295G  67% /mnt/i32d2t02
/dev/mapper/sdb3   917G  571G  300G  66% /mnt/i32d2t03
/dev/mapper/sdb4   917G  573G  298G  66% /mnt/i32d2t04
/dev/mapper/sdb5   917G  702G  169G  81% /mnt/i32d2t05
/dev/mapper/sdb6   917G  743G  128G  86% /mnt/i32d2t06
/dev/mapper/sdb7   917G  697G  174G  81% /mnt/i32d2t07
/dev/mapper/sdb8   917G  760G  111G  88% /mnt/i32d2t08
/dev/mapper/sdb9   917G  763G  108G  88% /mnt/i32d2t09
/dev/mapper/sdb10  917G  727G  144G  84% /mnt/i32d2t10
/dev/mapper/sdb11  917G  754G  117G  87% /mnt/i32d2t11
/dev/mapper/sdb12  917G  757G  114G  87% /mnt/i32d2t12
/dev/mapper/sdb13  917G  706G  165G  82% /mnt/i32d2t13
/dev/mapper/sdb14  917G  746G  125G  86% /mnt/i32d2t14
/dev/mapper/sdb15  917G   72M  870G   1% /mnt/i32d2t15
/dev/mapper/sdb16  917G   72M  870G   1% /mnt/i32d2t16
/dev/mapper/sdb17  917G  704G  167G  81% /mnt/i32d2t17
/dev/mapper/sdb18  917G  774G   97G  89% /mnt/i32d2t18
/dev/mapper/sdb19  917G  722G  149G  83% /mnt/i32d2t19
/dev/mapper/sdb20  917G  741G  130G  86% /mnt/i32d2t20
/dev/md1           367G  249G  118G  68% /mnt/ssd32d2

sdb1-20 are LUKS encrypted partitions on a single hardware RAID6 volume with 12 x 2Tb WD RE4 drives. md2 is also LUKS encrypted, but it's a software RAID1e with 3 x 2Tb WD RE4 drives. md1 is 400Gb Intel DC3700 drives in software RAID1. It's not using LUKS because the drives support encryption on-disk, so we're using that.
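Just to sanity-check the 10% figure against that df output (my own back-of-the-envelope arithmetic, ignoring whatever search data lives on the tmpfs and SSD tiers):

  mail used   (sum of the sdb1-20 "used" columns)  ~= 12.6T
  search used (md2 "used")                         ~= 988G
  ratio                                            ~= 988 / 12600 ~= 7.8%

So 10% with some headroom sounds plausible for our planning as well.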
What do you use LUKS for? My best guess would be that it makes it easier to toss out broken drives without having to worry about personal data remaining on them?
So how do we structure our search? It's complicated. There are 4 "tiers" of storage. The first tier is tmpfs, the second is ssd (it's not used much though), the third is on the search partition, and the 4th is ALSO on the search partition, but it's there for archive purposes, so we can compact most of the long-term search down to a single index without having to rewrite it every week.
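So, if I read that correctly, the tiers map onto the storage from your df output roughly like this (my interpretation, please correct me if I'm wrong):

  tier 1 (temp)    -> tmpfs
  tier 2 (meta)    -> SSD (md1), not used much
  tier 3 (data)    -> search partition (md2)
  tier 4 (archive) -> search partition (md2), separate archive directory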
So you only use fast storage for writing? Isn't there a big performance hit for searches on the data and archive partitions? I wonder why you don't use SSDs for those.
Xapian supports reading from multiple databases. So the config on my server (we're moving to another machine here) is:

search_engine: xapian
search_index_headers: no
search_batchsize: 8192
defaultpartition: default
defaultsearchtier: temp
tempsearchpartition-default: /var/run/cyrus/search-sloti30t01
metasearchpartition-default: /mnt/ssd30/sloti30t01/store23/search
datasearchpartition-default: /mnt/i30search/sloti30t01/store23/search
archivesearchpartition-default: /mnt/i30search/sloti30t01/store23/search-archive

(layout is similar, but imap30 is a smaller machine, with just a single set)

So by default it always indexes to temp, which gets us close-to-realtime indexing with a squatter that watches the sync_log directory for changes, and without causing too much random IO.

Compress is run from cron:

# Any time the disk gets over 50%, compress -o single down to data
13 * * * * /home/mod_perl/hm/scripts/xapian_compact.pl -a -o -d 50 temp data

# Copy the temporary search databases down to data during the week
43 1 * * 1,2,3,4,5,6 /home/mod_perl/hm/scripts/xapian_compact.pl -a temp,meta data

# Sundays repack the entire data directory with filtering of deleted
# messages
43 1 * * 0 /home/mod_perl/hm/scripts/xapian_compact.pl -a -F temp,meta,data data

I'll attach the xapian_compact.pl script to this email.
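Just to make sure I understand the "squatter that watches the sync_log directory" part: I assume that's the rolling squatter mode, i.e. roughly something like this (my guess at the relevant settings, not something I have tested yet):

  # imapd.conf: write a sync log channel for the indexer to follow
  sync_log: 1
  sync_log_channels: squatter

  # long-running rolling indexer, started from the service supervisor
  squatter -R

Is that how you run it, or do you have your own wrapper around squatter?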
Why is there no job for archiving? You don't really do that manually, I suppose?
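If I had to guess at what such a job would look like, based on your other cron entries, it might be something along these lines (the arguments and the monthly schedule are purely my assumption, untested):

  # hypothetical: once a month, fold everything down into the archive tier
  43 1 1 * * /home/mod_perl/hm/scripts/xapian_compact.pl -a -F temp,meta,data,archive archive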
($Slot->RunCommand is pretty much system with a ton of magic around it)

With this layout, we get a few different search indexes throughout the week, we check every hour that we don't waste too much memory on tmpfs, and we get IO efficiency with the compacts being in the quieter times. The xapian compact code in Cyrus does clever locking to allow it to compact all the existing databases while creating a brand new temp database to index new messages.

[brong@imap30 hm]$ du -s /var/run/cyrus/search-sloti30t01/b/user/brong/*
79944   /var/run/cyrus/search-sloti30t01/b/user/brong/xapian.225

[brong@imap30 hm]$ du -s /mnt/i30search/sloti30t01/store23/search*/b/user/brong/*
1739980 /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian
21516   /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian.1
1392840 /mnt/i30search/sloti30t01/store23/search/b/user/brong/xapian.218
63676   /mnt/i30search/sloti30t01/store23/search/b/user/brong/xapian.219
385936  /mnt/i30search/sloti30t01/store23/search/b/user/brong/xapian.220

Wow, it looks like I'm due for an archiving!

[brong@imap30 hm]$ sudo -u cyrus /usr/cyrus/bin/squatter -C /etc/cyrus/imapd-sloti30t01.conf -v -i -z archive -t temp,meta,data,archive -u brong
compressing temp:225,archive:0,archive:1,data:218,data:219,data:220 to archive:2 for user.brong (active temp:225,archive:0,archive:1,data:218,data:219,data:220)
adding new initial search location temp:226
compacting databases
sloti30t01/squatter[2365398]: twoskip: checkpointed /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian.2.NEW/cyrus.indexed.db (107 records, 17240 => 10600 bytes) in 0.003 seconds
Compressing messages for brong
done /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian.2.NEW
renaming tempdir into place
finished compact of user.brong (active temp:226,archive:2)

That took a few minutes, and now:

[brong@imap30 hm]$ du -s /mnt/i30search/sloti30t01/store23/search*/b/user/brong/*
3365336 /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian.2

[brong@imap30 hm]$ du -s /var/run/cyrus/search-sloti30t01/b/user/brong/*
168     /var/run/cyrus/search-sloti30t01/b/user/brong/xapian.226

I just have the one search index, nicely and efficiently compacted - plus a tiny new one with new messages being indexed.
Thanks
Sebastian
--
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.