squatter lower limits


 



Hello.

While looking into low-level disk usage optimization, I ran some simple performance tests based on full-text searches (2.4 branch). Metadata always resides on local disks, while messages are on slower hardware.

I noticed that full-text searches for short strings take much longer than searches for longer ones. For example, an FT search for a 3-letter string takes over 60 s, while a search for a 9-letter string on the same corpus takes about 20 s. I repeated these tests many times to rule out disk caching as the culprit: reversing the search order (longer string first) has no impact.

So I opened up the Cyrus source code and looked for the search-related code. As I understand it, the squatter index is not used if the search string is shorter than 4 characters. The comment in squat.h makes this quite clear:

/*
Don't change this unless you're SURE you know what you're doing.
Its only effect on the API is that searches for strings that are
shorter than SQUAT_WORD_SIZE are not allowed.
In SQUAT, a 'word' simply refers to a string of SQUAT_WORD_SIZE
arbitrary bytes.
*/

#define SQUAT_WORD_SIZE 4

So, a question for anyone who knows the squatter implementation in Cyrus: is this lower limit applied to all searches? Body, subject, address(es)?

And does this lower bound still apply to the 3.0 branch and its new Xapian indexing engine?

Setting aside low-level disk compression or optimization, a client might not cope well with long search times during which no data arrives on the IMAP channel, and may drop the connection (or a network device may do so). So, if searching for short strings means reading all the raw message files, I should warn users through the client interface that such searches may fail, since the mail corpus keeps growing and growing and growing. That is, until we upgrade to 3.0, if that helps.

Thanks,

Paolo
----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus


