> On Thu, Feb 15, 2018, at 10:44, Sebastian Hagedorn wrote:
>> ^Simon^: Is that the first 4Mb of the text/html and/or text/plain parts, or first 4Mb of the entire message body, ignoring any mime parts?
>
> This limit defines the maximum byte length per MIME body-part of type "text". The byte length is calculated after decoding (e.g. quoted-printable), conversion to UTF-8 and search text normalisation (e.g. stripping HTML tags, replacing Umlaut characters with their ASCII counterparts, etc.). Actually, it also applies to any other search-indexed fields, such as subjects, headers, etc., but in practice it is only relevant for mail bodies.
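
To make the order of operations described above concrete, here is a minimal sketch of measuring a text part's size only after decoding, charset conversion and tag stripping. It is an illustration, not Cyrus code: the function name `normalised_length`, the regex-based tag stripping, the default charset and the `LIMIT_BYTES` value (taken from the "4Mb" mentioned in the question) are all assumptions.

```python
import quopri
import re

# Assumed limit, taken from the "4Mb" figure in the question above.
LIMIT_BYTES = 4 * 1024 * 1024


def normalised_length(raw: bytes, charset: str = "iso-8859-1") -> int:
    """Return the UTF-8 byte length of a text part after normalisation.

    Hypothetical helper: decodes quoted-printable, converts the declared
    charset to text, strips HTML tags with a naive regex, then measures
    the UTF-8 encoding of what is left.
    """
    decoded = quopri.decodestring(raw)                # undo transfer encoding
    text = decoded.decode(charset, errors="replace")  # charset -> Unicode
    text = re.sub(r"<[^>]+>", "", text)               # naive HTML tag stripping
    return len(text.encode("utf-8"))                  # size as UTF-8 bytes


part = b"<p>h=E4tte</p>"
size = normalised_length(part)
within_limit = size <= LIMIT_BYTES
```

The point of the sketch is only that the size being compared against the limit is that of the normalised text, not of the raw MIME part on disk.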
Thanks. I suppose in practice that is good enough™

While we're at it, maybe you can answer some other questions regarding Xapian?
Is the setting "search_skipdiacrit" in imapd.conf honored during the indexing or is that only relevant while searching? Given your comment regarding search normalization above I take it Umlaut characters are not considered diacriticals? It's not a huge issue, but as a German university it would be nice for our users if a search could distinguish between "hatte" and "hätte", as an example.
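
To illustrate why this matters, here is a small sketch of what generic diacritic stripping does to the two words. It uses Unicode decomposition from the Python standard library and is only an assumption about the general technique; it is not taken from Cyrus or Xapian, and `strip_diacritics` is a hypothetical name.

```python
import unicodedata


def strip_diacritics(term: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Both words collapse to the same search term once marks are removed.
same_term = strip_diacritics("hätte") == strip_diacritics("hatte")
```

If the index is built from terms folded this way, a search cannot tell the two words apart regardless of how the query itself is processed, which is why it matters whether the setting applies at index time.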
Just out of curiosity, how is the mapping between a Xapian docid and a message file on disk achieved? I played around with xapian-delve and the Perl example simplesearch.pl. When I search for a term, I get a list of docids, but how do I know which message each one refers to?
Cheers
Sebastian
--
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.
----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe: https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus