Re: Truncated text during Xapian indexing

--On 15 February 2018 at 11:20:32 +0100 Robert Stepanek <rsto@xxxxxxxxxxxxxxxx> wrote:

On Thu, Feb 15, 2018, at 10:44, Sebastian Hagedorn wrote:

^Simon^: Is that the first 4 MB of the text/html and/or text/plain parts,
or the first 4 MB of the entire message body, ignoring any MIME parts?

This limit defines the maximum byte length per MIME body-part of type
"text". The byte length is calculated after decoding (e.g.
quoted-printable), conversion to UTF-8 and search text normalisation
(e.g. stripping HTML tags, replacing Umlaut characters with their ASCII
counterparts, etc.). Actually, it also applies to any other
search-indexed fields, such as subjects and headers, but in practice
it is only relevant for mail bodies.
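
To make the order of operations concrete, here is a minimal Python sketch of "decode, convert to UTF-8, normalise, then apply the byte limit". It is not the Cyrus implementation: the function name, the crude regex-based tag stripping, the default charset, and the 4 MB constant (taken from the figure in the question above) are all assumptions for illustration. Diacritic folding is left out to keep it short.

```python
import quopri
import re

# Assumed limit, taken from the "4 MB" figure in the question above.
MAX_PART_BYTES = 4 * 1024 * 1024


def searchable_text(raw: bytes, charset: str = "iso-8859-1",
                    limit: int = MAX_PART_BYTES) -> bytes:
    """Hypothetical helper: return the bytes that would count against the limit."""
    decoded = quopri.decodestring(raw)                  # 1. undo quoted-printable
    text = decoded.decode(charset, errors="replace")    # 2. charset -> Unicode
    text = re.sub(r"<[^>]+>", " ", text)                # 3. crude HTML tag stripping
    text = " ".join(text.split())                       #    collapse whitespace
    utf8 = text.encode("utf-8")                         # 4. measure in UTF-8 bytes
    if len(utf8) <= limit:
        return utf8
    # 5. Cut at the limit, then drop a trailing partial multi-byte sequence.
    return utf8[:limit].decode("utf-8", errors="ignore").encode("utf-8")
```

The point of the sketch is that the limit is measured on the normalised UTF-8 text, not on the raw transfer-encoded part, so a large HTML part may shrink considerably before the limit is applied.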

Thanks. I suppose in practice that is good enough™️

While we're at it, maybe you can answer some other questions regarding Xapian?

Is the setting "search_skipdiacrit" in imapd.conf honored during indexing, or is it only relevant while searching? Given your comment regarding search normalization above, I take it Umlaut characters are not considered diacritics? It's not a huge issue, but at a German university it would be nice for our users if a search could distinguish between "hatte" and "hätte", for example.
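
To illustrate why the index-time versus search-time distinction matters here, the following toy Python sketch folds diacritics with Unicode decomposition and builds a tiny term index. It says nothing about what Cyrus or Xapian actually do; `fold_diacritics` and `build_index` are hypothetical names, and the folding rule (NFKD, then dropping combining marks) is an assumption.

```python
import unicodedata


def fold_diacritics(term: str) -> str:
    """Decompose characters and drop combining marks, e.g. 'ä' -> 'a'."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(words, skip_diacritics: bool):
    """Map each index term to the set of original words stored under it."""
    index = {}
    for word in words:
        key = fold_diacritics(word) if skip_diacritics else word
        index.setdefault(key.lower(), set()).add(word)
    return index


folded = build_index(["hatte", "hätte"], skip_diacritics=True)
exact = build_index(["hatte", "hätte"], skip_diacritics=False)
```

If folding happens while indexing, both words end up under one term and no search-time setting can tell them apart afterwards; if the index keeps them separate, folding could still be applied to the query alone.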

Just out of curiosity, how is the mapping between a Xapian docid and a message file on disk achieved? I played around with xapian-delve and the Perl example simplesearch.pl. When I search for a term, I get a list of docids, but how do I know which message each one refers to?

Cheers
Sebastian
--
   .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
                .:.Regionales Rechenzentrum (RRZK).:.
  .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.

