Dear Rob,

I had reservations about some of these things too. :( In particular, I
was wondering about having to remember and recreate the exact
transfer-encoding. If both of us forward the same attachment in two
emails, and one encodes in quoted-printable, the other in base64, Cyrus
had better be able to recreate them exactly or have some other
workarounds.

I wasn't aware of the mmap() usage and the direct seeking into the
middle of the message body. But the bigger problem is what you've
described about reproducing the message byte-identically. If that can
be solved, then we can make Cyrus re-create the message while loading
from disk and stick it into RAM.

Can we just brainstorm with you and others in this thread... how do we
re-create a byte-identical attachment from a disk file? What is the
list of attributes we will need to store per stripped attachment to
allow an exact re-creation?

- file name/reference
- full MIME header of the attachment block
- separator string (this will be retained in the message body anyway)
- transfer encoding
- if encoding = base64 then base64 line length
- checksum of encoded attachment (as a sanity check in case the
  re-encoding fails to recreate exactly the same image as the original)

If encoding = quoted-printable or uuencode, then don't strip the
attachment at all. What other conditions may we need to look for to
bypass attachment stripping?

Can we just tap into all of you to get the ideas on paper, even if it's
not being implemented by anyone right now? It'll at least help us
understand the system's internals better.

thanks a lot, and regards,
Shuvam

> cyrus likes to mmap the whole file so it can just offset into it to
> extract which ever part is requested. In IMAP, you can request any
> arbitrary byte range from the raw RFC822 message using the
> body[]<start.length> construct, so you have to be able to byte
> accurately reconstruct the original email if you remove attachments.
>
> Consider the problem of transfer encoding. Say you have a base64
> encoded attachment (which basically all are). When storing and
> deduping, you'd want to base64 decode it to get the underlying
> binary data. But depending on the line length of the base64 encoded
> data, the same file can be encoded in a large number of different
> ways. When you reconstruct the base64 data, you have to be byte
> accurate in your reconstruction so your offsets are correct, and so
> any signing of the message (eg DKIM) isn't broken.
>
> Once you've solved those problems, the rest is pretty straight forward :)
>
> Rob
>

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
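
To make the attribute list above concrete, here is a rough Python sketch of the "strip only if we can rebuild it exactly" idea for base64 parts. It is not Cyrus code: the record fields, the function names (try_strip, restore, reencode) and the choice of SHA-256 for the checksum are all assumptions made up for illustration. It only handles base64 with one uniform line length and one line-ending style; anything else returns None, meaning "leave the attachment in the message".

```python
import base64
import binascii
import hashlib
from dataclasses import dataclass


@dataclass
class StrippedAttachment:
    """Hypothetical per-attachment record; field names are illustrative."""
    file_ref: str       # name/reference of the decoded file on disk
    mime_header: bytes  # full MIME header of the attachment block, verbatim
    line_length: int    # base64 characters per line in the original
    line_ending: bytes  # b"\r\n" or b"\n", as found in the original
    checksum: str       # SHA-256 hex digest of the original encoded body


def reencode(data: bytes, line_length: int, line_ending: bytes) -> bytes:
    """Base64-encode data, wrapping at line_length with the given ending."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_length]
             for i in range(0, len(encoded), line_length)]
    return line_ending.join(lines) + line_ending


def try_strip(file_ref: str, mime_header: bytes, encoded_body: bytes):
    """Return (record, decoded bytes) if the body can be rebuilt
    byte-for-byte, else None (caller should not strip the attachment)."""
    line_ending = b"\r\n" if b"\r\n" in encoded_body else b"\n"
    lines = encoded_body.split(line_ending)
    if not lines[0]:
        return None  # empty body or leading blank line: do not strip
    line_length = len(lines[0])
    try:
        # validate=True rejects stray characters instead of skipping them
        data = base64.b64decode(b"".join(lines), validate=True)
    except binascii.Error:
        return None
    # The decisive check: re-encode now and compare with the original.
    if reencode(data, line_length, line_ending) != encoded_body:
        return None
    checksum = hashlib.sha256(encoded_body).hexdigest()
    record = StrippedAttachment(file_ref, mime_header, line_length,
                                line_ending, checksum)
    return record, data


def restore(record: StrippedAttachment, data: bytes) -> bytes:
    """Rebuild the encoded body from disk data and verify the checksum."""
    body = reencode(data, record.line_length, record.line_ending)
    if hashlib.sha256(body).hexdigest() != record.checksum:
        raise ValueError("re-encoded attachment differs from the original")
    return body
```

The round-trip comparison inside try_strip is what would catch the awkward cases (mixed line endings, a missing final newline, non-canonical padding), so those fall through to "don't strip" without needing a separate rule for each. The checksum in restore is then only a sanity check at load time, as suggested in the list above.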