Hi,
On 03/04/2023 17:50, Michael Meeks wrote:
Hi there,
I've been looking at some memory profiles, and it seems to me that
we're storing too much bulk image data in memory. I understand that
previously we had real problems with image lifecycle and loss due to
optimistic attempts to use the original document as a reliable source
for fetching data from; so it made good sense to brute-force this by
holding all images in memory: fair enough - I've not seen us lose an
image while copy/pasting/closing/opening etc. since that was done -
which is awesome, and I don't want to go back to that =)
Yeah, it was never intended to stay that way, but compressed images
are usually small enough not to cause very high memory usage (at least
when running locally on a user's computer), so bringing back swapping
to disk wasn't really necessary - nobody complained, and things
generally worked much faster without doing unnecessary swap-outs and
swap-ins when loading a document. The biggest problem was how to
prevent that from happening - things like the parallel loading that
was introduced and re-introduced didn't help; it made everything
worse, which is what I complained about, and I proposed how it should
be done properly, but yeah.... The issue is: why load images in
parallel when you immediately notice that an image isn't used and
swap it out again? And even taking this into account, the effect of
parallel loading is minimal, because it's unlikely that many images
will be used immediately when loading. The time would be better spent
creating a parallel image-loading worker that loads images on demand
in a separate thread - then we could actually load multiple images in
parallel all the time, not just when a document is loaded.
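Roughly something along these lines - just a sketch, and all the names
here (ImageLoadWorker, DecodedBitmap, decodeImage) are invented for
the example, not existing vcl classes; the real decoder would go
through the graphic filter instead of the placeholder:

#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct DecodedBitmap { std::vector<std::uint8_t> maPixels; };  // placeholder

// Hypothetical decoder - stands in for the real graphic filter call.
static DecodedBitmap decodeImage(const std::vector<std::uint8_t>& rCompressed)
{
    return DecodedBitmap{ rCompressed };
}

class ImageLoadWorker
{
    struct Job
    {
        std::vector<std::uint8_t> maData;
        std::function<void(DecodedBitmap)> maDone;
    };

    std::queue<Job> maJobs;
    std::mutex maMutex;
    std::condition_variable maCondition;
    bool mbStop = false;
    std::thread maThread;

public:
    ImageLoadWorker() : maThread([this] { run(); }) {}

    ~ImageLoadWorker()
    {
        {
            std::lock_guard<std::mutex> aGuard(maMutex);
            mbStop = true;
        }
        maCondition.notify_one();
        maThread.join();
    }

    // Called whenever an image is actually needed (e.g. first paint),
    // not eagerly for every image at document load time.
    void requestLoad(std::vector<std::uint8_t> aCompressed,
                     std::function<void(DecodedBitmap)> aDone)
    {
        {
            std::lock_guard<std::mutex> aGuard(maMutex);
            maJobs.push({ std::move(aCompressed), std::move(aDone) });
        }
        maCondition.notify_one();
    }

private:
    void run()
    {
        for (;;)
        {
            Job aJob;
            {
                std::unique_lock<std::mutex> aLock(maMutex);
                maCondition.wait(aLock, [this] { return mbStop || !maJobs.empty(); });
                if (mbStop && maJobs.empty())
                    return;
                aJob = std::move(maJobs.front());
                maJobs.pop();
            }
            aJob.maDone(decodeImage(aJob.maData));  // decode off the main thread
        }
    }
};

A real version would probably use a small pool of such workers and be
driven by the GraphicManager, but the shape is the same: decode only
when something actually asks for the bitmap, and off the main thread.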
Nevertheless, the memory profiles show that for large documents
the compressed image data can also be large: hundreds of Mb (and yes
we have some customers that love copy/pasting lots of large complex
images into their files ;-)
Yeah - I would imagine PNG files could get super big, but mostly they
shouldn't be that big...? That's strange. MSO automatically enables
image re-compression on save, which drastically lowers the size of the
images that are saved into the documents. Maybe we should consider
doing the same. It also doesn't help that we load WebP images and
convert them to PNG (which is something I complained about and wanted
to fix, but was demotivated by the resistance to my idea). A lot of
images on the web today are WebP.
Anyhow - I was doing some re-factoring of the BinaryDataContainer
to encapsulate it better; and I was thinking of adding the ability to
swap these out (which should be async), and to then read them back in
on demand which (I hope) in the era of SSDs - should be extremely
fast, although synchronous.
Then again - since I've not touched these code paths for a long
time, and have lost any deep understanding I may once have had, I'd love
some advice:
https://git.libreoffice.org/core/+log/refs/heads/private/mmeeks/swapdatacontainer
Would be good to have two forms of that: advice for long-term
star-trek futures, and also advice for
quick-fix-that-goes-in-the-right-direction =)
It's tempting, but I'm not sure swapping should be the responsibility
of BinaryDataContainer. BinaryDataContainer is copyable, but the
internal "std::vector<sal_uInt8>" isn't - so you need to wrap that
again and put it into a shared_ptr. But then I would leave the choice
of what kind of data (swappable or not) to the user.
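Something like this, very roughly - a sketch only, not the actual
BinaryDataContainer API (the names BinaryData, InMemoryData,
SwappableData and DataContainer are made up for the example): the
container stays copyable by sharing the payload through a shared_ptr,
and the caller decides which storage policy to use.

#include <cstdint>
#include <fstream>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

struct BinaryData                  // abstract storage policy
{
    virtual ~BinaryData() = default;
    virtual std::vector<std::uint8_t> get() = 0;
};

struct InMemoryData : BinaryData   // the current behaviour: keep it in RAM
{
    std::vector<std::uint8_t> maData;
    explicit InMemoryData(std::vector<std::uint8_t> aData)
        : maData(std::move(aData)) {}
    std::vector<std::uint8_t> get() override { return maData; }
};

struct SwappableData : BinaryData  // swapped out to a file, read back on demand
{
    std::string maPath;            // swap file (the write-out could happen async elsewhere)
    explicit SwappableData(std::string aPath) : maPath(std::move(aPath)) {}
    std::vector<std::uint8_t> get() override
    {
        std::ifstream aStream(maPath, std::ios::binary);
        // synchronous read-in, which on an SSD should be fast enough
        return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(aStream),
                                         std::istreambuf_iterator<char>());
    }
};

class DataContainer                // copyable wrapper, like BinaryDataContainer
{
    std::shared_ptr<BinaryData> mpData;  // shared, since the vector itself isn't copyable
public:
    explicit DataContainer(std::shared_ptr<BinaryData> pData)
        : mpData(std::move(pData)) {}
    std::vector<std::uint8_t> getData() { return mpData->get(); }
};

That way the container itself stays dumb and copyable, and whoever
creates the graphic decides whether its bytes are worth swapping out.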
Beyond that - I'd -really- like to get more de-compressed images
out of memory; the 20 MPixel image that turns into 80Mb of RAM - and
then is interpolated down to some 1080p screen - so ~2Mpixel once,
then cached, and re-rendered from the cache. I wonder - will our
existing usage of eg. cairo's scaling cache allow us to push the giant
backing bitmap out of memory, and leave the 8Mb alternative in the
scaling cache ? ;-)
Well, I think this is more useful than swapping (unless the compressed
image actually takes a huge amount of memory). I wouldn't do it at the
Cairo level but at the BitmapEx / Graphic level, because the
GraphicManager should have a say in this and it is independent of the
bitmap backend. On top of that I'd add the mip-mapping part, which I
have been talking about for years, so that 1. scaling images becomes
faster, because you start with the nearest 2^n scaled-down image and
not the original resolution, and 2. large original-resolution images
can be purged from memory and loaded from the compressed source again
only when required. This can be made super smart, so it loads and
decompresses the image on demand and uses a scaled-down version in the
meantime, and on load it could even use the thumbnail that a lot of
images have in their metadata - stuff like that.
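Just to make the idea concrete, a rough self-contained sketch of the
mip-map chain (MipMappedGraphic, SimpleBitmap and the point-sampled
downscale are invented for the example; a real implementation would
live at the Graphic/GraphicManager level and use proper scaling):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

struct SimpleBitmap
{
    int mnWidth = 0;
    int mnHeight = 0;
    std::vector<std::uint8_t> maPixels;  // RGBA, 4 bytes per pixel
};

// Naive point-sampled 2x downscale, just to keep the sketch complete.
static SimpleBitmap downscaleHalf(const SimpleBitmap& rSource)
{
    SimpleBitmap aResult;
    aResult.mnWidth = std::max(1, rSource.mnWidth / 2);
    aResult.mnHeight = std::max(1, rSource.mnHeight / 2);
    aResult.maPixels.resize(std::size_t(aResult.mnWidth) * aResult.mnHeight * 4);
    for (int y = 0; y < aResult.mnHeight; ++y)
        for (int x = 0; x < aResult.mnWidth; ++x)
            for (int c = 0; c < 4; ++c)
                aResult.maPixels[(std::size_t(y) * aResult.mnWidth + x) * 4 + c] =
                    rSource.maPixels[(std::size_t(y) * 2 * rSource.mnWidth + x * 2) * 4 + c];
    return aResult;
}

class MipMappedGraphic
{
    std::function<SimpleBitmap()> maDecode;  // re-decodes the compressed source
    std::optional<SimpleBitmap> maOriginal;  // level 0, can be purged
    std::vector<SimpleBitmap> maLevels;      // levels 1..n, each half the previous size

public:
    explicit MipMappedGraphic(std::function<SimpleBitmap()> aDecode)
        : maDecode(std::move(aDecode))
    {
        maOriginal = maDecode();
        SimpleBitmap aCurrent = *maOriginal;
        while (aCurrent.mnWidth > 1 || aCurrent.mnHeight > 1)
        {
            aCurrent = downscaleHalf(aCurrent);
            maLevels.push_back(aCurrent);
        }
    }

    // Drop the big level-0 bitmap; the downscaled levels stay around.
    void purgeOriginal() { maOriginal.reset(); }

    // Return the smallest level that still covers the target size; only
    // fall back to re-decoding the compressed source when the target is
    // larger than the largest downscaled level.
    const SimpleBitmap& getForTarget(int nTargetWidth, int nTargetHeight)
    {
        for (auto it = maLevels.rbegin(); it != maLevels.rend(); ++it)
            if (it->mnWidth >= nTargetWidth && it->mnHeight >= nTargetHeight)
                return *it;
        if (!maOriginal)
            maOriginal = maDecode();  // load from the compressed source again
        return *maOriginal;
    }
};

Since each level has a quarter of the pixels of the previous one, the
whole chain below level 0 costs only about a third of the original
bitmap, so purging level 0 frees most of the memory while keeping
something reasonable to render from immediately.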
Thoughts appreciated,
Michael.
Tomaž