Re: BinaryDataContainer swapping ? ...

Hi,

On 03/04/2023 17:50, Michael Meeks wrote:
Hi there,

    I've been looking at some memory profiles, and it seems to me that we're storing too much bulk image data in memory. I understand that previously we had real problems with image lifecycle and loss, due to optimistic attempts to use the original document as a reliable source for fetching data from; so it made good sense to brute-force this by holding all images in memory: fair enough - I've not seen us lose an image while copy/pasting/closing/opening etc. since that was done - which is awesome, and I don't want to go back to that =)


Yeah, it was never intended to stay that way, but compressed images are usually small enough not to cause very high memory usage (at least when running locally on a user's computer), so bringing swapping to disk back wasn't all that necessary: nobody complained, and things generally worked much faster without unnecessary swap-outs and swap-ins when loading a document.

The biggest problem was how to prevent that from happening. Things like the parallel loading that was introduced and re-introduced did not help - it made everything worse, which is why I complained about it and proposed how it should be done properly, but yeah... What's the point of loading images in parallel when you immediately notice that the image isn't used and swap it out again? And even taking this into account, the effect of parallel loading is minimal, because it's unlikely that many images will be needed immediately when loading. The time would be better spent on a parallel image-loading worker that loads images on demand in a separate thread - then we could actually load multiple images in parallel all the time, not just when a document is loaded.
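A minimal sketch of what I mean - all names are made up (there is no such class today), and decode() is just a placeholder for whatever the real import filter would do. Callers hand over the compressed bytes and only block on the future when they actually need the pixels, so decoding can overlap with document load and with other decodes:

#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using CompressedBlob = std::vector<std::uint8_t>;
struct DecodedBitmap { int width; int height; std::vector<std::uint8_t> pixels; };

// Placeholder for the real decoder (GraphicFilter or similar) - assumption only.
DecodedBitmap decode(const CompressedBlob&) { return {}; }

class ImageLoadWorker
{
public:
    explicit ImageLoadWorker(unsigned nThreads = std::thread::hardware_concurrency())
    {
        for (unsigned i = 0; i < std::max(1u, nThreads); ++i)
            m_aThreads.emplace_back([this] { run(); });
    }

    ~ImageLoadWorker()
    {
        {
            std::lock_guard<std::mutex> aGuard(m_aMutex);
            m_bStop = true;
        }
        m_aCondition.notify_all();
        for (auto& rThread : m_aThreads)
            rThread.join();
    }

    // Enqueue a decode job; the caller blocks on the future only when it
    // actually needs the pixels.
    std::future<DecodedBitmap> loadOnDemand(CompressedBlob aBlob)
    {
        std::packaged_task<DecodedBitmap()> aTask(
            [aBlob = std::move(aBlob)] { return decode(aBlob); });
        std::future<DecodedBitmap> aResult = aTask.get_future();
        {
            std::lock_guard<std::mutex> aGuard(m_aMutex);
            m_aJobs.push(std::move(aTask));
        }
        m_aCondition.notify_one();
        return aResult;
    }

private:
    void run()
    {
        for (;;)
        {
            std::packaged_task<DecodedBitmap()> aJob;
            {
                std::unique_lock<std::mutex> aLock(m_aMutex);
                m_aCondition.wait(aLock, [this] { return m_bStop || !m_aJobs.empty(); });
                if (m_bStop && m_aJobs.empty())
                    return;
                aJob = std::move(m_aJobs.front());
                m_aJobs.pop();
            }
            aJob(); // the result is delivered through the future
        }
    }

    std::vector<std::thread> m_aThreads;
    std::queue<std::packaged_task<DecodedBitmap()>> m_aJobs;
    std::mutex m_aMutex;
    std::condition_variable m_aCondition;
    bool m_bStop = false;
};

The worker is deliberately dumb about priorities; the point is just that decoding happens off the main thread whenever somebody asks for the pixels, not only during document load.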



    Nevertheless, the memory profiles show that for large documents the compressed image data can also be large: hundreds of Mb (and yes we have some customers that love copy/pasting lots of large complex images into their files ;-)


Yeah - I would imagine PNG files could get super big, but mostly they shouldn't be that big...? That's strange. MSO automatically enables image re-compression on save, which drastically reduces the size of the images saved into the document. Maybe we should consider doing the same.

Also it doesn't help that we load WebP images and convert them to PNG (something I complained about and wanted to fix, but was demotivated by the resistance to my idea). A lot of images on the web today are WebP.

    Anyhow - I was doing some re-factoring of the BinaryDataContainer to encapsulate it better; and I was thinking of adding the ability to swap these out (which should be async), and then read them back in on demand, which (I hope) in the era of SSDs should be extremely fast, although synchronous.

    Then again - since I've not touched these code paths for a long time, and have lost any deep understanding I may once have had, I'd love some advice:

https://git.libreoffice.org/core/+log/refs/heads/private/mmeeks/swapdatacontainer

    Would be good to have two forms of that: advice for long-term star-trek futures, and also advice for quick-fix-that-goes-in-the-right-direction =)


It's tempting, but I'm not sure swapping should be the responsibility of BinaryDataContainer. BinaryDataContainer is copyable, but the internal std::vector<sal_uInt8> isn't, so you'd need to wrap it again and put it into a shared_ptr. But even then I would leave the choice of what kind of data (swappable or not) to the user.
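To illustrate the shape I mean - a rough sketch with made-up names, not the real vcl code: the container stays cheaply copyable because the payload sits behind a shared_ptr, and whether that payload is plain in-memory data or a swappable variant is decided by whoever constructs it:

#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Abstract payload: the user decides which concrete holder to plug in.
struct DataHolder
{
    virtual ~DataHolder() = default;
    virtual const std::vector<std::uint8_t>& getData() = 0; // may swap in first
};

// Plain in-memory payload, equivalent to today's behaviour.
class InMemoryData final : public DataHolder
{
public:
    explicit InMemoryData(std::vector<std::uint8_t> aData)
        : m_aData(std::move(aData)) {}
    const std::vector<std::uint8_t>& getData() override { return m_aData; }
private:
    std::vector<std::uint8_t> m_aData;
};

// Swappable payload: could write the bytes to a temp file and reload on demand.
// Only the interface is sketched here; the actual file handling is elided.
class SwappableData final : public DataHolder
{
public:
    explicit SwappableData(std::vector<std::uint8_t> aData)
        : m_aData(std::move(aData)) {}
    const std::vector<std::uint8_t>& getData() override
    {
        // if swapped out: read back synchronously from the temp file here
        return m_aData;
    }
    void swapOut() { /* async write to disk, then release m_aData */ }
private:
    std::vector<std::uint8_t> m_aData;
};

// The container itself stays copyable and oblivious to the storage strategy.
class BinaryDataContainer
{
public:
    explicit BinaryDataContainer(std::shared_ptr<DataHolder> pHolder)
        : m_pHolder(std::move(pHolder)) {}
    const std::vector<std::uint8_t>& getData() const { return m_pHolder->getData(); }
private:
    std::shared_ptr<DataHolder> m_pHolder; // copies share the same payload
};

The DataHolder seam would also be the natural place to hang the async swap-out / synchronous read-back that Michael describes, without BinaryDataContainer itself having to know about it.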



    Beyond that - I'd -really- like to get more de-compressed images out of memory; the 20 MPixel image that turns into 80Mb of RAM - and then is interpolated down to some 1080p screen - so ~2Mpixel once, then cached, and re-rendered from the cache. I wonder - will our existing usage of eg. cairo's scaling cache allow us to push the giant backing bitmap out of memory, and leave the 8Mb alternative in the scaling cache ? ;-)


Well, I think this is more useful than swapping (unless the compressed images actually take a huge amount of memory). I wouldn't do it at the Cairo level but at the BitmapEx / Graphic level, because the GraphicManager should have a say in this, and it's independent of the bitmap backend. Then add the mip-mapping part, which I have been talking about for years: 1. it would make scaling images faster, because you start with the nearest 2^n scaled-down image instead of the original resolution; 2. it would let us purge large original-resolution images from memory and load them from the compressed source again only when required. This can be made very smart - load and decompress the image on demand, use a scaled-down version in the meantime, and on load even use the thumbnail that a lot of images carry in their metadata - stuff like that.
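Roughly what I have in mind, as a sketch only - the names are hypothetical and the decode/scale calls are placeholders for whatever the real Graphic/BitmapEx code would do. The cache keeps power-of-two downscales, serves the nearest level that is still at least the requested size, and the big levels can be purged and rebuilt from the compressed source on demand:

#include <cstdint>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

struct Bitmap { int width; int height; std::vector<std::uint8_t> pixels; };

// Placeholders for the real decode/scale routines - assumptions only.
Bitmap decodeFromCompressedSource() { return {}; }
Bitmap scaleDown(const Bitmap&, int nWidth, int nHeight) { return { nWidth, nHeight, {} }; }

class MipMappedGraphic
{
public:
    MipMappedGraphic(int nOriginalWidth, int nOriginalHeight)
        : m_nOriginalWidth(nOriginalWidth), m_nOriginalHeight(nOriginalHeight) {}

    // Return a cached bitmap at least as large as the requested size,
    // preferring the nearest 2^n level so scaling starts from far fewer pixels.
    const Bitmap& getForSize(int nWidth, int nHeight)
    {
        const int nLevel = levelFor(nWidth, nHeight);
        auto it = m_aLevels.find(nLevel);
        if (it == m_aLevels.end())
        {
            // The original is needed once to build this level, then it can be purged.
            Bitmap aFull = decodeFromCompressedSource();
            Bitmap aScaled = scaleDown(aFull, m_nOriginalWidth >> nLevel, m_nOriginalHeight >> nLevel);
            it = m_aLevels.emplace(nLevel, std::move(aScaled)).first;
        }
        return it->second;
    }

    // Drop the big levels under memory pressure; they can be rebuilt on demand.
    void purgeLargeLevels(int nKeepFromLevel)
    {
        for (auto it = m_aLevels.begin(); it != m_aLevels.end();)
            it = (it->first < nKeepFromLevel) ? m_aLevels.erase(it) : std::next(it);
    }

private:
    // Level 0 is the original size, level 1 is half, level 2 a quarter, ...
    int levelFor(int nWidth, int nHeight) const
    {
        int nLevel = 0;
        int w = m_nOriginalWidth;
        int h = m_nOriginalHeight;
        while (w / 2 >= nWidth && h / 2 >= nHeight)
        {
            w /= 2;
            h /= 2;
            ++nLevel;
        }
        return nLevel;
    }

    int m_nOriginalWidth;
    int m_nOriginalHeight;
    std::map<int, Bitmap> m_aLevels;
};

The thumbnail-from-metadata trick would slot into the same cache as an extra level that is available immediately, before the first real decode finishes.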


Thoughts appreciated,

        Michael.


Tomaž



