On 8/24/2018 2:20 PM, Duy Nguyen wrote:
On Fri, Aug 24, 2018 at 5:37 PM Duy Nguyen <pclouds@xxxxxxxxx> wrote:
On Thu, Aug 23, 2018 at 10:36 PM Ben Peart <peartben@xxxxxxxxx> wrote:
Nice to see this done without a new index extension that records
offsets, so that we can load existing index files in parallel.
Yes, I prefer this simpler model as well. I wasn't sure it would
produce a significant improvement, given that the primary thread still has
to run through the variable-length cache entries, but I was pleasantly
surprised.
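
For readers following along, the simpler model can be sketched roughly as
below. This is only an illustration under loud assumptions: the 2-byte
length-prefixed entry format, and the names encode_entries,
scan_block_starts, parse_block and load_simple, are all hypothetical and are
not git's on-disk index format or read-cache code. The point is just that one
thread has to walk every variable-length entry to find the block boundaries
before the workers can parse the blocks.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def encode_entries(paths):
    """Toy format (an assumption): 2-byte big-endian length, then path bytes."""
    out = bytearray()
    for path in paths:
        raw = path.encode()
        out += struct.pack(">H", len(raw)) + raw
    return bytes(out)


def scan_block_starts(buf, total, per_block):
    """Primary thread: walk every entry, noting the byte offset of each block."""
    starts, pos = [], 0
    for i in range(total):
        if i % per_block == 0:
            starts.append(pos)
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2 + length
    return starts


def parse_block(buf, pos, count):
    """Worker: decode `count` consecutive entries starting at byte `pos`."""
    entries = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", buf, pos)
        entries.append(buf[pos + 2:pos + 2 + length].decode())
        pos += 2 + length
    return entries


def load_simple(buf, total, per_block):
    """Sequential scan for boundaries, then parse the blocks on worker threads."""
    starts = scan_block_starts(buf, total, per_block)
    counts = [min(per_block, total - i * per_block) for i in range(len(starts))]
    with ThreadPoolExecutor() as pool:
        blocks = pool.map(parse_block, [buf] * len(starts), starts, counts)
        return [entry for block in blocks for entry in block]
```

Python threads will not show the speedup a C implementation gets; the sketch
only shows where the sequential scan sits in the load path.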
Out of curiosity, how much time could we save by recording
offsets as an extension (I assume we would need, like, 4 offsets if the
system has 4 cores)? Much, much more than this simpler model (which may
justify the complexity), or just "meh" compared to this?
To answer my own question, I ran a patched git to precalculate
individual thread parameters, removed the scheduler code and hard-coded
these parameters (I ran just 4 threads, one per core). I got
0m2.949s (webkit.git, 275k files, 100 read-cache runs). Compared to
0m4.996s from Ben's patch (same test settings, of course), I think it's
definitely worth adding some extra complexity.
I took a run at doing that last year [1], but that was before the
mem_pool work that allowed us to avoid thread contention on the heap,
so the numbers aren't an apples-to-apples comparison (they would be
better today).
The trade-off is the additional complexity needed to load the index
extension without having to parse through all the variable-length cache
entries. My patch worked, but the feedback asked for it to be made
more generic and robust, and I haven't gotten around to that yet.
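
As a rough illustration of the offset-recording idea: the writer notes where
each block of entries starts, so the reader can hand blocks straight to
threads without a sequential scan. Everything here is an assumption made for
the sketch, including the toy 2-byte length-prefixed entry format, the
(byte offset, entry count) table, and the names write_with_offsets,
parse_block and load_with_offsets. It is not the format from [1] or git's
actual index layout.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def write_with_offsets(paths, nr_blocks):
    """Writer: encode entries and record (byte offset, entry count) per block."""
    per_block = max(1, -(-len(paths) // nr_blocks))  # ceiling division
    buf, table = bytearray(), []
    for i, path in enumerate(paths):
        if i % per_block == 0:
            table.append([len(buf), 0])  # a new block starts at this byte
        raw = path.encode()
        buf += struct.pack(">H", len(raw)) + raw
        table[-1][1] += 1
    return bytes(buf), [tuple(row) for row in table]


def parse_block(buf, pos, count):
    """Worker: decode `count` consecutive entries starting at byte `pos`."""
    entries = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", buf, pos)
        entries.append(buf[pos + 2:pos + 2 + length].decode())
        pos += 2 + length
    return entries


def load_with_offsets(buf, table):
    """Reader: no scan needed, each thread jumps to its recorded offset."""
    with ThreadPoolExecutor(max_workers=max(1, len(table))) as pool:
        blocks = pool.map(lambda row: parse_block(buf, *row), table)
        return [entry for block in blocks for entry in block]
```

With 4 blocks requested, the table plays the role of the "4 offsets" discussed
above; the cost is that every writer has to keep it consistent with the
entries.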
This patch series went for simplicity over the absolute best possible
performance.
[1]
https://public-inbox.org/git/20171109141737.47976-1-benpeart@xxxxxxxxxxxxx/