On 5/5/2016 12:26 PM, Radim Krčmář wrote:
> 2016-05-04 21:27+0200, Radim Krčmář:
>> And I completely forgot that we can preallocate the whole tree and use
>> an effective packed storage thanks to that. My first guess is that it
>> would make sense with double the memory of our bitmap. Scans and
>> insertion would be slower than for a per-vcpu list, but much faster than
>> with a dynamically allocated structure. I'll think a bit about that.
>
> The preallocated radix tree has
> * fast insertion (a few atomic bit writes at places that are computed
>   only with arithmetic = no pointer traversal)
> * feasible lockless removal (not sure how fast it will be, but it only
>   slows down the read side, and if we can guarantee that all vcpus are
>   stopped, then it's trivial)
> * compatibility with the bitmap (a subset of the tree is a 1:1 legacy
>   bitmap)
> * very fast search if pages are allocated in the same range (on par
>   with a list)
> * only ~2 times slower search than a bitmap when the bitmap is full
> * high memory consumption (roughly double the bitmap)
>
> The only reason why a dynamically allocated tree would be used instead
> is unacceptable memory consumption, but we already lived with a bitmap,
> so I think it is worth a try.
>
> I wrote a dirty, buggy, and unoptimized userspace implementation. (Code
> attached if you're interested.) Especially the bitmap is crude, so real
> numbers will look different, but the asymptotic behaviour should remain.
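The properties quoted above (insertion via a few bit writes at arithmetically computed positions, a leaf level that is 1:1 with the legacy bitmap) can be illustrated with a minimal two-level hierarchical bitmap. This is my own sketch, not the attached code: all names and sizes here are hypothetical, and the in-kernel version would use atomic set_bit() instead of plain OR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES     (1u << 16)         /* pages tracked by the slot (illustrative size) */
#define LEAF_WORDS (NPAGES / 64)      /* leaf level == the legacy bitmap */
#define SUMM_WORDS (LEAF_WORDS / 64)  /* one summary bit per leaf word */

static uint64_t leaf[LEAF_WORDS];     /* 1:1 with the legacy dirty bitmap */
static uint64_t summary[SUMM_WORDS];  /* set => the matching leaf word is nonzero */

/* Insertion: two bit writes whose positions are pure arithmetic on the
 * page frame number; no pointers are traversed. */
static void mark_dirty(uint32_t pfn)
{
    leaf[pfn / 64]      |= 1ull << (pfn % 64);
    summary[pfn / 4096] |= 1ull << ((pfn / 64) % 64);
}

/* Compact the tree into a list of dirty pfns; clear summary bits let the
 * scan skip 64 pages at a time. Returns the number of dirty pages. */
static size_t compact(uint32_t *out)
{
    size_t n = 0;

    for (uint32_t s = 0; s < SUMM_WORDS; s++) {
        uint64_t sw = summary[s];
        while (sw) {
            uint32_t w = s * 64 + (uint32_t)__builtin_ctzll(sw);
            uint64_t lw = leaf[w];
            while (lw) {
                out[n++] = w * 64 + (uint32_t)__builtin_ctzll(lw);
                lw &= lw - 1;  /* clear the lowest set bit */
            }
            sw &= sw - 1;
        }
    }
    return n;
}
```

With 64-ary fanout each extra level costs only ~1/64 of the level below, so the "roughly double the bitmap" figure in the mail presumably comes from a different node layout; the skip-empty-subtree scan logic is the same either way.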
>
> If we consider a slot with 10000000 pages, which is 38 GB of tracked
> memory, and datasets from 100 to 10000000 random dirty frames, then the
> timings for the code that compacts a [l]ist, [t]ree, and [b]itmap into
> a dirty list are:
>
> % ./radix_tree 10000000 100
> kB size l:0 t:1513 b:1220
> l: 642 ns (1.00, 78.05, 45977.38)
> t: 50108 ns (0.01, 1.00, 589.08)
> b: 29517475 ns (0.00, 0.00, 1.00)
>
> % ./radix_tree 10000000 1000
> l: 9542 ns (1.00, 43.29, 3347.11)
> t: 413110 ns (0.02, 1.00, 77.31)
> b: 31938143 ns (0.00, 0.01, 1.00)
>
> % ./radix_tree 10000000 10000
> l: 105859 ns (1.00, 21.07, 273.75)
> t: 2230429 ns (0.05, 1.00, 12.99)
> b: 28978510 ns (0.00, 0.08, 1.00)
>
> % ./radix_tree 10000000 100000
> l: 984729 ns (1.00, 12.16, 25.53)
> t: 11970068 ns (0.08, 1.00, 2.10)
> b: 25144204 ns (0.04, 0.48, 1.00)
>
> % ./radix_tree 10000000 1000000
> kB size l:7812 t:1513 b:1220
> l: 6420041 ns (1.00, 5.69, 4.13)
> t: 36511925 ns (0.18, 1.00, 0.73)
> b: 26523022 ns (0.24, 1.38, 1.00)
>
> % ./radix_tree 10000000 10000000
> kB size l:78125 t:1513 b:1220
> l: 38180627 ns (1.00, 1.56, 1.31)
> t: 59681915 ns (0.64, 1.00, 0.84)
> b: 50079550 ns (0.76, 1.19, 1.00)
>
> The tree is 589.08 times faster than the bitmap with 100 "dirty pages"
> and the list is 78.05 times faster than the tree. Speedups get smaller
> as the dataset size increases and the tree ends up performing worse
> than a bitmap.
>
> The tree performs much better when the dataset isn't random, i.e. when
> the dirty frames are a sequence from 0 to the size of the dataset:
>
> % ./radix_tree 10000000 1000 seq
> l: 8928 ns (1.00, 1.93, 3366.93)
> t: 17198 ns (0.52, 1.00, 1747.88)
> b: 30059972 ns (0.00, 0.00, 1.00)
>
> % ./radix_tree 10000000 10000 seq
> l: 96469 ns (1.00, 1.36, 271.40)
> t: 131124 ns (0.74, 1.00, 199.67)
> b: 26181839 ns (0.00, 0.01, 1.00)
>
> % ./radix_tree 10000000 100000 seq
> l: 1256119 ns (1.00, 1.28, 24.71)
> t: 1609232 ns (0.78, 1.00, 19.29)
> b: 31036003 ns (0.04, 0.05, 1.00)

Interesting.
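The roughly constant b: times in the quoted runs (~25-32 ms regardless of how many frames are dirty) fall out of the plain bitmap having to visit every word, dirty or not. A minimal sketch of such a linear-scan compaction, again my own illustration rather than the attached code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk every 64-bit word of the bitmap and emit the index of each set
 * bit; the cost is proportional to the bitmap size, not to the number
 * of dirty pages, which is why sparse datasets pay the same price as
 * dense ones. */
static size_t compact_bitmap(const uint64_t *bm, size_t words, uint32_t *out)
{
    size_t n = 0;

    for (size_t w = 0; w < words; w++) {
        uint64_t v = bm[w];
        while (v) {
            out[n++] = (uint32_t)(w * 64) + (uint32_t)__builtin_ctzll(v);
            v &= v - 1;  /* clear the lowest set bit */
        }
    }
    return n;
}
```

A tree scan instead skips whole empty subtrees, so its cost tracks the number of dirty frames; once most frames are dirty the skip test is pure overhead, matching the crossover in the numbers above.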
I'll do some prototyping with a radix tree to see how it performs in our
environment.