On 5/5/2016 12:26 PM, Radim Krčmář wrote:
> 2016-05-04 21:27+0200, Radim Krčmář:
>> And I completely forgot that we can preallocate the whole tree and use
>> an effective packed storage thanks to that. My first guess is that it
>> would make sense with double the memory of our bitmap. Scans and
>> insertion would be slower than for a per-vcpu list, but much faster than
>> with a dynamically allocated structure. I'll think a bit about that.
>
> The preallocated radix tree has
> * fast insertion (a few atomic bit writes at places that are computed
>   only with arithmetic = no pointer traversal)
> * feasible lockless removal (not sure how fast it will be, but it only
>   slows down the read side, and if we can guarantee that all vcpus are
>   stopped, then it's trivial)
> * compatibility with the bitmap (a subset of the tree is a 1:1 legacy
>   bitmap)
> * very fast search if pages are allocated in the same range (on par
>   with a list)
> * only ~2 times slower search than a bitmap when the bitmap is full
> * high memory consumption (roughly double the bitmap)
>
> The only reason why a dynamically allocated tree would be used instead
> is unacceptable memory consumption, but we already lived with a bitmap,
> so I think it is worth a try.
>
> I wrote a dirty, buggy, and unoptimized userspace implementation. (Code
> attached if you're interested.) Especially the bitmap is crude, so real
> numbers will look different, but the asymptotic behaviour should remain.
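The properties quoted above (insertion via a few bit writes at arithmetically computed positions, a leaf level that is 1:1 with the legacy bitmap) can be illustrated with a minimal two-level hierarchical bitmap. This is my own sketch, not the attached code: all names and sizes here are hypothetical, and the in-kernel version would use atomic set_bit() instead of plain OR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES     (1u << 16)         /* pages tracked by the slot (illustrative size) */
#define LEAF_WORDS (NPAGES / 64)      /* leaf level == the legacy bitmap */
#define SUMM_WORDS (LEAF_WORDS / 64)  /* one summary bit per leaf word */

static uint64_t leaf[LEAF_WORDS];     /* 1:1 with the legacy dirty bitmap */
static uint64_t summary[SUMM_WORDS];  /* set => the matching leaf word is nonzero */

/* Insertion: two bit writes whose positions are pure arithmetic on the
 * page frame number; no pointers are traversed. */
static void mark_dirty(uint32_t pfn)
{
    leaf[pfn / 64]      |= 1ull << (pfn % 64);
    summary[pfn / 4096] |= 1ull << ((pfn / 64) % 64);
}

/* Compact the tree into a list of dirty pfns; clear summary bits let the
 * scan skip 64 pages at a time. Returns the number of dirty pages. */
static size_t compact(uint32_t *out)
{
    size_t n = 0;

    for (uint32_t s = 0; s < SUMM_WORDS; s++) {
        uint64_t sw = summary[s];
        while (sw) {
            uint32_t w = s * 64 + (uint32_t)__builtin_ctzll(sw);
            uint64_t lw = leaf[w];
            while (lw) {
                out[n++] = w * 64 + (uint32_t)__builtin_ctzll(lw);
                lw &= lw - 1;  /* clear the lowest set bit */
            }
            sw &= sw - 1;
        }
    }
    return n;
}
```

With 64-ary fanout each extra level costs only ~1/64 of the level below, so the "roughly double the bitmap" figure in the mail presumably comes from a different node layout; the skip-empty-subtree scan logic is the same either way.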
>
> If we consider a slot with 10000000 pages, which is 38 GB of tracked
> memory, and datasets from 100 to 10000000 random dirty frames, then the
> timings for the code that compacts a [l]ist, [t]ree, and [b]itmap into
> a dirty list are:
>
> % ./radix_tree 10000000 100
> kB size l:0 t:1513 b:1220
> l: 642 ns (1.00, 78.05, 45977.38)
> t: 50108 ns (0.01, 1.00, 589.08)
> b: 29517475 ns (0.00, 0.00, 1.00)
>
> % ./radix_tree 10000000 1000
> l: 9542 ns (1.00, 43.29, 3347.11)
> t: 413110 ns (0.02, 1.00, 77.31)
> b: 31938143 ns (0.00, 0.01, 1.00)
>
> % ./radix_tree 10000000 10000
> l: 105859 ns (1.00, 21.07, 273.75)
> t: 2230429 ns (0.05, 1.00, 12.99)
> b: 28978510 ns (0.00, 0.08, 1.00)
>
> % ./radix_tree 10000000 100000
> l: 984729 ns (1.00, 12.16, 25.53)
> t: 11970068 ns (0.08, 1.00, 2.10)
> b: 25144204 ns (0.04, 0.48, 1.00)
>
> % ./radix_tree 10000000 1000000
> kB size l:7812 t:1513 b:1220
> l: 6420041 ns (1.00, 5.69, 4.13)
> t: 36511925 ns (0.18, 1.00, 0.73)
> b: 26523022 ns (0.24, 1.38, 1.00)
>
> % ./radix_tree 10000000 10000000
> kB size l:78125 t:1513 b:1220
> l: 38180627 ns (1.00, 1.56, 1.31)
> t: 59681915 ns (0.64, 1.00, 0.84)
> b: 50079550 ns (0.76, 1.19, 1.00)
>
> The tree is 589.08 times faster than the bitmap with 100 "dirty pages"
> and the list is 78.05 times faster than the tree. Speedups get smaller
> as the dataset size increases and the tree ends up performing worse
> than a bitmap.
>
> The tree performs much better when the dataset isn't random, i.e. when
> the dirty frames are a sequence from 0 to the size of the dataset:
>
> % ./radix_tree 10000000 1000 seq
> l: 8928 ns (1.00, 1.93, 3366.93)
> t: 17198 ns (0.52, 1.00, 1747.88)
> b: 30059972 ns (0.00, 0.00, 1.00)
>
> % ./radix_tree 10000000 10000 seq
> l: 96469 ns (1.00, 1.36, 271.40)
> t: 131124 ns (0.74, 1.00, 199.67)
> b: 26181839 ns (0.00, 0.01, 1.00)
>
> % ./radix_tree 10000000 100000 seq
> l: 1256119 ns (1.00, 1.28, 24.71)
> t: 1609232 ns (0.78, 1.00, 19.29)
> b: 31036003 ns (0.04, 0.05, 1.00)

Interesting.
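The roughly constant b: times in the quoted runs (~25-32 ms regardless of how many frames are dirty) fall out of the plain bitmap having to visit every word, dirty or not. A minimal sketch of such a linear-scan compaction, again my own illustration rather than the attached code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk every 64-bit word of the bitmap and emit the index of each set
 * bit; the cost is proportional to the bitmap size, not to the number
 * of dirty pages, which is why sparse datasets pay the same price as
 * dense ones. */
static size_t compact_bitmap(const uint64_t *bm, size_t words, uint32_t *out)
{
    size_t n = 0;

    for (size_t w = 0; w < words; w++) {
        uint64_t v = bm[w];
        while (v) {
            out[n++] = (uint32_t)(w * 64) + (uint32_t)__builtin_ctzll(v);
            v &= v - 1;  /* clear the lowest set bit */
        }
    }
    return n;
}
```

A tree scan instead skips whole empty subtrees, so its cost tracks the number of dirty frames; once most frames are dirty the skip test is pure overhead, matching the crossover in the numbers above.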
I'll do some prototyping with a radix tree to see how it performs in our
environment.