2016-05-04 17:15+0000, Cao, Lei:
> On 5/4/2016 9:13 AM, Radim Krčmář wrote:
>> Good designs so far seem to be:
>>   memslot -> lockless radix tree
>> and
>>   vcpu -> memslot -> list (memslot -> vcpu -> list)
>>
>
> There is no need for lookup, the dirty log is fetched in sequence, so why use
> radix tree with added complexity but no benefit?
>
> List can be designed to be lockless, so memslot -> lockless fixed list?

It can, but a lockless list with concurrent writers is harder than a lockless list with a single writer and a concurrent reader. The difference is starvation: a VCPU might never get to write its entry unless you implement a queueing mechanism. A queueing mechanism basically amounts to a spinlock, so I wouldn't bother with a lockless list and would just try a spinlock directly.

A spinlock with a very short critical section might actually work well for fewer than 256 VCPUs and is definitely the easiest option. Worth experimenting with, IMO.

A lockless radix tree doesn't starve. Every entry has a well-defined place in the tree; the entry just might not be fully allocated yet. If another VCPU is faster and expands the tree, the other VCPUs use that extended tree until they all reach their leaf nodes, so VCPUs basically cooperate on growing the tree.

And I completely forgot that we can preallocate the whole tree and, thanks to that, use an efficient packed storage. My first guess is that it would make sense at double the memory of our bitmap. Scans and insertions would be slower than with a per-VCPU list, but much faster than with a dynamically allocated structure. I'll think a bit about that.

The main reason why I'd like something that can contain all dirty pages is overflow: userspace has to treat *all* pages as dirty if we lose a dirty page, so overflow must never happen. We have to either grow the dirty log or suspend the writer until userspace frees space ...
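To make the spinlock option concrete, here is a minimal userspace C sketch, assuming a single fixed-capacity log shared by all VCPUs of one memslot. The names (`struct dirty_log`, `dirty_log_push`, `dirty_log_drain`, `DIRTY_LOG_CAP`) are hypothetical and are not KVM APIs; a C11 `atomic_flag` stands in for the kernel's `spinlock_t`. The push path keeps the critical section to a bounds check and a store, and on overflow it refuses the entry instead of dropping it, so the caller can suspend the writing VCPU until userspace drains the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity of one per-memslot dirty log, in entries. */
#define DIRTY_LOG_CAP 4096

/* Hypothetical per-memslot dirty log shared by all VCPUs. */
struct dirty_log {
	atomic_flag lock;             /* stand-in for a kernel spinlock_t */
	size_t count;                 /* number of valid entries in gfns[] */
	uint64_t gfns[DIRTY_LOG_CAP]; /* dirty guest frame numbers, in order */
};

static void dirty_log_init(struct dirty_log *dl)
{
	atomic_flag_clear(&dl->lock);
	dl->count = 0;
}

static void dirty_log_lock(struct dirty_log *dl)
{
	/* Spin until we own the flag; critical sections are a few instructions. */
	while (atomic_flag_test_and_set_explicit(&dl->lock, memory_order_acquire))
		;
}

static void dirty_log_unlock(struct dirty_log *dl)
{
	atomic_flag_clear_explicit(&dl->lock, memory_order_release);
}

/*
 * Writer side (one call per dirtied page). Never drops an entry: when the
 * log is full it returns -ENOSPC and the caller must suspend the VCPU until
 * userspace drains the log, because a lost entry would force userspace to
 * treat all pages as dirty.
 */
static int dirty_log_push(struct dirty_log *dl, uint64_t gfn)
{
	int ret = 0;

	dirty_log_lock(dl);
	if (dl->count == DIRTY_LOG_CAP)
		ret = -ENOSPC;
	else
		dl->gfns[dl->count++] = gfn;
	dirty_log_unlock(dl);
	return ret;
}

/*
 * Reader side (simplified): copy out up to max entries in insertion order,
 * keep the rest, and return how many were copied.
 */
static size_t dirty_log_drain(struct dirty_log *dl, uint64_t *out, size_t max)
{
	size_t n;

	assert(out != NULL);
	dirty_log_lock(dl);
	n = dl->count < max ? dl->count : max;
	memcpy(out, dl->gfns, n * sizeof(dl->gfns[0]));
	memmove(dl->gfns, dl->gfns + n, (dl->count - n) * sizeof(dl->gfns[0]));
	dl->count -= n;
	dirty_log_unlock(dl);
	return n;
}
```

The drain path is deliberately simplified: it copies under the lock and shifts any leftovers, which a real implementation would likely avoid with a ring buffer so that VCPUs are not held up by userspace.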
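The preallocated-tree idea can be illustrated with one possible layout, sketched here under stated assumptions: a fixed two-level, radix-64 bitmap for a hypothetical memslot of `NPAGES` pages, where every page has a fixed bit and a summary word records which leaf words may hold dirty bits. All names (`struct dirty_tree`, `dirty_tree_mark`, `dirty_tree_harvest`, `NPAGES`) are hypothetical, and this is not necessarily the packed storage meant above; this particular layout costs only about 1/64 more than a flat bitmap. Writers only perform atomic ORs on locations derived from the page number, so no VCPU has to win a race for a shared insertion point, and the harvester clears a summary word before the leaves it covers, so a concurrent mark is at worst reported in the next pass rather than lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define BITS     64u                         /* bits per word (radix 64) */
#define NPAGES   (1u << 16)                  /* hypothetical memslot size, in pages */
#define NLEAF    (NPAGES / BITS)             /* leaf words: one bit per page */
#define NSUMMARY ((NLEAF + BITS - 1) / BITS) /* summary words: one bit per leaf */

/* Hypothetical preallocated two-level dirty bitmap for one memslot. */
struct dirty_tree {
	_Atomic uint64_t summary[NSUMMARY]; /* bit i set => leaf[i] may be non-zero */
	_Atomic uint64_t leaf[NLEAF];       /* bit j of leaf[i] => page i*64+j is dirty */
};

static void dirty_tree_init(struct dirty_tree *t)
{
	for (size_t i = 0; i < NSUMMARY; i++)
		atomic_init(&t->summary[i], 0);
	for (size_t i = 0; i < NLEAF; i++)
		atomic_init(&t->leaf[i], 0);
}

/* Writer side: set the page's fixed bit, then flag its leaf in the summary. */
static void dirty_tree_mark(struct dirty_tree *t, uint32_t page)
{
	uint32_t li = page / BITS;

	assert(page < NPAGES);
	atomic_fetch_or(&t->leaf[li], UINT64_C(1) << (page % BITS));
	atomic_fetch_or(&t->summary[li / BITS], UINT64_C(1) << (li % BITS));
}

/*
 * Reader side: report and clear every dirty page in ascending order.
 * out must have room for NPAGES entries. Returns the number reported.
 */
static size_t dirty_tree_harvest(struct dirty_tree *t, uint32_t *out)
{
	size_t n = 0;

	for (size_t s = 0; s < NSUMMARY; s++) {
		/* Clear the summary word first; a racing mark sets it again. */
		uint64_t sum = atomic_exchange(&t->summary[s], 0);

		for (unsigned sb = 0; sb < BITS; sb++) {
			if (!(sum & (UINT64_C(1) << sb)))
				continue;
			size_t li = s * BITS + sb;
			uint64_t bits = atomic_exchange(&t->leaf[li], 0);

			for (unsigned b = 0; b < BITS; b++)
				if (bits & (UINT64_C(1) << b))
					out[n++] = (uint32_t)(li * BITS + b);
		}
	}
	return n;
}
```

A scan touches only the summary words plus the leaf words they flag, so mostly clean memslots are cheap to harvest; the trade-off against a per-VCPU list is that the order in which pages were dirtied is lost.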