On Mon, Oct 23, 2023 at 09:42:59AM +1100, Dave Chinner wrote:
> On Fri, Oct 20, 2023 at 08:34:48AM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 19, 2023 at 11:06:42PM -0700, Christoph Hellwig wrote:
> > > On Thu, Oct 19, 2023 at 01:04:11PM -0700, Darrick J. Wong wrote:
> > > > Well... the stupid answer is that I augmented generic/176 to try
> > > > to race buffered and direct reads with cloning a million extents
> > > > and print out when the racing reads completed.  On an unpatched
> > > > kernel, the reads don't complete until the reflink does:
> > > >
> > > > So as you can see, reads from the reflink source file no longer
> > > > experience a giant latency spike.  I also wrote an fstest to
> > > > check this behavior; I'll attach it as a separate reply.
> > >
> > > Nice.  I guess write latency doesn't really matter for this use
> > > case?
> >
> > Nope -- they've gotten libvirt to tell qemu to redirect vm disk
> > writes to a new sidecar file.  Then they reflink the original source
> > file to the backup file, but they want qemu to be able to service
> > reads from that original source file while the reflink is ongoing.
> > When the backup is done, they commit the sidecar contents back into
> > the original image.
> >
> > It would be kinda neat if we had file range locks.  The thread doing
> > the reflink could shorten the locked range as it makes progress.  If
> > it could find out that another thread has blocked on part of the
> > file range, it could even hurry up and clone that part so that
> > neither reads nor writes would see enormous latency spikes.
> >
> > Even better, we could actually support concurrent reads and writes
> > to the page cache as long as the ranges don't overlap.  But that's
> > all speculative until Dave dumps his old range lock patchset on the
> > list.
>
> The unfortunate reality is that range locks as I was trying to
> implement them didn't scale - it was a failed experiment.
>
> The issue is the internal tracking structure of a range lock.  It has
> to be concurrency safe itself, and even with lockless tree structures
> using per-node seqlocks for internal sequencing, they still rely on
> atomic ops for safe concurrent access and updates.
>
> Hence the best I could get out of an uncontended range lock (i.e.
> locking different exclusive ranges concurrently) was about 400,000
> lock/unlock operations per second before the internal tracking
> structure broke down under concurrent modification pressure.  That
> was a whole lot better than previous attempts that topped out at
> ~150,000 lock/unlock ops/s, but it's still far short of the ~3
> million concurrent shared lock/unlock ops/s that a rwsem could do on
> that same machine.
>
> Worse, once past peak performance, internal contention within the
> range lock caused performance to fall off a cliff and end up much
> worse than just using pure exclusive locking with a mutex.
>
> Hence without some novel new internal tracking structure and
> algorithm - one that is lockless and needs no memory allocation -
> range locks will suck for the one thing we want them for: high
> performance, highly concurrent access to discrete ranges of a
> single file.

Ah.  Thanks for the reminder about that.

--D

> -Dave.
>
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
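
For readers who want to poke at this themselves: the clone step that
the backup workflow races reads against boils down to a single
FICLONERANGE ioctl on the backup file.  A minimal sketch of just that
step - the image paths and the threadbare error handling here are
illustrative, not anything libvirt or qemu actually does:

    /*
     * Reflink the whole source image into the backup file.
     * A zero src_length means "clone from src_offset to EOF".
     */
    #include <fcntl.h>
    #include <linux/fs.h>       /* FICLONERANGE, struct file_clone_range */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int src = open("/images/vm.img", O_RDONLY);
        int dst = open("/backups/vm.img", O_WRONLY | O_CREAT, 0644);
        struct file_clone_range fcr = {
            .src_fd      = src,
            .src_offset  = 0,
            .src_length  = 0,       /* clone to end of source file */
            .dest_offset = 0,
        };

        if (src < 0 || dst < 0 || ioctl(dst, FICLONERANGE, &fcr) < 0) {
            perror("clone");
            return 1;
        }
        close(src);
        close(dst);
        return 0;
    }

The ioctl doesn't return until the whole range has been cloned, which
is why reads of a heavily fragmented source file stall for the entire
clone on an unpatched kernel.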
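
And to make the tracking-structure point concrete, here is a
deliberately naive userspace range lock (pthreads; all names are
hypothetical).  Every lock and unlock - even for disjoint ranges - must
take the one mutex protecting the list of held ranges.  Dave's
patchset replaced the mutex-plus-list with lockless trees using
per-node seqlocks, but the atomic ops on shared tracking state don't
go away, which is what caps throughput well below a plain rwsem.  Note
also the malloc() in the lock path, hence the wish for a design that
needs no memory allocation:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdlib.h>

    struct range_entry {
        unsigned long       start, end;     /* inclusive */
        struct range_entry  *next;
    };

    struct range_lock {
        pthread_mutex_t     lock;   /* protects @held - the bottleneck */
        pthread_cond_t      wait;
        struct range_entry  *held;  /* currently held ranges */
    };

    static bool ranges_overlap(const struct range_entry *e,
                               unsigned long start, unsigned long end)
    {
        return e->start <= end && start <= e->end;
    }

    void range_lock(struct range_lock *rl, unsigned long start,
                    unsigned long end)
    {
        struct range_entry *e;

        pthread_mutex_lock(&rl->lock);
    retry:
        for (e = rl->held; e; e = e->next) {
            if (ranges_overlap(e, start, end)) {
                /* sleep until some range is unlocked, then rescan */
                pthread_cond_wait(&rl->wait, &rl->lock);
                goto retry;
            }
        }
        e = malloc(sizeof(*e));
        e->start = start;
        e->end = end;
        e->next = rl->held;
        rl->held = e;
        pthread_mutex_unlock(&rl->lock);
    }

    void range_unlock(struct range_lock *rl, unsigned long start,
                      unsigned long end)
    {
        struct range_entry **p, *e;

        pthread_mutex_lock(&rl->lock);
        for (p = &rl->held; (e = *p) != NULL; p = &e->next) {
            if (e->start == start && e->end == end) {
                *p = e->next;
                free(e);
                break;
            }
        }
        /* wake all waiters; they rescan for overlaps themselves */
        pthread_cond_broadcast(&rl->wait);
        pthread_mutex_unlock(&rl->lock);
    }

Two threads locking disjoint ranges still both bounce the cacheline
holding rl->lock, which is exactly the "uncontended" case measured
above at ~400,000 ops/s.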