Bryan Henderson wrote:
> >- although that will cause unnecessary I/O barriers, one per
> >fsync_range().
>
> What do I/O barriers have to do with it?  An I/O barrier says, "don't
> harden later writes before these have hardened," whereas fsync_range()
> says, "harden these writes now."  Does Linux these days send an I/O
> barrier to the block subsystem and/or device as part of fsync()?

For better or worse, I/O barriers and I/O flushes are the same thing
in the Linux block layer.  I've argued for treating them distinctly,
because there are different I/O scheduling opportunities around each
of them, but there wasn't much interest.

> Or are we talking about the command to the device to harden all earlier
> writes (now) against a device power loss?  Does fsync() do that?

Ultimately that's what we're talking about, yes.  Imho fsync() should
do that, because a userspace database/filesystem should have access to
the same integrity guarantees as an in-kernel filesystem.  Linux
fsync() doesn't always send the command - it was a bit unpredictable
last time I looked.

There are other opinions.  MacOSX fsync() doesn't - because it has an
fcntl() which is a stronger version of fsync() documented for that
case.  They preferred reduced integrity of fsync() to keep benchmarks
on par with other OSes which don't send the command.

Interestingly, Windows _does_ have the option to send the command to
the device, controlled by userspace.  If you set the Windows
equivalents of O_DSYNC and O_DIRECT at the same time, then calls to
the Windows equivalent of fdatasync() cause an I/O barrier command to
be sent to the disk if necessary.  The Windows documentation even
explains the difference between OS caching and device caching and when
each one occurs, too.

Wow - it looks like Windows (later versions) has had the edge in doing
the right thing here for quite some time...
http://www.microsoft.com/sql/alwayson/storage-requirements.mspx
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx

> Either way, I can see that multiple fsync_ranges's in a row would be a
> little worse than just one, but it's pretty bad problem anyway, so I don't
> know if you could tell the difference.

A little?  It's the difference between letting the disk schedule 100
scattered writes itself, and forcing the disk to write them in the
order you sent them from userspace, aside from doubling the rate of
device commands...

-- Jamie
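
P.S. To make the "100 scattered writes" point concrete, here is a rough
userspace sketch of the two patterns.  It is only an illustration under
assumptions: fsync_range() is not available here, so plain os.fsync()
stands in for it, and the names write_scattered() and demo() are made up
for this example.  It counts flush calls; it does not measure what the
device actually does, which depends on whether fsync() reaches the disk
at all.

```python
import os
import tempfile

BLOCK = 4096  # bytes written at each offset (arbitrary choice for the sketch)


def write_scattered(fd, offsets, sync_each):
    """Write one block at each offset; return the number of fsync() calls."""
    syncs = 0
    for off in offsets:
        os.pwrite(fd, b"x" * BLOCK, off)
        if sync_each:
            # Stand-in for one fsync_range() per write: each write is forced
            # out before the next one is submitted, so nothing further down
            # the stack gets a chance to reorder them.
            os.fsync(fd)
            syncs += 1
    if not sync_each:
        # One flush for the whole batch: everything below userspace is free
        # to schedule the pending writes in whatever order it likes.
        os.fsync(fd)
        syncs += 1
    return syncs


def demo(n=100):
    """Run both patterns on temporary files; map sync_each -> flush count."""
    # 37 is coprime with 10 and 100, so this visits every block once,
    # in a scattered order.
    offsets = [((i * 37) % n) * BLOCK for i in range(n)]
    results = {}
    for sync_each in (True, False):
        fd, path = tempfile.mkstemp()
        try:
            results[sync_each] = write_scattered(fd, offsets, sync_each)
        finally:
            os.close(fd)
            os.unlink(path)
    return results
```

Calling demo() returns the flush count for each pattern: one per write
in the first case, a single flush in the second.  With a real
fsync_range() the per-call cost might be lower than a whole-file
fsync(), but the ordering constraint on the device is the same.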