On Fri, Oct 28, 2016 at 01:12:34PM +1100, Nicholas Piggin wrote: > On Fri, 28 Oct 2016 08:42:44 +1100 > Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > As a rule, it helps the crc implementation if it can operate on as large a > > > chunk as possible (alignment, startup overhead, etc). So I did a quick hack > > > at getting XFS checksumming to feed crc32c() with larger chunks, by setting > > > the existing crc to 0 before running over the entire buffer. Together with > > > some small work on the powerpc crc implementation, crc drops below 0.1%. > > > > I wouldn't have expected reducing call numbers and small alignment > > changes to make that amount of difference given the amount of data > > we are actually checksumming. How much of that difference was due to > > the improved CRC implementation? > > Sorry, I should have been more clear about what was happening. Not enough > sleep. The larger sizes allows the vectorized crc implementation to kick in. Ah, ok. So it never gets out of the slow, branchy lead in/lead out code for the smaller chunks.Fair enough. For the verify side it probably doesn't matter than much - the latency of the initial memory fetches on the data to be verified are likely to be the dominant factor for performance... > > FWIW, can you provide some additional context by grabbing the log > > stats that tell us the load on the log that is generating this > > profile? A sample over a minute of a typical workload (with a > > corresponding CPU profile) would probably be sufficient. You can get > > them simply by zeroing the xfs stats via > > /proc/sys/fs/xfs/stats_clear at the start of the sample period and > > then dumping /proc/fs/xfs/stat at the end. > > Yeah I'll get some better information for you. > > > > I don't know if something like this would be acceptable? It's not pretty, > > > but I didn't see an easier way. > > > > ISTR we made the choice not to do that to avoid potential problems > > with potential race conditions and bugs (i.e. don't modify anything > > in objects on read access) but I can't point you at anything > > specific... > > Sounds pretty reasonable, especially for the verifiers. For the paths > that create/update the checksums (including this log checksum), it seems > like it should be less controversial. Yup. For the paths that update the checksum we have to have an exclusive lock on the object (and will always require that), so I can't see a problem with changing the update code to use this method. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html