On Thu, Nov 03, 2016 at 09:04:42AM -0700, L.A. Walsh wrote:
>
>
> Dave Chinner wrote:
> >
> >As most users never have things go wrong, all they think is "CRCs
> >are unnecessary overhead". It's just like backups - how many people
> >don't make backups because they cost money right now and there's no
> >tangible benefit until something goes wrong which almost never
> >happens?
> ----
>	But it's not like backups. You can't run a util
> program upon discovering bad CRC's that will fix the file system
> because the file system is no longer usable.

xfs_repair will fix it, just like it will fix the same corruption
on non-CRC filesystems.

> >Exactly my point. Humans are terrible at risk assessment and
> >mitigation because most people are unaware of the unconcious
> >cognitive biases that affect this sort of decision making.
> ---
>	My risk is near 0 since my file systems are monitored
> by a raid controller with read patrols made over the data on
> a period basis.

If I had a dollar for every time someone said "hardware raid
protects me" I'd have retired years ago.

Media scrubbing does not protect against misdirected writes,
corruptions to/from the storage, memory errors, software bugs, bad
compilers (yes, we've already had XFS CRCs find a compiler bug),
etc.

> I'll assert that the chance of data randomly
> going corrupt is much higher because there is ALOT more data than
> metadata. On top of that, because I keep backups, my risk, is
> at worst, the same without crc's as with them.

The /scale of disaster/ for metadata corruption is far higher than
for file data - a single bit error can trash the entire filesystem.
You may not care about this, but plenty of other XFS users do.

> >i.e. the finobt provides more
> >deterministic inode allocation overhead, not "faster" allocation.
> >
> >Let me demonstrate with some numbers on empty filesystem create
> >rate:
> >
> >                        create rate   sys CPU time   write rate
> >                         (files/s)     (seconds)       (MB/s)
> >crc = 0, finobt = 0:      238943          2629          ~200
> >crc = 1, finobt = 0:      231582          2711           ~40
> >crc = 1, finobt = 1:      232563          2766           ~40
> >*hacked* crc disable:     231435          2789           ~40
> >
> >We can see that the system CPU time increased by 3.1% with the
> >"addition of CRCs". The CPU usage increases by a further 2% with
> >the addition of the free inode btree,
> ---
>	On an empty file system or older ones that are >50%
> used?
>
>	It's *nice* to be able to benchmarks, but not allowing
> crc to be disabled, disables that possibility -- and that's
> sorta the point.

If you want to reproduce the above numbers, the script is below.
You don't need the "CRC disable" hack to test whether CRCs have
overhead or not, CPU profiles are sufficient for that.

But, really, I don't care about whether you can reproduce these
tests, because microbenchmarks don't matter to production systems.
That is, you haven't provided any numbers to back up your
assertions that CRCs have excessive overhead for /your workload/.
For me to care about what you are saying, you need to demonstrate a
performance degradation between v4 and v5 filesystem formats for
/your workloads/.

I can't do this for you. I don't know what your workload is, nor
what hardware you use. *Give me numbers* that I can work with -
something we can measure and fix. You need to do the work to show
there's an issue - I can't do that for you, and no amount of
demanding that I do will change that.

> >IOWs, for most workloads CRCs have no impact on filesystem
> >performance.
> ---
>	Too bad no one can test such the effect on their
> own workloads, though if not doing crc's takes more CPU, then
> it sounds like an algorithm problem: crc calculations don't
> take "negative time", and a benchmark showing they do indicates
> something else is causing the slowdown.

I'm guessing that you aren't aware of how memory access patterns
affect performance on modern CPUs. i.e. sequential memory access to
a structure is much faster than random memory access because the
hardware prefetchers detect the sequential accesses and minimise
cache miss latency.

e.g. for a typical 4k btree block, doing a binary search on a cache
cold block requires 10-12 cache misses for a complete search.
However, if we run a CRC check on it, we take a couple of cache
misses before the hardware prefetcher kicks in, then it's just CPU
time to run the calc. Then the binary search doesn't have a cache
miss at all. Hence if the CRC calc is fast enough (and for h/w accel
it is fast enough) adding CRCs will make the code run faster....

This is actually a well known behaviour of modern CPUs - for years
we've been using memset() to initialise structures when it's
technically not necessary because it's the fastest way to prime the
CPU caches for upcoming accesses to that structure.

Cheers,

Dave.
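
As a rough illustration of the cache-line argument above, here is a
minimal sketch that counts how many distinct cache lines a binary
search probes inside one 4k block, next to the number of consecutive
lines a sequential pass (such as a CRC calculation) walks. The
64-byte cache line and the packed 16-byte record are assumptions
made for the sketch, not XFS on-disk values, and lines_touched is a
hypothetical helper. The count it prints depends entirely on the
assumed record size and layout, so it is not expected to match the
10-12 figure quoted above; it only shows that the probes land on
scattered lines that a prefetcher cannot predict, while the
sequential pass is a simple stride.

```shell
#!/bin/bash
# Count the distinct cache lines a binary search touches in one block.
# All sizes below are illustrative assumptions, not XFS on-disk values.
BLOCK_SIZE=4096		# bytes per btree block (from the text above)
LINE_SIZE=64		# assumed CPU cache line size in bytes
REC_SIZE=16		# hypothetical fixed record size in bytes

# Print the number of distinct cache lines probed while searching for
# record index $1 in a block of packed, sorted records.
lines_touched() {
	local target=$1
	local nrecs=$((BLOCK_SIZE / REC_SIZE))
	local lo=0 hi=$((nrecs - 1)) mid line
	local seen=" " count=0

	while [ $lo -le $hi ]; do
		mid=$(((lo + hi) / 2))
		line=$((mid * REC_SIZE / LINE_SIZE))
		case "$seen" in
		*" $line "*) ;;		# this line was already probed
		*) seen="$seen$line " ; count=$((count + 1)) ;;
		esac
		if [ $mid -eq $target ]; then
			break
		elif [ $mid -lt $target ]; then
			lo=$((mid + 1))
		else
			hi=$((mid - 1))
		fi
	done
	echo $count
}

# Worst case over every possible target record.
worst=0
for ((t = 0; t < BLOCK_SIZE / REC_SIZE; t++)); do
	n=$(lines_touched $t)
	if [ $n -gt $worst ]; then
		worst=$n
	fi
done

echo "binary search, worst case: $worst scattered cache lines"
echo "sequential pass: $((BLOCK_SIZE / LINE_SIZE)) consecutive cache lines"
```

Changing REC_SIZE shows how sensitive the count is to the assumed
layout. The sketch says nothing about timing; that still needs CPU
profiles on real hardware.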
-- 
Dave Chinner
david@xxxxxxxxxxxxx

#!/bin/bash

# Defaults, overridable from the command line.
QUOTA=
MKFSOPTS=
NFILES=100000
DEV=/dev/vdc
LOGBSIZE=256k

# Parse options; everything after "--" is passed to mkfs.xfs.
while [ $# -gt 0 ]; do
	case "$1" in
	-q)	QUOTA="uquota,gquota,pquota" ;;
	-N)	NFILES=$2 ; shift ;;
	-d)	DEV=$2 ; shift ;;
	-l)	LOGBSIZE=$2; shift ;;
	--)	shift ; break ;;
	esac
	shift
done
MKFSOPTS="$MKFSOPTS $*"

echo QUOTA=$QUOTA
echo MKFSOPTS=$MKFSOPTS
echo DEV=$DEV

# Recreate the scratch filesystem and mount it.
sudo umount /mnt/scratch > /dev/null 2>&1
sudo mkfs.xfs -f $MKFSOPTS $DEV
sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
sudo chmod 777 /mnt/scratch
cd /home/dave/src/fs_mark-3.3/

# Reset the XFS stats counters before the run.
sudo sh -c "echo 1 > /proc/sys/fs/xfs/stats_clear"

# Create zero-length files (-s 0) without syncing (-S0), using one
# -d directory per thread.
time ./fs_mark -D 10000 -S0 -n $NFILES -s 0 -L 32 \
	-d /mnt/scratch/0 -d /mnt/scratch/1 \
	-d /mnt/scratch/2 -d /mnt/scratch/3 \
	-d /mnt/scratch/4 -d /mnt/scratch/5 \
	-d /mnt/scratch/6 -d /mnt/scratch/7 \
	-d /mnt/scratch/8 -d /mnt/scratch/9 \
	-d /mnt/scratch/10 -d /mnt/scratch/11 \
	-d /mnt/scratch/12 -d /mnt/scratch/13 \
	-d /mnt/scratch/14 -d /mnt/scratch/15 \
	| tee >(stats --trim-outliers | tail -1 1>&2)
sync
sudo umount /mnt/scratch
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html