Dear Andre, dear Dave,
Thank you for your replies.
On 11.07.24 at 13:23, Andre Noll wrote:
> On Thu, Jul 11, 09:12, Dave Chinner wrote:
> > > Of course it’s not reproducible, but any insight how to debug this next time
> > > is much welcomed.
> > Probably not a lot you can do short of reconfiguring your RAID6
> > storage devices to handle small IOs better. However, in general,
> > RAID6 /always sucks/ for small IOs, and the only way to fix this
> > problem is to use high performance SSDs to give you a massive excess
> > of write bandwidth to burn on write amplification....
> FWIW, our approach to mitigate the write amplification suckage of large
> HDD-backed raid6 arrays for small I/Os is to set up a bcache device
> by combining such arrays with two small SSDs (configured as raid1).
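
For anyone wanting to try the same, a minimal sketch of such a setup could
look like the following; the device names are placeholders, not our actual
configuration:

# /dev/md0 = HDD-backed RAID6 array (backing device),
# /dev/md1 = RAID1 of two small SSDs (cache device).
make-bcache -B /dev/md0 -C /dev/md1   # formatting both at once also attaches them
# udev normally registers the devices; otherwise register them manually:
echo /dev/md0 > /sys/fs/bcache/register
echo /dev/md1 > /sys/fs/bcache/register
mkfs.xfs /dev/bcache0                 # the combined device shows up as /dev/bcache0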
Now that file servers with software RAID are proliferating in our institute,
because old systems with battery-backed hardware RAID controllers are being
taken offline, we have noticed performance problems. (We have not found a
silver bullet yet.) My colleague Donald tested bcache in March, but because of
the slightly more complex setup, another colleague is currently experimenting
with a write journal for the software RAID.
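
A sketch of such a write journal setup, with placeholder device names and not
our exact commands, would be:

# Create a RAID6 array whose stripe writes first go to a journal on fast
# flash storage. This closes the RAID write hole; switching
# /sys/block/md0/md/journal_mode from write-through to write-back additionally
# lets the journal device absorb small writes.
mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sd[a-l]1 \
      --write-journal /dev/nvme0n1p1
mkfs.xfs /dev/md0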
Kind regards,
Paul
PS: *bcache* performance test:
time bash -c '(cd /jbod/MG002/scratch/x && for i in $(seq -w 1000);
do echo a > data.$i; done)'
| setting                                              | time/s (run 1) | time/s (run 2) | time/s (run 3) |
|-------------------------------------------------------|----------------|----------------|----------------|
| xfs/raid6                                             | 40.826         | 41.638         | 44.685         |
| bcache/xfs/raid6 mode none                            | 32.642         | 29.274         | 27.491         |
| bcache/xfs/raid6 mode writethrough                    | 27.028         | 31.754         | 28.884         |
| bcache/xfs/raid6 mode writearound                     | 24.526         | 30.808         | 28.940         |
| bcache/xfs/raid6 mode writeback                       |  5.795         |  6.456         |  7.230         |
| bcachefs 10+2                                         | 10.321         | 11.832         | 12.671         |
| bcachefs 10+2+nvme (writeback)                        |  9.026         |  8.676         |  8.619         |
| xfs/raid6 (12*100GB)                                  | 32.446         | 25.583         | 24.007         |
| xfs/raid5 (12*100GB)                                  | 27.934         | 23.705         | 22.558         |
| xfs/bcache(10*raid6,2*raid1 cache) writethrough       | 56.240         | 47.997         | 45.321         |
| xfs/bcache(10*raid6,2*raid1 cache) writeback          | 82.230         | 85.779         | 85.814         |
| xfs/bcache(10*raid6,2*raid1 cache(ssd)) writethrough  | 26.459         | 23.631         | 23.586         |
| xfs/bcache(10*raid6,2*raid1 cache(ssd)) writeback     |  7.729         |  7.073         |  6.958         |
| as above with sequential_cutoff=0                     |  6.397         |  6.826         |  6.759         |
`sequential_cutoff=0` significantly speeds up extracting
`node-v20.11.0.tar.gz` with `tar xf`, from 13m45.108s to 5m31.379s! Maybe the
sequential cutoff heuristic does not work well over NFS.
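
In case it helps others: both knobs are ordinary bcache sysfs attributes and,
assuming the device shows up as bcache0, can be switched at runtime like this:

# Select the cache mode (none, writethrough, writearound or writeback).
echo writeback > /sys/block/bcache0/bcache/cache_mode
# Disable the sequential-bypass heuristic (the default cutoff is 4 MiB) so
# that all writes go through the SSD cache.
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff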
1. Kernel build over NFS with the usual setup: 27m38s
2. Kernel build over NFS with xfs+bcache using two (RAID1) SSDs: 10m27s