Re: How to debug intermittent increasing md/inflight but no disk activity?

Dear Andre, dear Dave,


Thank you for your replies.


On 11.07.24 at 13:23, Andre Noll wrote:
> On Thu, Jul 11, 09:12, Dave Chinner wrote:

>>> Of course it’s not reproducible, but any insight how to debug this next time
>>> is much welcomed.

>> Probably not a lot you can do short of reconfiguring your RAID6
>> storage devices to handle small IOs better. However, in general,
>> RAID6 /always sucks/ for small IOs, and the only way to fix this
>> problem is to use high performance SSDs to give you a massive excess
>> of write bandwidth to burn on write amplification....

> FWIW, our approach to mitigate the write amplification suckage of large
> HDD-backed raid6 arrays for small I/Os is to set up a bcache device
> by combining such arrays with two small SSDs (configured as raid1).

Now that file servers with software RAID are proliferating in our institute, because old systems with battery-backed hardware RAID controllers are being taken offline, we have noticed performance problems. (We still have not found a silver bullet.) My colleague Donald tested bcache in March, but because of its slightly more complex setup, another colleague is currently experimenting with a write journal for the software RAID.
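
For reference, this is roughly what such a journaled array might look like using mdadm's --write-journal option; the device names and layout below are only placeholders, not our actual configuration:

# RAID6 over twelve HDD partitions, with an NVMe partition as write journal
mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sd[a-l]1 --write-journal /dev/nvme0n1p1
mkfs.xfs /dev/md0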


Kind regards,

Paul


PS: *bcache* performance test:

time bash -c '(cd /jbod/MG002/scratch/x && for i in $(seq -w 1000); do echo a > data.$i; done)'

| setting                                               | run 1/s | run 2/s | run 3/s |
|--------------------------------------------------------|---------|---------|---------|
| xfs/raid6                                              |  40.826 |  41.638 |  44.685 |
| bcache/xfs/raid6 mode none                             |  32.642 |  29.274 |  27.491 |
| bcache/xfs/raid6 mode writethrough                     |  27.028 |  31.754 |  28.884 |
| bcache/xfs/raid6 mode writearound                      |  24.526 |  30.808 |  28.940 |
| bcache/xfs/raid6 mode writeback                        |   5.795 |   6.456 |   7.230 |
| bcachefs 10+2                                          |  10.321 |  11.832 |  12.671 |
| bcachefs 10+2+nvme (writeback)                         |   9.026 |   8.676 |   8.619 |
| xfs/raid6 (12*100GB)                                   |  32.446 |  25.583 |  24.007 |
| xfs/raid5 (12*100GB)                                   |  27.934 |  23.705 |  22.558 |
| xfs/bcache(10*raid6,2*raid1 cache) writethrough        |  56.240 |  47.997 |  45.321 |
| xfs/bcache(10*raid6,2*raid1 cache) writeback           |  82.230 |  85.779 |  85.814 |
| xfs/bcache(10*raid6,2*raid1 cache(ssd)) writethrough   |  26.459 |  23.631 |  23.586 |
| xfs/bcache(10*raid6,2*raid1 cache(ssd)) writeback      |   7.729 |   7.073 |   6.958 |
| as above with sequential_cutoff=0                      |   6.397 |   6.826 |   6.759 |

`sequential_cutoff=0` significantly speeds up `tar xf node-v20.11.0.tar.gz`, from 13m45.108s to 5m31.379s! Maybe the sequential cutoff heuristic doesn't work well over NFS.
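
For the record, both the cache mode and the cutoff can be changed at runtime through sysfs (bcache0 below is just the example device name; a value of 0 disables the sequential-bypass heuristic entirely):

echo writeback > /sys/block/bcache0/bcache/cache_mode
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff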

1.  Build kernel over NFS with the usual setup: 27m38s
2.  Build kernel over NFS with xfs+bcache with two (raid1) SSDs: 10m27s
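
For completeness, a bcache device like the ones benchmarked above is typically assembled roughly as follows; the commands below are only a sketch with placeholder device names, not our exact setup:

make-bcache -B /dev/md0    # HDD raid6 array as backing device
make-bcache -C /dev/md1    # SSD raid1 array as cache device
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach    # UUID as reported by bcache-super-show /dev/md1
mkfs.xfs /dev/bcache0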



