On 7/27/2012 3:14 AM, Jason Newton wrote:

> raw video to disk (3 high res 10bit video streams, 5.7MB per frame, at
> 20hz so effectively 60fps total). I use 2 512GB OCZ Vertex 4 SSDs which
> support ~450MB/s each. I've soft-raided them together (raid 0) with a 4k
> chunksize and I get about 900MB/s avg in a benchmark program I wrote to
> simulate my videostream logging needs.
...
> I only have 50 milliseconds per frame and latencies exceeding this would
> result in dropped frames (bad).
...
max: 375 transferred 900.33G
...
max: 438 transferred 192.12G
...
max: 541 transferred 96.61G
...
max: 50 transferred 19.42G
...
max: 906 transferred 124.23G

etc.

> xfs_info of my video raid:
> meta-data=/dev/md2         isize=256    agcount=32, agsize=7380047 blks
>          =                 sectsz=512   attr=2
> data     =                 bsize=4096   blocks=236161504, imaxpct=25
>          =                 sunit=1      swidth=2 blks
> naming   =version 2        bsize=4096   ascii-ci=0
> log      =internal         bsize=4096   blocks=115313, version=2
>          =                 sectsz=512   sunit=1 blks, lazy-count=1
> realtime =none             extsz=4096   blocks=0, rtextents=0
>
> I'm using 3.2.22 with the rt34 patchset.
>
> If it's desired I can post my benchmark code. I intend to rework it a
> little so it only does 60fps capped since this is my real workload.
>
> If anyone has any tips for reducing latencies of the write calls or cpu
> usage, I'd be interested for sure.

I don't think your write latency problem is software related. What do you
think the odds are that the wear leveling routine is kicking in and causing
your half-second max latencies? In one test you wrote over 90% of the user
cells of the devices, and most of your test writes were over 100GB, i.e.
more than 10% of the user cells. That's an extremely large wear load to put
on an SSD over a short period.

What happens when you format each SSD directly and write to the two XFS
filesystems without md/RAID0, two streams to one SSD and one to the other?
That will also free up serious cycles, helping to eliminate the CPU
saturation.
WRT CPU consumption: at these data rates md/RAID0 is going to eat massive
cycles, even though it is not bound by a single thread as RAID1/10/5/6 are.
A linear concat will eat about the same as RAID0; the other levels would
simply peak one core and scale no further. Both RAID0 and linear are fully
threaded and simply pass an offset down to the block layer, so using an
embedded CPU with more cores would help. One with a faster clock would as
well, obviously, but not as much as more cores.

Interesting topic Jason.

-- 
Stan

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs