I was thinking a little color commentary might be helpful on what the functionality is that's driving the need for fallocate. I think I mentioned somewhere in this thread that the application is OpenStack Swift, a highly scalable cloud object store. If you're not familiar with it, it doesn't do successive sequential writes to a preallocated file; rather, it writes out a full object in one shot. In other words, object = file. The whole purpose of the preallocation, at least as I understand it, is to make sure there is enough room when the time comes to write the actual object, so that if there isn't, a redundant server elsewhere can do it instead. That makes the notion of speculative preallocation for future sequential writes moot; the ideal is to preallocate only the object size, with minimal extra I/O. Does that help?
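To make that concrete, the pattern I have in mind looks roughly like this - just my own minimal Python sketch, not actual Swift code, using os.posix_fallocate as a stand-in for whatever Swift really calls:

    import os, tempfile

    def store_object(obj_dir, data):
        fd, tmp_path = tempfile.mkstemp(dir=obj_dir)
        try:
            try:
                # reserve exactly the object's size up front
                os.posix_fallocate(fd, 0, len(data))
            except OSError:
                # typically ENOSPC - give up here so a redundant server
                # elsewhere can take the object instead
                os.unlink(tmp_path)
                raise
            os.write(fd, data)   # the whole object, written in one shot
            os.fsync(fd)
        finally:
            os.close(fd)
        return tmp_path

No speculative growth is ever needed - the object will never be extended after this.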
-mark
On Sat, Jun 15, 2013 at 6:35 AM, Mark Seger <mjseger@xxxxxxxxx> wrote:
Basically I do everything with collectl, a tool I wrote and open-sourced almost 10 years ago. Its numbers are very accurate - I've compared them with iostat on numerous occasions whenever I had doubts and they always agree. Since both tools get their data from the same place, /proc/diskstats, it's hard for them not to agree, AND its numbers also agree with /proc/fs/xfs. Here's an example comparing the two on a short run, leaving off the -m since collectl reports its output in KB.

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.00 0.00 0.00 494.00 0.00 126464.00 512.00 0.11 0.22 0.00 0.22 0.22 11.00
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Time Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
10:18:32 sdc1 0 0 0 0 127488 0 498 256 256 1 0 0 7
10:18:33 sdc1 0 0 0 0 118784 0 464 256 256 1 0 0 4
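And since both tools are just reading /proc/diskstats, here's a trivial way to sanity-check them against the raw counters themselves - a quick Python sketch of my own (field positions per Documentation/iostats.txt, sectors are 512 bytes; 'sdc1' is just the device from my runs):

    import time

    def writes(dev):
        # /proc/diskstats: field 2 is the device name, field 7 is writes
        # completed, field 9 is sectors written (512-byte sectors)
        with open('/proc/diskstats') as f:
            for line in f:
                fields = line.split()
                if fields[2] == dev:
                    return int(fields[7]), int(fields[9])
        raise ValueError(dev + ' not found')

    w0, s0 = writes('sdc1')
    time.sleep(1)
    w1, s1 = writes('sdc1')
    print('write IOs/sec:', w1 - w0, ' write KB/sec:', (s1 - s0) // 2)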
For grins I also ran a set of numbers at a monitoring interval of 0.2 seconds just to see if they were steady, and they are:

# <---------reads---------><---------writes---------><--------averages--------> Pct
#Time Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
10:19:50.601 sdc1 0 0 0 0 768 0 3 256 256 0 0 0 0
10:19:50.801 sdc1 0 0 0 0 23296 0 91 256 256 1 0 0 19
10:19:51.001 sdc1 0 0 0 0 32256 0 126 256 256 1 0 0 14
10:19:51.201 sdc1 0 0 0 0 29696 0 116 256 256 1 0 0 19
10:19:51.401 sdc1 0 0 0 0 30464 0 119 256 256 1 0 0 4
10:19:51.601 sdc1 0 0 0 0 32768 0 128 256 256 1 0 0 14

But back to the problem at hand: why is this happening? To restate what's going on, I have a very simple script with which I'm duplicating what OpenStack Swift is doing, namely creating a file with mkstemp and then running fallocate against it (there's a rough sketch of it in the PS below). The files are being created with a size of zero, but it seems that xfs is generating a ton of logging activity. I had read your post from back in 2011 about speculative preallocation and can't help but wonder if that's what's hitting me here. I also saw that system memory can come into play; this box has 192GB and 12 hyperthreaded cores.

I also tried one more run without fallocate, this time creating 10000 1K files, which should be about 10MB, and it looks like it's still doing 140MB of I/O, which still feels like a lot but at least it's less than the fallocate case:

# <---------reads---------><---------writes---------><--------averages--------> Pct
#Time Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
10:29:20 sdc1 0 0 0 0 89608 0 351 255 255 1 0 0 11
10:29:21 sdc1 0 0 0 0 55296 0 216 256 256 1 0 0 5

and to repeat the full run with fallocate:

# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Time Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
10:30:50 sdc1 0 0 0 0 56064 0 219 256 256 1 0 0 2
10:30:51 sdc1 0 0 0 0 409720 148 1622 253 252 1 0 0 26
10:30:52 sdc1 0 0 0 0 453240 144 1796 252 252 1 0 0 36
10:30:53 sdc1 0 0 0 0 441768 298 1800 245 245 1 0 0 37
10:30:54 sdc1 0 0 0 0 455576 144 1813 251 251 1 0 0 25
10:30:55 sdc1 0 0 0 0 453532 145 1805 251 251 1 0 0 35
10:30:56 sdc1 0 0 0 0 307352 145 1233 249 249 1 0 0 17
10:30:57 sdc1 0 0 0 0 0 0 0 0 0 0 0 0 0

If there is anything more I can provide I'll be happy to do so. Actually, I should point out I can easily generate graphs, and if you'd like to see some examples I can provide those too. Also, if there is anything I can report from /proc/fs/xfs I can relatively easily do that as well and display it side by side with the disk I/O.

-mark
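PS - in case it makes the test clearer, this is roughly what my script boils down to; a from-memory Python sketch rather than the exact script, with a made-up target directory. I believe Swift itself calls fallocate(2) with FALLOC_FL_KEEP_SIZE via ctypes, which is why the files show up zero-length; os.posix_fallocate is just the closest stdlib call and will bump the file size instead:

    import os, tempfile

    TARGET = '/srv/node/sdc1/test'   # made-up path - point at the filesystem under test
    NFILES = 10000
    SIZE = 1024                      # 1K per file

    for _ in range(NFILES):
        fd, path = tempfile.mkstemp(dir=TARGET)
        try:
            # fallocate case: no data is ever written, just the reservation
            os.posix_fallocate(fd, 0, SIZE)
            # the no-fallocate variant writes 1K of real data instead:
            # os.write(fd, b'\0' * SIZE)
        finally:
            os.close(fd)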
On Fri, Jun 14, 2013 at 10:04 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Fri, Jun 14, 2013 at 09:55:17PM -0400, Mark Seger wrote:
> > Where are you getting your IO throughput numbers from?
> I'm doing 1 second samples and the rates are very steady. The reason I
> ended up at this level of testing was I had done a sustained test for 2
> minutes at about 5MB/sec and was seeing over 500MB/sec going to the disk,
> again sampling at 1-second intervals. I'd be happy to provide detailed
> output and can even sample more frequently if you like.
How do they compare to, say, the output of `iostat -d -x -m 1`?
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx