On Sun, Jun 16, 2013 at 09:11:29PM -0400, Nathan Scott wrote:
> Hey guys,
>
> ----- Original Message -----
> > ok, I have a simple reproducer. try out the following, noting you'll
> > obviously have to change the directory pointed to by dname:
> >
> > libc=ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
> > falloc=getattr(libc, 'fallocate')
>
> This is using the glibc fallocate wrapper - I have vague memories of an
> old libc which used to do per-page buffered writes providing a poor-man's
> implementation of fallocate; maybe somehow that older version/behaviour
> is being triggered.
>
> Running the test case on a RHEL6 box here, you should see patterns like
> the attached ("pmchart -c XFSLog" - config attached too), which suggest
> log traffic dominates (though I have no stripe-fu setup like you, Mark,
> which adds another wrinkle).

Must be an old version of RHEL6, because 6.4 doesn't do any IO at all,
same as upstream.

This test workload is a purely metadata-only workload (no data is
written), so it all gets gathered up by delayed logging. And that's
something 2.6.38 (and RHEL 6.0/6.1) doesn't have by default, so those
kernels are going to write a fair bit of metadata to the log. But I
wouldn't have expected one IO per fallocate call. Oh, we fixed this in
2.6.39:

8287889 xfs: preallocation transactions do not need to be synchronous

So, fallocate() is synchronous in 2.6.38 (and probably RHEL 6.0/6.1),
and the filesystem has a log stripe unit of 256k. That would explain the
256k IO per fallocate call - the log is forced, so the ~500 bytes of
dirty metadata gets padded to the full log stripe (i.e. 256k) and
written synchronously. So there's the reason for the 256k write per file
being written by Swift.

Have I mentioned anything about weird side effects occurring as a result
of trying to emulate direct IO before? :)
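For anyone wanting to replay this, the quoted reproducer is truncated above; a minimal self-contained sketch of the same ctypes fallocate(2) call might look like the following. This is an assumption-laden reconstruction, not Mark's actual script: the argtypes assume 64-bit Linux off_t, and the target filesystem must support fallocate at all.

```python
import ctypes
import ctypes.util
import os
import tempfile

# Load glibc; use_errno=True lets us read errno after a failed call.
libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)

# int fallocate(int fd, int mode, off_t offset, off_t len)
# off_t is 64 bits on 64-bit Linux; adjust c_int64 if testing elsewhere.
fallocate = getattr(libc, 'fallocate')
fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                      ctypes.c_int64, ctypes.c_int64]

length = 1 << 20  # preallocate 1MiB per file, as Swift does
fd, path = tempfile.mkstemp()  # stand-in for a file under dname
try:
    # mode 0: allocate blocks and extend i_size - one such call per
    # file is what generated one 256k log write per file on 2.6.38.
    if fallocate(fd, 0, 0, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    allocated = os.fstat(fd).st_size
finally:
    os.close(fd)
    os.unlink(path)
```

Running this in a loop while watching the log device (e.g. with pmchart as Nathan did) is enough to see whether each call forces the log on a given kernel.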
> > > On Sat, Jun 15, 2013 at 12:22:35PM -0400, Mark Seger wrote:
> > > > I was thinking a little color commentary might be helpful from a
> > > > perspective of what the functionality is that's driving the need
> > > > for fallocate. I think I mentioned somewhere in this thread that
> > > > the application is OpenStack Swift, which is a highly scalable
> > > > cloud object store.
> > >
> > > I'm familiar with it and the problems it causes filesystems. What
> > > application am I talking about here, for example?
> > >
> > > http://oss.sgi.com/pipermail/xfs/2013-June/027159.html
> > >
> > > Basically, Swift is trying to emulate Direct IO because python
> > > doesn't support Direct IO. Hence Swift is hacking around that
> > > problem
>
> I think it is still possible, FWIW. One could use python ctypes (as in
> Mark's test program) and achieve a page-aligned POSIX memalign,

I wasn't aware you could get memalign() through python at all. I went
looking for this exact solution a couple of months ago when these
problems started to be reported, and couldn't find anything related to
direct IO on python with google except for "it can't be done", "it
doesn't work" and a patch that was rejected years ago to support it
natively.

> and some quick googling suggests flags can be passed to open(2) via
> os.O_DIRECT.

Yup, the python manual that documents this kind of thing is what I'd
expect to show up as the number one hit when you google "python O_DIRECT
open flags", wouldn't you think? All I get with that is "O_DIRECT
doesn't work" bug reports and blog posts. If I drop the O_DIRECT out of
the search phrase, the first hit is the python documentation about open
flags, and it documents that O_DIRECT can be passed. And if I use
different search phrases for memalign without mentioning direct IO, I
see lots of tricks people use to get this functionality in python.
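For the record, the trick being alluded to - a page-aligned buffer from posix_memalign(3) via ctypes, handed to a file opened with os.O_DIRECT - can be sketched roughly as below. Hedging heavily: O_DIRECT alignment rules are filesystem- and kernel-specific, and some filesystems (tmpfs, for one) reject the flag outright, so the write is wrapped accordingly.

```python
import ctypes
import ctypes.util
import mmap
import os
import tempfile

libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)

# int posix_memalign(void **memptr, size_t alignment, size_t size)
libc.posix_memalign.argtypes = [ctypes.POINTER(ctypes.c_void_p),
                                ctypes.c_size_t, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

align = mmap.PAGESIZE          # O_DIRECT wants page alignment (typically)
size = align * 4
raw = ctypes.c_void_p()
if libc.posix_memalign(ctypes.byref(raw), align, size) != 0:
    raise MemoryError("posix_memalign failed")

# Wrap the raw allocation in a ctypes array so os.write() can consume
# it via the buffer protocol, then fill it with data to write.
buf = (ctypes.c_char * size).from_address(raw.value)
buf[:] = b'x' * size
page_aligned = raw.value % align == 0

fd0, path = tempfile.mkstemp()
os.close(fd0)
try:
    try:
        # Offset, length and memory address are all aligned here; the
        # open itself fails with EINVAL where O_DIRECT is unsupported.
        fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
        try:
            os.write(fd, buf)
        finally:
            os.close(fd)
    except OSError:
        pass  # e.g. tmpfs: no O_DIRECT support
finally:
    os.unlink(path)

written = bytes(buf)  # snapshot before freeing the C allocation
libc.free(raw)
```

This is exactly the kind of thing those "different search phrases" turn up; whether it is a good idea for Swift to do is, of course, the other half of this thread.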
<sigh>

Google has been letting me down like this quite a bit over the past few
months when it comes to searching for stuff related to development. It's
getting harder to find stuff amongst the noise of whiny blogs, forums,
and other places where people do nothing but complain about broken shite
that google seems to think is more important than a real reference
manual on the topic being searched.

Is there anything better out there yet? Like from years ago, when the
google "I'm feeling lucky" button used to pass you directly to the exact
page of the reference manual relevant to the topic being searched?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs