On Tue, Aug 09, 2011 at 12:10:48PM +0200, Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx> wrote:
> First of all, please calm down. Getting personal is not bringing us
> anywhere.

Well, it's not me who's getting personal, so...?

> > Logic error - if I can corrupt an XFS without special privileges then
> > this is not a problem with xfs_fsr, but simply a kernel bug in the
> > xfs code. And a rather big one, one step below a remote exploit.
>
> No, it's not a kernel bug because as long as you don't use xfs_fsr,
> nothing will ever happen.

"As long as you don't boot, it will not crash".

xfs_fsr uses syscalls, just like other applications. According to your
(wrong) logic, if an application uses chown and this causes a kernel oops,
this is also not a kernel bug.

That's of course wrong - it's the kernel that crashes when an application
produces certain access patterns.

> (rw,nodiratime,relatime,logbufs=8,logbsize=256k,attr2,barrier,largeio,swalloc)
> and sometimes also
> ,allocsize=64m

As has been reported on this list, this option is really harmful on
current xfs - in my case, it led to xfs returning ENOSPC even when the
disk was 40% empty (~188GB).

> and I can't find evidence for fragmentation that would be harmful. Yes

Well, define "harmful" - slow logfile reads aren't what I consider
"harmful" either. They are just very, very slow.

> The allocsize option helps a lot there. I looked at one webserver access
> log, it has 640MB with 99 fragments, but that's not a lot. On our
> Spamgate I see 250MB logs with 374 fragments.

Well, if it were one fragment, you could read that in 4-5 seconds; at 374
fragments, it's probably around 6-7 seconds. That's not harmful, but if
you extrapolate this to a few gigabytes and a lot of files, it becomes
quite the overhead.

> don't use the allocsize option there, which I changed now that I looked

That allocsize option is no longer reasonable with newer kernels, as the
kernel will reserve 64MB of disk space even for 1KB files indefinitely.
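To put rough numbers on the read-time estimate for the 250MB log above,
here is a minimal sketch. It assumes a disk that streams about 55MB/s and
pays about 6ms of seek time for every additional fragment. Both figures are
assumptions picked for illustration, not measurements, and the function
name is made up for this example.

```python
def estimated_read_seconds(size_mb, fragments,
                           throughput_mb_s=55.0, seek_ms=6.0):
    """Rough time to read a file sequentially from a rotating disk.

    throughput_mb_s and seek_ms are assumed values for illustration,
    not measured numbers.
    """
    # Time to stream the data itself at the assumed sequential rate.
    streaming = size_mb / throughput_mb_s
    # Every fragment after the first costs one additional seek.
    seeking = max(fragments - 1, 0) * seek_ms / 1000.0
    return streaming + seeking


one_fragment = estimated_read_seconds(250, 1)
many_fragments = estimated_read_seconds(250, 374)
print(f"1 fragment:    {one_fragment:.1f}s")
print(f"374 fragments: {many_fragments:.1f}s")
```

With these assumed figures the two cases land in the 4-5 second and 6-7
second ranges mentioned above. The seek term grows with the number of
fragments, so it adds up over many files.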
> > If XFS is bad at append-only workloads, which is the most common type
> > of workload, then XFS fails to be very relevant for the real world.
>
> may be valid for your world, not mine. We have webservers, fileservers
> and database servers, all of which are not really append style, but more
> delete-and-recreate.

If you find a way of recreating files without appending to them, let me
know.

The problem with fragmentation is that it happens even with only a few
writers in "create file" workloads (which do append...).

You probably make a distinction between "writing a file fast" and "writing
a file slow", but that distinction is not a qualitative difference.

On busy servers that create a lot of files, you get fragmentation the same
way as on less busy servers that write files more slowly. There is little
to no difference in the resulting patterns.

> Well, db-servers are rather exceptional here.

Yes, append style is what makes up the vast majority of disk writes on a
normal system, db-servers excepted indeed.

> But if the numbers for fragmentation on your servers are true, you must
> have a very good test case for fragmentation prevention. Therefore it
> could be really interesting if you could grab what Dave Chinner asked
> for:

I'll keep it in mind.

> And maybe he could use it for optimizations. Is there any tool on Linux
> to record such I/O patterns?

I presume strace would do, but that's where the "lot of work" comes in. If
there is a ready-to-use tool, that would of course make it easy.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@xxxxxxxxxx
      -=====/_/_//_/\_,_/ /_/\_\