On 12/25/2011 09:06 PM, Gordan Bobic wrote:
Why not just mount direct via NFS? It'd be a lot quicker, not to mention easier to tune. It'd work for building all but a handful of packages (e.g. zsh), and you could handle those by keeping a single builder on a normal local fs, with a policy that routes the packages whose self-tests fail on NFS to it.
I'm not acquainted with the rationale for the decision so perhaps somebody else can comment. Beyond the packages that demand a local filesystem, perhaps there were issues with .nfsXXX files, or some stability problem not seen when working with a single open file? Not sure.
512KB chunks sound vastly oversized for this sort of workload. But if you are running ext4 on top of a loopback file on top of NFS, no wonder the performance sucks.
Well, 512KB chunks are oversized for traditional NFS use, but perhaps undersized for this unusual use case.
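(For context, the chunk size is fixed when the array is created; a 4-disk RAID0 with 512KB chunks would have been set up with something along these lines - device names are just placeholders:)

    mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=512 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1    # --chunk is in KiB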
Sounds like a better way to ensure that would be to re-architect the storage solution more sensibly. If you really want to use block level storage, use iSCSI on top of raw partitions. Providing those partitions are suitably aligned (e.g. for 4KB physical sector disks, erase block sizes, underlying RAID, etc.), your FS on top of those iSCSI exports will also end up being properly aligned, and the stride, stripe-width and block group size will all still line up properly.
I understand there was an issue with iSCSI stability about a year ago. One of our engineers tried it on his trimslice recently and had no problems, so it may be time to re-evaluate its use.
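If we do give iSCSI another look, my understanding is that the alignment Gordan describes boils down to something like the following on the node doing the mkfs - device name, chunk size and disk count here are only assumptions to make the arithmetic concrete:

    # Assuming the backing store is a 4-disk RAID0 with 512KB chunks and 4KB fs blocks:
    #   stride       = chunk / block       = 512KiB / 4KiB = 128
    #   stripe-width = stride * data disks = 128 * 4       = 512
    parted -a optimal /dev/sdX mklabel gpt
    parted -a optimal /dev/sdX mkpart primary 1MiB 100%   # 1MiB start keeps 4KB sectors/erase blocks aligned
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=512 /dev/sdX1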
But with 40 builders, each builder only hammering one disk, you'll still get 10 builders hammering each spindle and causing a purely random seek pattern. I'd be shocked if you see any measurable improvement from just splitting up the RAID.
Let's say 10 (40/4) builders are using one disk at the same time - that's not necessarily a doomsday scenario, since their network speed is only 100 Mbit/s. The one situation you want to avoid is having numerous mock setups running at once; that is what will amount to a hammering. How much time, on average, is spent composing the chroot vs. building? Sure, at some point the builders will simply overwhelm any given disk, but what is that point? My guess is that 10 is really pushing it; 5 would be better.
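Rough numbers, assuming the builders can actually saturate their NICs:

    100 Mbit/s ~ 12 MB/s per builder
    10 builders x 12 MB/s ~ 120 MB/s aggregate per spindle (sequential best case)

So raw throughput isn't the limit; the random seeks from concurrent chroot composition are what will hurt first.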
Using the fs image over loopback over NFS sounds so eye-wateringly wrong that I'm just going to give up on this thread if that part is immutable. I don't think the problem is significantly fixable if that approach remains.
Why is that?
I don't see why you think that seeking within a single disk is any less problematic than seeking across multiple disks. A file will only span multiple disks when it exceeds the chunk size, and that will typically happen only at the end, when linking - there aren't many cases where a single code file is bigger than a sensible chunk size (and in a 4-disk RAID0 case, you're pretty much forced to use a 32KB chunk size if you intend for the block group beginnings to be distributed across spindles).
It's the chroot composition that makes me think seeking across multiple disks is an issue.
And local storage will be what? SD cards? There's only one model line of SD cards I have seen to date that actually produces random-write results beginning to approach a ~5000 rpm disk (up to 100 IOPS), and those are SLC and quite expensive. Having spent the last few months patching, fixing up and rebuilding RHEL6 packages for ARM, I have a pretty good understanding of what works for backing storage and what doesn't - and SD cards are not an approach to take if performance is an issue. Even expensive, highly branded Class 10 SD cards only manage ~20 IOPS (80KB/s) on random writes.
80KB/s? Really? That sounds like bad alignment.
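One way to tell whether it's alignment or the card itself would be a quick random-write run with fio - the mount point and parameters below are just a starting point, not a tuned benchmark:

    # 4KB random writes against a file on the mounted card, bypassing the page cache
    fio --name=sd-randwrite --directory=/mnt/sdcard --size=256m \
        --rw=randwrite --bs=4k --direct=1 --ioengine=sync \
        --runtime=60 --time_based

Repeating that with the partition bumped to a 1MiB (or 4MiB) boundary should show whether alignment is the culprit.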
Strictly speaking, the journal is about preserving the integrity of the FS so you don't have to fsck it after an unclean shutdown, not about preventing data loss as such. But I guess you could argue the two are related.
Right, sorry, was still thinking of async.
I'm still not sure what the point is of using a loopback-mounted file for storage instead of raw NFS. NFS mounted with nolock,noatime,proto=udp works exceedingly well for me with NFSv3.
I didn't think UDP was a good idea any longer.
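If UDP is the only sticking point, the same idea over TCP should just be (server, export and mount point here are placeholders):

    mount -t nfs -o nfsvers=3,proto=tcp,nolock,noatime nfsserver:/srv/builders /var/lib/mock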
Well, deadline is about favouring reads over writes. Writes you can buffer as long as you have RAM to spare (especially with libeatmydata LD_PRELOAD-ed). Reads, however, block everything until they complete. So favouring reads over writes may well get you ahead in terms of keeping the builders busy.
That raises the question: what are the builders blocking on right now? I'd assumed chroot composition, which is rather write-heavy.
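For reference, trying both of those suggestions out on a builder would be roughly the following - the device name, library path, mock config and package name are placeholders:

    # Switch the builder's disk to the deadline elevator
    echo deadline > /sys/block/sda/queue/scheduler
    cat /sys/block/sda/queue/scheduler      # should now show: noop [deadline] cfq

    # Run the build with fsync/sync turned into no-ops (library path varies by distro)
    LD_PRELOAD=/usr/lib/libeatmydata.so mock -r fedora-15-arm --rebuild foo.src.rpm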
--
Brendan Conoboy / Red Hat, Inc. / blc@xxxxxxxxxx