On 12/25/2011 09:06 PM, Gordan Bobic wrote:
Why not just mount direct via NFS? It'd be a lot quicker, not to mention easier to tune. It'd work for building all but a handful of packages (e.g. zsh), and you could handle those by keeping a single builder on a normal local fs, with a policy that routes the packages whose self-tests fail on NFS to it.
I'm not acquainted with the rationale for the decision so perhaps somebody else can comment. Beyond the packages that demand a local filesystem, perhaps there were issues with .nfsXXX files, or some stability problem not seen when working with a single open file? Not sure.
512KB chunks sound vastly oversized for this sort of workload. But if you are running ext4 on top of a loopback file on top of NFS, no wonder the performance sucks.
Well, 512KB chunks are oversized for traditional NFS use, but perhaps undersized for this unusual use case.
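(For context, the chunk size is fixed when the array is created; a 4-disk RAID0 with 512KB chunks would have been set up with something along these lines - device names are just placeholders:)

    mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=512 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1    # --chunk is in KiB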
Sounds like a better way to ensure that would be to re-architect the storage solution more sensibly. If you really want to use block level storage, use iSCSI on top of raw partitions. Providing those partitions are suitably aligned (e.g. for 4KB physical sector disks, erase block sizes, underlying RAID, etc.), your FS on top of those iSCSI exports will also end up being properly aligned, and the stride, stripe-width and block group size will all still line up properly.
I understand there was an issue with iSCSI stability about a year ago. One of our engineers tried it on his trimslice recently and had no problems, so it may be time to re-evaluate its use.
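If we do give iSCSI another look, my understanding is that the alignment Gordan describes boils down to something like the following on the node doing the mkfs - device name, chunk size and disk count here are only assumptions to make the arithmetic concrete:

    # Assuming the backing store is a 4-disk RAID0 with 512KB chunks and 4KB fs blocks:
    #   stride       = chunk / block       = 512KiB / 4KiB = 128
    #   stripe-width = stride * data disks = 128 * 4       = 512
    parted -a optimal /dev/sdX mklabel gpt
    parted -a optimal /dev/sdX mkpart primary 1MiB 100%   # 1MiB start keeps 4KB sectors/erase blocks aligned
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=512 /dev/sdX1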
But with 40 builders, each builder only hammering one disk, you'll still get 10 builders hammering each spindle and causing a purely random seek pattern. I'd be shocked if you see any measurable improvement from just splitting up the RAID.
Let's say 10 (40/4) builders are using one disk at the same time - that's not necessarily a doomsday scenario, since their network speed is only 100 Mbit/s. The one situation you want to avoid is having numerous mock setups running at once; that is what will amount to a hammering. How much time, on average, is spent composing the chroot vs. building? Sure, at some point the builders will simply overwhelm any given disk, but what is that point? My guess is that 10 is really pushing it; 5 would be better.
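Rough numbers, assuming the builders can actually saturate their NICs:

    100 Mbit/s ~ 12 MB/s per builder
    10 builders x 12 MB/s ~ 120 MB/s aggregate per spindle (sequential best case)

So raw throughput isn't the limit; the random seeks from concurrent chroot composition are what will hurt first.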
Using the fs image over loopback over NFS sounds so eye-wateringly wrong that I'm just going to give up on this thread if that part is immutable. I don't think the problem is significantly fixable if that approach remains.
Why is that?
I don't see why you think that seeking within a single disk is any less problematic than seeking across multiple disks. A file will only span multiple disks when it exceeds the chunk size, and that will typically happen only at the end, when linking - there aren't many cases where a single code file is bigger than a sensible chunk size (and in a 4-disk RAID0 case, you're pretty much forced to use a 32KB chunk size if you intend for the block group beginnings to be distributed across spindles).
It's the chroot composition that makes me think seeking across multiple disks is an issue.
And local storage will be what? SD cards? There's only one model line of SD cards I have seen to date that actually produces random-write results beginning to approach a ~5000 rpm disk (up to 100 IOPS), and those are SLC and quite expensive. Having spent the last few months patching, fixing up and rebuilding RHEL6 packages for ARM, I have a pretty good understanding of what works for backing storage and what doesn't - and SD cards are not an approach to take if performance is an issue. Even expensive, highly branded Class 10 SD cards only manage ~20 IOPS (80KB/s) on random writes.
80KB/s? Really? That sounds like bad alignment.
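One way to tell whether it's alignment or the card itself would be a quick random-write run with fio - the mount point and parameters below are just a starting point, not a tuned benchmark:

    # 4KB random writes against a file on the mounted card, bypassing the page cache
    fio --name=sd-randwrite --directory=/mnt/sdcard --size=256m \
        --rw=randwrite --bs=4k --direct=1 --ioengine=sync \
        --runtime=60 --time_based

Repeating that with the partition bumped to a 1MiB (or 4MiB) boundary should show whether alignment is the culprit.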
Strictly speaking, the journal is about preserving the integrity of the FS so you don't have to fsck it after an unclean shutdown, not about preventing data loss as such. But I guess you could argue the two are related.
Right, sorry, was still thinking of async.
I'm still not sure what the point is of using a loopback-mounted file for storage instead of raw NFS. NFS mounted with nolock,noatime,proto=udp works exceedingly well for me with NFSv3.
I didn't think UDP was a good idea any longer.
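If UDP is the only sticking point, the same idea over TCP should just be (server, export and mount point here are placeholders):

    mount -t nfs -o nfsvers=3,proto=tcp,nolock,noatime nfsserver:/srv/builders /var/lib/mock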
Well, deadline is about favouring reads over writes. Writes you can buffer as long as you have RAM to spare (especially with libeatmydata LD_PRELOAD-ed). Reads, however, block everything until they complete. So favouring reads over writes may well get you ahead in terms of keeping the builders busy.
That raises the question: what are the builders blocking on right now? I'd assumed chroot composition, which is rather write-heavy.
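For reference, trying both of those suggestions out on a builder would be roughly the following - the device name, library path, mock config and package name are placeholders:

    # Switch the builder's disk to the deadline elevator
    echo deadline > /sys/block/sda/queue/scheduler
    cat /sys/block/sda/queue/scheduler      # should now show: noop [deadline] cfq

    # Run the build with fsync/sync turned into no-ops (library path varies by distro)
    LD_PRELOAD=/usr/lib/libeatmydata.so mock -r fedora-15-arm --rebuild foo.src.rpm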
--
Brendan Conoboy / Red Hat, Inc. / blc@xxxxxxxxxx