Re: builder io issue

On 12/25/2011 03:47 AM, Gordan Bobic wrote:
On 12/25/2011 06:16 AM, Brendan Conoboy wrote:
Allocating builders to individual disks rather than a single raid volume will
help dramatically.
Care to explain why?

Sure, see below.

Is this a "proper" SAN or just another Linux box with some disks in it?
Is NFS backed by a SAN "volume"?

As I understand it, the server is a Linux host using raid0 with 512k chunks across 4 sata drives. This md device is then formatted with some filesystem (ext4?). Directories on this filesystem are then exported to individual builders such that each builder has its own private space. Each private directory contains a large file that is used as a loopback ext4 filesystem (i.e., the builder mounts the nfs share, then loopback-mounts the file on that nfs share as ext4). This is where /var/lib/mock comes from. Just to be clear, if you looked at the nfs-mounted directory on a build host you would see a single large file representing a filesystem, which makes traditional ext?fs tuning a bit more complicated.
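To make the layout concrete, the builder side looks roughly like this (server name, export path and image file name are made up for illustration):

    # Mount the per-builder export from the storage server.
    mount -t nfs storage:/exports/builder01 /mnt/builder01

    # The export contains one big file holding an ext4 filesystem;
    # loopback-mount it to get the builder's private /var/lib/mock.
    mount -o loop /mnt/builder01/mock.img /var/lib/mock

So from the server's point of view, all of a builder's io lands in that one image file.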

The structural complication is that we have something like 30-40 systems all vying for the attention of those 4 spindles. It's really important that each builder not cause more than one disk to perform an operation, because seeks are costly: if just 2 disks get called up by a single builder, 50% of the storage resources are taken up by a single host until the operation completes. With 40 hosts you'll just end up thrashing (and with considerably fewer hosts, too). Raid0 gives great throughput, but at the cost of latency. With so many 100mbit builders, throughput is less important and latency is key.

Roughly put, the two goals for good performance in this scenario are:

1. Make sure each builder only activates one disk per operation.

2. Make sure each io operation causes the minimum amount of seeking.

You're right that good alignment and block sizes and whatnot will help this cause, but even in the best case there is still a fair likelihood of io operations periodically crossing spindle boundaries. You'd need a chunk size about equal to the fs image file size to avoid that entirely. Perhaps an lvm setup with strictly defined layouts for each lvcreate would make it a bit more manageable, but for simplicity's sake I advocate simply treating the 4 disks like 4 disks, exported according to expected usage patterns.
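For concreteness, here is roughly what I mean by treating the disks as disks; device names, mount points and the builder-to-disk assignment are just placeholders:

    # One filesystem per spindle, no md striping.
    mkfs.ext4 /dev/sdb1 && mount /dev/sdb1 /srv/disk1
    mkfs.ext4 /dev/sdc1 && mount /dev/sdc1 /srv/disk2
    mkfs.ext4 /dev/sdd1 && mount /dev/sdd1 /srv/disk3
    mkfs.ext4 /dev/sde1 && mount /dev/sde1 /srv/disk4

    # /etc/exports: pin each builder's directory to one disk so its io
    # only ever hits one spindle (roughly 10 builders per disk with 40 hosts).
    # /srv/disk1/builder01   builder01(rw,async,no_root_squash)
    # /srv/disk1/builder02   builder02(rw,async,no_root_squash)
    # ...
    # /srv/disk2/builder11   builder11(rw,async,no_root_squash)

How the builders get spread across the disks would be driven by the expected usage patterns mentioned above.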

In the end, if all this is done and the builders are delayed by deep sleeping nfsds, the only options are to move /var/lib/mock to local storage or increase the number of spindles on the server.

Disable fs
journaling (normally dangerous, but this is throw-away space).

Not really dangerous - the only danger is that you might have to wait
for fsck to do its thing on an unclean shutdown (which can take hours
on a full TB scale disk, granted).

I mean dangerous in the sense that if the server goes down, there might be data loss, but the builders using the space won't know that. This is particularly true if nfs exports are async.
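For reference, dropping the journal on one of those throw-away filesystems is a one-liner, either at creation time or afterwards (the image path here is hypothetical, and the fs must not be mounted at the time):

    # Create the image's filesystem without a journal in the first place
    # (-F because the target is a regular file, not a block device)...
    mkfs.ext4 -F -O ^has_journal /srv/disk1/builder01/mock.img

    # ...or strip the journal from an existing, unmounted filesystem.
    tune2fs -O ^has_journal /srv/disk1/builder01/mock.img
    e2fsck -f /srv/disk1/builder01/mock.img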

Speaking of "dangerous" tweaks, you could LD_PRELOAD libeatmydata (add
to a profile.d file in the mock config, and add the package to the
buildsys-build group). That will eat fsync() calls which will smooth out
commits and make a substantial difference to performance. Since it's
scratch space anyway it doesn't matter WRT safety.

Sounds good to me :-)
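Something like this, I'd guess - the exact library path depends on distro and arch, and the file names here are invented:

    # Dropped into the build chroot as /etc/profile.d/eatmydata.sh so every
    # build process inherits it; fsync()/fdatasync() become no-ops.
    export LD_PRELOAD=/usr/lib/libeatmydata.so

Plus adding the libeatmydata package to the buildsys-build group so the library actually exists in every build root.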

Build of zsh will break on NFS whatever you do. It will also break on a
local FS with noatime. There may be other packages that suffer from this
issue but I don't recall them off the top of my head. Anyway, that is an
issue for a build policy - have one builder using block level storage
with atime and the rest on NFS.

Since loopback files representing filesystems are being used with nfs as the storage mechanism, this would probably be a non-issue. You just can't have the builder mount its loopback fs with noatime (I hadn't thought of that previously).

Once all that is done, tweak the number of nfsds such that
there are as many as possible without most of them going into deep
sleep. Perhaps somebody else can suggest some optimal sysctl and ext4fs
settings?

As mentioned in a previous post, have a look here:
http://www.altechnative.net/?p=96

Deadline scheduler might also help on the NAS/SAN end, plus all the
usual tweaks (e.g. make sure write caches on the disks are enabled, if
the disks support write-read-verify disable it, etc.)

Definitely worth testing. Well-ordered IO is critical here.
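A few of those knobs as a starting point (device names are placeholders, and the nfsd thread count is just something to experiment with):

    # Deadline elevator on each of the four data disks.
    echo deadline > /sys/block/sdb/queue/scheduler

    # Make sure the drive's write cache is enabled.
    hdparm -W1 /dev/sdb

    # Bump RPCNFSDCOUNT in /etc/sysconfig/nfs, then watch the "th" line
    # to see how often all nfsd threads are busy.
    grep th /proc/net/rpc/nfsd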

--
Brendan Conoboy / Red Hat, Inc. / blc@xxxxxxxxxx
_______________________________________________
arm mailing list
arm@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/arm


