Re: [HEADS UP] Removal of GCC from the buildroot

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Mon, 16 Jul 2018 17:10:11 +0100

On Thu, Jul 12, 2018 at 09:17:41PM +0100, Richard W.M. Jones wrote:
> On Thu, Jul 12, 2018 at 02:10:37PM -0400, Cole Robinson wrote:
> > On 07/11/2018 04:37 PM, Kevin Fenzi wrote:
> > > On 07/11/2018 12:57 PM, Mikolaj Izdebski wrote:
> > >> On 07/11/2018 09:26 PM, Kevin Fenzi wrote:
> > >>> I don't see the cache=unsafe anywhere (although the name sure makes me
> > >>> want to enable it for official builds let me tell ya. ;) Can you point
> > >>> out more closely where it is or docs for it?
> > >>
> > >> cache=unsafe is documented at [1]. (Basically, in virt_install_command
> > >> you append ",cache=unsafe" to --disk parameter, next to "bus=virtio".)
> > >> It makes buildvmhost cache all disk operations and ignore sync
> > >> operations. Similar to nosync, but does not work on buildhw, works on
> > >> virthost level, applies to all operations, not just dnf.
> > >>
> > >> [1]
> > >> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/virtualization_tuning_and_optimization_guide/index#sect-Virtualization_Tuning_Optimization_Guide-BlockIO-Caching
> > > 
> > > Ah, I see at the vm level. Yeah, I don't think this would be very much
> > > of a win for us. The x86_64 buildvm's have all their storage on iscsi,
> > > the arm ones have their storage on ssd's. I suppose it could help the
> > > ppc64{le} ones, they are on 10k sas drives. I'm pretty leary of enabling
> > > anything called 'unsafe' though.
> > 
> > I think it's unsafe only in the case of on-disk consistency, so across
> > VM reboots. I _think_ over a single run of a VM it's safe, which may
> > describe koji usage.
> > 
> > I know rjones has looked deeply at qemu caching methods for use in
> > libguestfs so maybe he can comment, CC'd
> 
> I cover caching modes about half way down here:
> 
>   https://rwmj.wordpress.com/2013/09/02/new-in-libguestfs-allow-cache-mode-to-be-selected/
> 
> First off, cache=unsafe really does improve performance greatly, I
> measured around 25% on a disk-heavy workload.

FYI to augment what Rich's blog post says, it helps to understand the
difference between cache modes. The QEMU 'cache' setting actually
controls 3 separate tunables under the hood:

              │ cache.writeback   cache.direct   cache.no-flush
 ─────────────┼─────────────────────────────────────────────────
 writeback    │ on                off            off
 none         │ on                on             off
 writethrough │ off               off            off
 directsync   │ off               on             off
 unsafe       │ on                off            on

IOW, changing from cache=none to cache=unsafe turns off O_DIRECT so data
is buffered in host RAM, and also turns off disk flushing, so QEMU never
requests it to be pushed out to disk. The latter change is what makes
it so catastrophic on host failure - even a journalling filesystem in
the guest won't save you because we're ignoring the flush requests that
are required to make the journal work safely.
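
To make the table easier to play with, here is a minimal Python sketch of the
same mapping. The dictionary is transcribed from the table above; the helper
name changed_tunables is made up for illustration and is not part of QEMU or
libvirt.

```python
# Mapping of QEMU's 'cache' shorthand to the three underlying tunables,
# transcribed from the table in this mail (True = on, False = off).
CACHE_MODES = {
    "writeback":    {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": False},
    "none":         {"cache.writeback": True,  "cache.direct": True,  "cache.no-flush": False},
    "writethrough": {"cache.writeback": False, "cache.direct": False, "cache.no-flush": False},
    "directsync":   {"cache.writeback": False, "cache.direct": True,  "cache.no-flush": False},
    "unsafe":       {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": True},
}


def changed_tunables(old, new):
    """Hypothetical helper: return {tunable: (old_value, new_value)}
    for every tunable that differs between two cache modes."""
    before, after = CACHE_MODES[old], CACHE_MODES[new]
    return {name: (before[name], after[name])
            for name in before
            if before[name] != after[name]}


if __name__ == "__main__":
    # Going from cache=none to cache=unsafe flips O_DIRECT off and no-flush on.
    for name, (was, now) in changed_tunables("none", "unsafe").items():
        print(f"{name}: {was} -> {now}")
```

Comparing "writeback" with "unsafe" the same way shows that only
cache.no-flush differs, which is the part that makes it unsafe.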

The combination of not using O_DIRECT and not honouring flush requests
means that all I/O operations in the guest complete pretty much immediately,
without ever waiting for the host to do the real I/O.

The amount of RAM you have in the host is pretty relevant here, though.
If the guest is doing I/O faster than the host OS can write it to disk
and there are never any flush requests to slow the guest down, you're
going to use an ever increasing amount of host RAM for caching I/O.
This could be a bad thing if you're contending on host RAM - it could
even push other important guests out to swap or trigger the OOM killer.
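
As a rough back-of-the-envelope illustration of that growth, here is a toy
Python model. The function name and the rates are invented for the example,
not measurements from any Koji builder, and it deliberately ignores any
writeback throttling the host kernel may apply.

```python
def cache_backlog_mib(guest_write_mib_s, host_drain_mib_s, seconds):
    """Toy model: MiB of not-yet-written data sitting in host RAM after
    `seconds`, if the guest writes at a steady rate and the host drains
    to disk at a steady rate. Assumes nothing ever slows the guest down."""
    surplus = max(0.0, guest_write_mib_s - host_drain_mib_s)
    return surplus * seconds


if __name__ == "__main__":
    # Hypothetical numbers: guest writes 300 MiB/s, disk sustains 100 MiB/s.
    print(cache_backlog_mib(300, 100, 60), "MiB buffered after one minute")
```

If the host can keep up (drain rate at or above the guest's write rate), the
backlog in this model stays at zero, which is why slow storage makes the
problem worse.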

IOW, using O_DIRECT (cache=none or directsync) is a good thing if you
need predictable host RAM usage - the only RAM used for I/O cache is
that assigned to the guest OS itself.

With cache=unsafe for Koji I'd be a little concerned about
whether a build could inflict a denial of service on host RAM either
intentionally or accidentally, as the guest is relatively untrustworthy
and/or unconstrained in what it is running.

Finally, the issue of O_DIRECT vs host page cache *only* applies if your
QEMU process is using locally exposed storage, i.e. a plain file or a
local device node in /dev. If QEMU is using iSCSI via its built-in
network client, then host page cache vs O_DIRECT is irrelevant. In
this latter case, using cache=unsafe might be OK from a host RAM
consumption POV - though I'm not entirely sure what the RAM usage
pattern of the QEMU iSCSI client is like.
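
The local vs network distinction can be sketched as a tiny Python check. The
function name and the "contains ://" heuristic are assumptions made for this
example only, and the example sources are made up; it is not how QEMU or
libvirt classify disks internally.

```python
def page_cache_applies(source):
    """Hypothetical heuristic: True if the O_DIRECT vs host page cache
    question is relevant for this disk source.

    Assumption: a source written as a URL (e.g. iscsi://...) is served by
    a QEMU built-in network client, so the host page cache is not involved.
    Anything else is treated as a plain file or a /dev node on the host.
    Note an iSCSI LUN attached by the host kernel shows up as a /dev node,
    so it counts as local storage here."""
    return "://" not in source


if __name__ == "__main__":
    for src in ("/var/lib/libvirt/images/builder.qcow2",
                "/dev/sdb",
                "iscsi://192.0.2.10/iqn.2018-07.example:target/0"):
        print(src, "->", page_cache_applies(src))
```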

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
_______________________________________________
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/VBXYD4TEOSOYH2YUKYB6W67Z4ZWZK72P/