Re: armhf dnf is not working on aarch64 kernel

On 2016-04-28 22:26, Jon Masters wrote:
Hi Gordan,

On 04/28/2016 05:00 PM, Gordan Bobic wrote:
On 2016-04-28 19:49, Jon Masters wrote:

First of all, Jon, thank you for your thoughts on this matter.

No problem :)

Allow me to add a few thoughts. I have been working with the ARM vendors
(as well as the ARM Architecture Group) since before the architecture
was announced, and the issue of page size and 32-bit backward
compatibility came up in the earliest days. I am speaking from a Red Hat
perspective and NOT dictating what Fedora should or must do, but I do
strongly encourage Fedora not to make a change to something like the
page size simply to support a (relatively) small number of corner cases.

IMO, the issue of backward compatibility is completely secondary to
the issue of efficiency of memory fragmentation/occupancy when it comes
to 64KB pages. And that isn't a corner case, it is the overwhelmingly
primary case.

Let's keep to the memory discussion then, I agree. On the fragmentation
argument, I do agree this is an area where server/non-server uses
certainly clash. It might well be that we later decide in Fedora that 4K
is the right size once there are more 64-bit client devices.

As an additional factoid to throw into this, one obvious case where
large pages can be beneficial is databases. But speaking as a
database guy who measured the positive impact of using huge pages
on MySQL, I can confirm that the performance improvement arising
from putting the buffer pool into 1MB huge pages instead of 4KB
pages is in the 3% range - and that is with the full jump to 1MB
pages. While I haven't measured it, it doesn't seem unreasonable
to extrapolate the following:

1) 4KB -> 64KB pages will make less difference than 4KB -> 1MB
pages in this use case, which is supposed to be the prime example
of where larger memory pages make a measurable difference.

2) Regardless of whether we use 4KB or 64KB standard pages,
we can still use huge pages anyway, further minimizing the
usefulness of the 64KB compromise.
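
As a purely illustrative aside, here is roughly what I mean by "we can
still use huge pages anyway" - a minimal sketch in plain C against the
Linux mmap() interface, nothing MySQL-specific. It assumes the
administrator has reserved some hugetlb pages beforehand (e.g. via
/proc/sys/vm/nr_hugepages); otherwise the mapping fails cleanly:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* 2MB is the default hugetlb size on a 4K-granule kernel; on a
     * 64K-granule arm64 kernel the default huge page is much larger
     * (512MB), so this size would need adjusting there. */
    size_t len = 2UL * 1024 * 1024;

    /* Anonymous mapping explicitly backed by huge pages.  This works
     * (or fails cleanly with ENOMEM) whichever base page size the
     * kernel was built with. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }

    memset(buf, 0, len);   /* touch it so it is actually faulted in */
    printf("base page size %ld bytes, %zu byte huge mapping OK\n",
           sysconf(_SC_PAGESIZE), len);

    munmap(buf, len);
    return 0;
}

The point being that explicit huge pages are available to the
application whichever base granule the kernel uses, so they don't
really argue for 64K as the default.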

Having entire separate ISAs just for the fairly nonexistent field of
proprietary, non-recompilable third-party 32-bit apps doesn't really make
sense. Sure, running 32-bit via multilib is fun and all, but it's not
really something that is critical to using ARM systems.

Except where there's no choice, such as closed source applications
(Plex comes to mind) or libraries without an appropriate ARM64
implementation, such as Mono. I'm sure all of it will support pure
aarch64 at some point, but the problem is real today.

It's definitely true that there are some applications that aren't yet
ported to ARMv8, though that list is fairly small (compared with IA32).

But OK, for the sake of this discussion let's completely ignore the
32-bit support to simplify things.

OK :)

The mandatory page sizes in the v8 architecture are 4K and 64K, with
various options around the number of bits used for address spaces, huge
pages (or ginormous pages), and contiguous hinting for smaller "huge"
pages. There is an option for 16K pages, but it is not mandatory. In
the server specifications, we don't compel Operating Systems to use
64K, but everything is written with that explicitly in mind. By using
64K early we ensure that it is possible to do so in a very clean way,
and then if (over the coming years) the deployment of sufficient real
systems proves that this was a premature decision, we still have 4K.

The real question is how much code will bit-rot due to not being
tested with 4KB pages

With respect, I think it's the other way around. We have another whole
architecture targeting 4K pages by default, and (regretfully perhaps,
though that's a personal opinion) it's a pretty popular choice that many
people are using in Fedora today. So I don't see any situation in which
4K bitrots over 64K. I did see the opposite being very likely if we
didn't start out with 64K as the baseline going in on day one.

Perhaps. Hopefully this won't be an issue at least as long as Fedora
ships both 32-bit and 64-bit ARM distros.

I also asked a few of the chip
vendors not to implement 32-bit execution (and some of them have indeed
omitted it after we discussed the needs early on), and am aggressively
pushing for it to go away over time in all server parts. But there's
more to it than that. In the (very) many early conversations with
various performance folks, the feedback was that larger page sizes than
4K should generally be adopted for a new arch. Ideally that would have
been 16K (which other architectures than x86 went with also), but that
was optional. Optional necessarily means "does not exist". My advice
when Red Hat began internal work on ARMv8 was to listen to the experts.

Linus is not an expert?

Note that I never said he isn't an expert. He's one of the smartest guys
around, but he's not always right 100% of the time. Folks who run
performance numbers were consulted about the merits of 64K (as were a
number of chip architects) and they said that was the way to go. We can
always later decide (once there's a server market running fully) that
this was premature and change to 4K, but it's very hard to go the other
way around later if we settle for 4K on day one. The reason is 4K works
great out of the box as it's got 30 years of history on that other arch,
but for 64K we've only POWER to call on, and its userbase generally
aren't stressing the same workloads as on 64-bit ARM. Sometimes they
are, and that's been helpful with obscure things like emacs crashing due
to a page size assumption or two on arrow presses.

Indeed, but the POWER hardware also tends to be used in rather niche
cases, and probably more often with large databases than x86 or ARM.
And as I mentioned above, even on workloads like that, the page size
doesn't yield ground-breaking performance improvements. Certainly nowhere
nearly enough improvement to offset the penalty of, say, the hypervisor
overhead.

I am well aware of Linus's views on the topic and I have seen the
rants on G+ and elsewhere. I am completely willing to be wrong about
moving to 64K too soon (there is not enough data yet), and if it
ultimately proves premature, to see things like RHELSA on the Red Hat
side switch back to 4K.

My main concern is around how much code elsewhere will rot and need
attention should this ever happen.

I think, once again, that any concern over 4K being a well supported
page size is perhaps made moot by the billions of x86 systems out there
using that size. Most of the time, it's not the case that applications
have assembly code level changes required for 64K. Sure, the toolchain
will emit optimized code and it will use adrp and other stuff in v8 to
reference pages and offsets, but that compiler code works well. It's not
the piece that's got any potential for issue. It's the higher level C
code that possibly has assumptions to iron out on a 64K base vs 4K.

Indeed, the toolchain output is a concern - specifically anything that
would cause aarch64 binaries to run with 64KB kernels but not 4KB ones.
But I concede that at this stage such bugs are purely theoretical. I
have certainly not (yet?) found anything in an aarch64 distro that
breaks when I replace the kernel with one that uses 4KB pages.
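
For completeness, the kind of higher level C assumption I would expect
to bite is a hardcoded 4096 used for alignment. A minimal sketch of the
portable pattern (plain POSIX, nothing aarch64-specific):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* The broken pattern is "#define PAGE_SIZE 4096" and deriving
     * alignment from that; the portable pattern is to ask at runtime. */
    long page = sysconf(_SC_PAGESIZE);
    if (page < 0) {
        perror("sysconf");
        return 1;
    }
    printf("runtime page size: %ld bytes\n", page);

    size_t len = 4 * (size_t)page;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* mprotect() requires a page-aligned address.  mprotect(buf + 4096,
     * 4096, ...) happens to work on a 4K kernel but fails with EINVAL
     * on a 64K one, because buf + 4096 is no longer page-aligned.
     * Deriving the offset from the runtime value works on both. */
    if (mprotect(buf + page, (size_t)page, PROT_READ) != 0) {
        perror("mprotect");
        return 1;
    }

    munmap(buf, len);
    return 0;
}

Code written against sysconf() like this runs unchanged under either
granule, which is why I suspect the breakage, where it exists, sits in
the projects that grew up assuming 4096.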

Fedora is its own master, but I strongly encourage retaining the use of 64K granules at this time, and letting it play out without responding to
one or two corner use cases and changing course. There are very many
design optimizations that can be done when you have a 64K page size,
from the way one can optimize cache lookups and hardware page table
walker caches to the reduction of TLB pressure (though I accept that
huge pages are an answer for this under a 4K granule regime as well). It
would be nice to blaze a trail rather than take the safe default.

While I agree with the sentiment, I think something like this is
better decided on carefully considered merit assessed through
empirical measurement.
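
On the point that huge pages answer TLB pressure under a 4K granule as
well: these days that does not even require explicit hugetlbfs setup.
A rough sketch of hinting transparent huge pages from the application
side - it assumes a kernel built with CONFIG_TRANSPARENT_HUGEPAGE, and
the kernel is free to ignore the hint, so it degrades gracefully:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* 64MB anonymous region; on a 4K-granule kernel the THP machinery
     * can back most of it with 2MB huge pages. */
    size_t len = 64UL * 1024 * 1024;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Ask for transparent huge pages on this region only. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* Touch the region so it is populated; the "AnonHugePages:" line
     * in /proc/self/smaps then shows how much of it really ended up
     * on huge pages. */
    for (size_t i = 0; i < len; i += 4096)
        buf[i] = 1;

    munmap(buf, len);
    return 0;
}

So most of the TLB argument for a 64K base granule can be had under a
4K granule too, which is exactly the kind of thing the measurements
should settle.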

Sure. We had to start with something. Folks now have something that they
can use to run numbers on. BUT note that the kind of 64-bit hw that is
needed to really answer these questions is only just coming. Again, if
64K was a wrong choice, we can change it. It's only a mistake if we
always dogmatically stick to principle in the face of evidence to the
contrary. If the evidence says "dude, 64K was at best premature and
Linus was right", then that's totally cool with me. We'll meanwhile have
a codebase that is even more portable (different arch/pagesz).

Fair enough. I guess the next step would be to actually run some
numbers.

My own opinion is that (in the longer term, beginning with server) we
should not have a 32-bit legacy of the kind that x86 has to deal with
forever. We can use virtualization (and later, if it really comes to
it, containers running 32-bit applications with 4K pages exposed to
them - an implementation would be a bit like "Clear" containers today)
to run 32-bit applications on 64-bit without having to do nasty hacks
(such as
multilib) and reduce any potential for confusion on the part of users
(see also RasPi 3 as an example). It is still early enough in the
evolution of general purpose aarch64 to try this, and have the pragmatic
fallback of retreating to 4K if needed. The same approach of running
under virtualization or within a container model equally applies to
ILP32, which is another 32-bit ABI that some folks like, in that a third
party group is welcome to do all of the lifting required.

This again mashes 32-bit support with page size. If there is no
32-bit support in the CPU, I am reasonably confident that QEMU
emulation of it will be unusably slow for just about any serious
use case (you might as well run QEMU emulation of ARM32 on x86
in that case and not even touch upon aarch64).

Point noted. If we keep the conversation purely to the relative merits
of 64K vs 4K page size upon memory use overhead, fragmentation, and the
like, then the previous comment about getting numbers stands. This is
absolutely something we intend to gather within the perf team inside Red
Hat (and share in some form) as more hardware arrives that can be
realistically used to quantify the value. You're welcome to also run
numbers and show that there's a definite case for 4K over 64K.

Indeed I intend to, but in most cases getting real-world data to run
such numbers is non-trivial. Any real data large enough to produce
meaningful results tends to belong to clients, who by and large
run on x86 only. So right now the best I can offer is experience
that on database workloads huge pages outperform 4KB pages by very
low single-figure percentages.

It is therefore questionable how much difference using 64KB non-huge
pages might actually make in terms of performance, while increases
in memory fragmentation are reasonably well understood.
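
To put the occupancy side in concrete terms, the arithmetic can be
sketched without a benchmark: every object in the page cache or in an
anonymous mapping is rounded up to a whole number of pages, so small
objects pay proportionally more under a 64KB granule. The object sizes
below are hypothetical, purely to show the shape of the problem:

#include <stdio.h>

/* Each object occupies a whole number of pages, so smaller objects
 * waste proportionally more memory under a larger granule. */
static unsigned long pages(unsigned long bytes, unsigned long page)
{
    return (bytes + page - 1) / page;
}

int main(void)
{
    unsigned long sizes[] = { 700, 3000, 9000, 70000 };   /* bytes, hypothetical */
    unsigned long granules[] = { 4096, 65536 };

    for (int g = 0; g < 2; g++) {
        unsigned long page = granules[g], used = 0, resident = 0;
        for (int i = 0; i < 4; i++) {
            used     += sizes[i];
            resident += pages(sizes[i], page) * page;
        }
        printf("%2luK granule: %lu bytes of data occupy %lu bytes (%.2fx)\n",
               page / 1024, used, resident, (double)resident / (double)used);
    }
    return 0;
}

Real workloads obviously sit somewhere in between these extremes, but
the direction of the overhead is not in doubt.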

It strikes me that this is something better tested in a lab
rather than guinea-pigging the entire user base, most of
whom aren't fortunate enough to have machines with tons of
RAM to not care.

Gordan
_______________________________________________
arm mailing list
arm@xxxxxxxxxxxxxxxxxxxxxxx
http://lists.fedoraproject.org/admin/lists/arm@xxxxxxxxxxxxxxxxxxxxxxx



