Re: Call for GSoC/Outreachy internship project ideas

On Thu, 01 Feb 2024 10:57:00 PST (-0800), alex.bennee@xxxxxxxxxx wrote:
Palmer Dabbelt <palmer@xxxxxxxxxxx> writes:

On Thu, 01 Feb 2024 09:39:22 PST (-0800), alex.bennee@xxxxxxxxxx wrote:
Palmer Dabbelt <palmer@xxxxxxxxxxx> writes:

On Tue, 30 Jan 2024 12:28:27 PST (-0800), stefanha@xxxxxxxxx wrote:
On Tue, 30 Jan 2024 at 14:40, Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:

On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefanha@xxxxxxxxx wrote:
> Dear QEMU and KVM communities,
> QEMU will apply for the Google Summer of Code and Outreachy internship
> programs again this year. Regular contributors can submit project
> ideas that they'd like to mentor by replying to this email before
> January 30th.

It's the 30th, sorry if this is late but I just saw it today.  +Alistair
and Daniel: I didn't sync up with anyone about this, so I'm not sure if
someone else is already looking (we're not internally).
<snip>
Hi Palmer,
Performance optimization can be challenging for newcomers. I wouldn't
recommend it for a GSoC project unless you have time to seed the
project idea with specific optimizations to implement based on your
experience and profiling. That way the intern has a solid starting
point where they can have a few successes before venturing out to do
their own performance analysis.

Ya, I agree.  That's part of the reason why I wasn't sure if it's a
good idea.  At least for this one I think there should be some
easy-to-understand performance issue, as the slow loops consist of
only a small number of instructions.

I'm actually more worried about this running into a rabbit hole of
adding new TCG operations or even just having no well-defined mappings
between RVV and AVX; those might make the project really hard.

You shouldn't have a hard guest-target mapping. But are you already
using the TCGVec types and they are not expanding to AVX when it's
available?

Ya, sorry, I guess that was an odd way to describe it.  IIUC we're
doing sane stuff, it's just that RISC-V has a very different vector
masking model than other ISAs.  I just said AVX there because I only
care about the performance on Intel servers, since that's what I run
QEMU on.  I'd assume we have similar performance problems on other
targets, I just haven't looked.

So my worry would be that the RVV things we're doing slowly just don't
have fast implementations via AVX and thus we run into some
intractable problems.  That sort of stuff can be really frustrating
for an intern, as everything's new to them so it can be hard to know
when something's an optimization dead end.
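To make the masking difference mentioned above a bit more concrete, here is a minimal scalar sketch of what a masked RVV integer add computes when inactive and tail elements are left undisturbed. It is only a reference model under stated assumptions: the function name `masked_add_u32` and its parameters are hypothetical, and this is not how QEMU implements the operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of a masked 32-bit vector add (hypothetical helper,
 * not QEMU code). Assumptions:
 *  - the mask holds one bit per element, element i at bit i;
 *  - only the first vl elements are processed;
 *  - masked-off and tail elements of vd are left unchanged.
 */
static void masked_add_u32(uint32_t *vd, const uint32_t *vs1,
                           const uint32_t *vs2, const uint8_t *mask,
                           size_t vl)
{
    assert(vd && vs1 && vs2 && mask);

    for (size_t i = 0; i < vl; i++) {
        /* Extract mask bit i from the packed mask bytes. */
        if ((mask[i / 8] >> (i % 8)) & 1) {
            vd[i] = vs1[i] + vs2[i];
        }
        /* Inactive element: vd[i] keeps its previous value. */
    }
    /* Elements at index >= vl are never written. */
}
```

A host vector ISA without a matching per-element mask and length would have to emulate both the mask test and the `vl` bound, which is one plausible source of extra work; whether that explains the slowdowns discussed here is not established.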

That said, we're seeing 100x slowdowns in microbenchmarks and 10x
slowdowns in real code, so I think there should be some way to do
better.

It would be nice if you could convert that micro-benchmark to plain C
for a tcg/multiarch test case. It would be a useful tool for testing
changes.

Yep. I actually gave it a shot before posting the C++ version and it
seems kind of fragile: just poking it in boring-looking ways changes the
behavior. Some of that was tied up in me trying to get GCC to generate
similar code to clang, though, so hopefully that's all manageable. I
certainly wouldn't want to throw something that wacky at an intern for
their first project, though. So I don't have a good version yet.

I'm also hoping the fuzzer reproduces some nice small examples, but no luck yet...
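For illustration only, a plain C micro-benchmark of the kind suggested above might look roughly like the sketch below. It is not the benchmark from this thread: the function names `mul_add_u32` and `run_check`, the array size, and the arithmetic are all assumptions chosen to give a compiler a simple integer loop it may auto-vectorize, with a scalar check of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 1024

/*
 * Simple element-wise integer loop (hypothetical example). With
 * suitable compiler flags this may be auto-vectorized, but that is
 * not guaranteed.
 */
static void mul_add_u32(uint32_t *restrict dst, const uint32_t *restrict a,
                        const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] * 3u + b[i];
    }
}

/* Fill the inputs, run the loop, and verify every element. */
static int run_check(void)
{
    static uint32_t a[N], b[N], dst[N];

    for (size_t i = 0; i < N; i++) {
        a[i] = (uint32_t)i;
        b[i] = (uint32_t)(N - i);
    }

    mul_add_u32(dst, a, b, N);

    for (size_t i = 0; i < N; i++) {
        uint32_t expect = (uint32_t)i * 3u + (uint32_t)(N - i);
        assert(dst[i] == expect);
    }
    return 0;
}
```

A `main` that calls `run_check()` and returns its result would turn this into a standalone program. Integer arithmetic is used on purpose, since float operations would go through softfloat as noted elsewhere in the thread.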



Remember for anything float we will end up with softfloat anyway so we
can't use SIMD on the backend.

Yep, but we have a handful of integer slowdowns too so I think there's
some meat to chew on here.  The softfloat stuff should be equally slow
for scalar/vector, so we shouldn't be tripping false positives there.

Do you have the time to profile and add specifics to the project idea
by Feb 21st? If that sounds good to you, I'll add it to the project
ideas list and you can add more detailed tasks in the coming weeks.

I can at least dig up some of the examples I ran into, there's been a
handful filtering in over the last year or so.

This one
<https://gist.github.com/compnerd/daa7e68f7b4910cb6b27f856e6c2beba>
still has a much more than 10x slowdown (73ms -> 13s) with
vectorization, for example.

Thanks,
Stefan

-- Alex Bennée
Virtualisation Tech Lead @ Linaro




