Re: [PATCH v3 6/6] RISC-V: Do not use cpumask data structure for hartid bitmap

Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> · Mon, 31 Jan 2022 14:27:50 +0100

Hi Anup,

On Mon, Jan 31, 2022 at 1:09 PM Anup Patel <anup@xxxxxxxxxxxxxx> wrote:
> On Fri, Jan 28, 2022 at 2:09 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > On Fri, Jan 28, 2022 at 1:13 AM Atish Patra <atishp@xxxxxxxxxxxxxx> wrote:
> > > On Thu, Jan 27, 2022 at 12:48 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > >> On Thu, Jan 27, 2022 at 2:02 AM Atish Patra <atishp@xxxxxxxxxxxxxx> wrote:
> > >> > On Wed, Jan 26, 2022 at 1:10 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > >> > > On Wed, Jan 26, 2022 at 9:28 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > >> > > > On Wed, Jan 26, 2022 at 3:21 AM Atish Patra <atishp@xxxxxxxxxxxxxx> wrote:
> > >> > > > > On Tue, Jan 25, 2022 at 2:26 PM Jessica Clarke <jrtc27@xxxxxxxxxx> wrote:
> > >> > > > > > On 20 Jan 2022, at 09:09, Atish Patra <atishp@xxxxxxxxxxxx> wrote:
> > >> > > > > > > Currently, SBI APIs accept a hartmask that is generated from struct
> > >> > > > > > > cpumask. Cpumask data structure can hold upto NR_CPUs value. Thus, it
> > >> > > > > > > is not the correct data structure for hartids as it can be higher
> > >> > > > > > > than NR_CPUs for platforms with sparse or discontguous hartids.
> > >> > > > > > >
> > >> > > > > > > Remove all association between hartid mask and struct cpumask.
> > >> > > > > > >
> > >> > > > > > > Reviewed-by: Anup Patel <anup@xxxxxxxxxxxxxx> (For Linux RISC-V changes)
> > >> > > > > > > Acked-by: Anup Patel <anup@xxxxxxxxxxxxxx> (For KVM RISC-V changes)
> > >> > > > > > > Signed-off-by: Atish Patra <atishp@xxxxxxxxxxxx>
> > >> > > >
> > >> > > > > I am yet to reproduce it on my end.
> > >> > > > > @Geert Uytterhoeven: can you please try the below diff on your end.
> > >> > > >
> > >> > > > Unfortunately it doesn't fix the issue for me.
> > >> > > >
> > >> > > > /me debugging...
> > >> > >
> > >> > > Found it: after this commit, the SBI_EXT_RFENCE_REMOTE_FENCE_I and
> > >> > > SBI_EXT_RFENCE_REMOTE_SFENCE_VMA ecalls are now called with
> > >> > > hmask = 0x8000000000000001 and hbase = 1 instead of hmask = 3 and
> > >> > > hbase = 0.
> > >> > >
> > >> > > cpuid 1 maps to  hartid 0
> > >> > > cpuid 0 maps to hartid 1
> > >> > >
> > >> > >     __sbi_rfence_v02:364: cpuid 1 hartid 0
> > >> > >     __sbi_rfence_v02:377: hartid 0 hbase 1
> > >> > >     hmask |= 1UL << (hartid - hbase);
> > >> > >
> > >> > > oops
> > >> > >
> > >> > >     __sbi_rfence_v02_call:303: SBI_EXT_RFENCE_REMOTE_FENCE_I hmask
> > >> > > 8000000000000001 hbase 1
> > >> > >
> > >> >
> > >> > Ahh yes. hmask will be incorrect if the bootcpu(cpu 0) is a higher
> > >> > hartid and it is trying to do a remote tlb flush/IPI
> > >> > to lower the hartid. We should generate the hartid array before the loop.
> > >> >
> > >> > Can you try this diff ? It seems to work for me during multiple boot
> > >> > cycle on the unleashed.
> > >> >
> > >> > You can find the patch here as well
> > >> > https://github.com/atishp04/linux/commits/v5.17-rc1
> >
> > >> > @@ -345,13 +368,21 @@ static int __sbi_rfence_v02(int fid, const
> > >> > struct cpumask *cpu_mask,
> > >> >       unsigned long arg4, unsigned long arg5)
> > >> >  {
> > >> >   unsigned long hartid, cpuid, hmask = 0, hbase = 0;
> > >> > - int result;
> > >> > + int result, index = 0, max_index = 0;
> > >> > + unsigned long hartid_arr[NR_CPUS] = {0};
> > >>
> > >> That's up to 256 bytes on the stack. And more if the maximum
> > >> number of cores is increased.
> > >>
> > >
> > > Yeah. We can switch to dynamic allocation using kmalloc based on
> > > the number of bits set in the cpumask.
> >
> > Even more overhead...
> >
> > >> > - if (!cpu_mask)
> > >> > + if (!cpu_mask || cpumask_empty(cpu_mask))
> > >> >   cpu_mask = cpu_online_mask;
> > >> >
> > >> >   for_each_cpu(cpuid, cpu_mask) {
> > >> >   hartid = cpuid_to_hartid_map(cpuid);
> > >> > + hartid_arr[index] = hartid;
> > >> > + index++;
> > >> > + }
> > >> > + max_index = index;
> > >> > + sort(hartid_arr, max_index, sizeof(unsigned long), cmp_ulong, NULL);
> > >> > + for (index = 0; index < max_index; index++) {
> > >> > + hartid = hartid_arr[index];
> > >>
> > >> That looks expensive to me.
> > >>
> > >> What about shifting hmask and adjusting hbase if a hartid is
> > >> lower than the current hbase?
> > >
> > > That will probably work for current systems but it will fail when we have hartid > 64.
> > > The below logic as it assumes that the hartids are in order. We can have a situation
> > > where a two consecutive cpuid belong to hartids that require two invocations of sbi call
> > > because the number of harts exceeds BITS_PER_LONG.
> >
> > If the number of harts exceeds BITS_PER_LONG, you always need multiple
> > calls, right?
> >
> > I think the below (gmail-whitespace-damaged diff) should work:

[...]

> >
> > Another simpler solution would be to just round hbase down to a
> > multiple of 32/64 (gmail-whitespace-damaged diff):

[...]

> > But that means multiple SBI calls if you have e.g. hartids 1-64.
> > The shifted mask solution doesn't suffer from that.
> > Both solutions don't sort the CPUs, so they are suboptimal in case of
> > hartid numberings like 0, 64, 1, 65, ...
>
> In most cases, the hartids will be in sorted order under /cpus DT node
> but it is not guaranteed that boot_cpu will be having smallest hartid
>
> This means hartid numbering will be something like:
> 0, 1, 2, .....,
> 64, 0, 1, 2, ....
> 31, 0, 1, 2, .....
>
> >
> > What do you think?
>
> Assuming hartids under /cpus DT node are ordered, I think your
> approach will only have one additional SBI call compared to Atish's
> approach but Atish's approach will require more memory with
> increasing NR_CPUS so I suggest we go with your approach.
>
> Can you send a patch with your approach ?

Sure, done.
https://lore.kernel.org/r/cover.1643635156.git.geert@xxxxxxxxxxxxxx/

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds