Thanks Catalin,

On Fri, Jun 21, 2019 at 10:16 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> On Wed, Jun 19, 2019 at 07:51:03PM +0800, Guo Ren wrote:
> > On Wed, Jun 19, 2019 at 4:54 PM Julien Grall <julien.grall@xxxxxxx> wrote:
> > > On 6/19/19 9:07 AM, Guo Ren wrote:
> > > > Moving the arm asid allocator code into a generic one is a good
> > > > idea; I've made a patchset for C-SKY and testing is in progress, see:
> > > > https://lore.kernel.org/linux-csky/1560930553-26502-1-git-send-email-guoren@xxxxxxxxxx/
> > > >
> > > > If you plan to separate it into a generic one, I could co-work with you.
> > >
> > > Did the ASID allocator work out of the box on C-SKY?
> >
> > Almost done, but one question:
> > arm64 removed this code from switch_mm:
> >   cpumask_clear_cpu(cpu, mm_cpumask(prev));
> >   cpumask_set_cpu(cpu, mm_cpumask(next));
> >
> > Why? Although arm64 cache operations could affect all harts with the
> > CTC method of interconnect, I think we should keep this code for
> > primitive integrity in Linux, because cpu_bitmap is in mm_struct
> > instead of mm->context.
>
> We didn't have a use for this in the arm64 code, so no point in
> maintaining the mm_cpumask. On some arm32 systems (ARMv6) with no
> hardware broadcast of some TLB/cache operations, we use it to track
> where the task has run to issue IPI for TLB invalidation or some
> deferred I-cache invalidation.

The set/clear of mm_cpumask was removed in arm64 compared to arm32. It
seems to have no side effect on current arm64 systems, but from a
software point of view it is wrong. I think we should keep maintaining
mm_cpumask just like arm32.

> (there was also a potential optimisation on arm64 to avoid broadcast
> TLBI if the task only ran on a single CPU but Will found that was rarely
> the case on an SMP system because of rebalancing happening during
> execve(), ending up with two bits set in the mm_cpumask)
>
> The way you use it on csky is different from how it is done on arm. It
> seems to clear the mask for the scheduled out (prev) task but this
> wouldn't work on arm(64) since the TLB still contains prev entries
> tagged with the scheduled out ASID. Whether it matters, I guess it
> depends on the specifics of your hardware.

Sorry for the mistaken quote; what I mean is what arm32 does: clear all
bits of the mm cpumask in new_context(), then set them back one by one.
Here is my patch:
https://lore.kernel.org/linux-csky/CAJF2gTQ0xQtQY1t-g9FgWaxfDXppMkFooCQzTFy7+ouwUfyA6w@xxxxxxxxxxxxxx/T/#m2ed464d2dfb45ac6f5547fb3936adf2da456cb65

> While the algorithm may seem fairly generic, the semantics have a few
> corner cases specific to each architecture. See [1] for a description of
> the semantics we need on arm64 (CnP is a feature where the hardware
> threads of the same core can share the TLB; the original algorithm
> violated the requirements when this feature was enabled).

C-SKY SMP has only one hart per core, but here is a patch [1] with my
thoughts on avoiding duplicate TLB flushes for SMT:

[1] https://lore.kernel.org/linux-csky/1561305869-18872-1-git-send-email-guoren@xxxxxxxxxx/T/#u

As for the TLA+ model, I still need to do some learning before I can
discuss it with you.

> BTW, if you find the algorithm fairly straightforward ;), see this
> bug-fix which took a formal model to identify: a8ffaaa060b8 ("arm64:
> asid: Do not replace active_asids if already 0").

I think this is one of the cases where other architectures could also
benefit from arm's ASID allocator code. Btw, was this detected by arm's
ASID allocator TLA+ model, or by a real bug report?
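
To make the arm32-style scheme I mean more concrete, here is a minimal
sketch (not my actual csky patch; sketch_switch_mm()/sketch_new_context()
are illustrative names, and check_and_switch_context() just stands in
for the arch's ASID rollover path): set the current CPU's bit on every
switch_mm(), and clear the whole mask when a fresh ASID is allocated,
so the mask only tracks CPUs that may still hold TLB entries tagged
with the current ASID:

#include <linux/cpumask.h>
#include <linux/mm_types.h>
#include <linux/smp.h>

/* Placeholder for the arch's context-switch/rollover hook. */
void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);

static inline void sketch_switch_mm(struct mm_struct *prev,
                                    struct mm_struct *next)
{
        unsigned int cpu = smp_processor_id();

        /* This CPU may now cache TLB entries tagged with next's ASID. */
        cpumask_set_cpu(cpu, mm_cpumask(next));

        check_and_switch_context(next, cpu);    /* may allocate a new ASID */
}

static void sketch_new_context(struct mm_struct *mm, unsigned int cpu)
{
        /*
         * A fresh ASID was just allocated (e.g. after a generation
         * rollover), so no CPU holds valid TLB entries for it yet:
         * clear the tracking mask and re-add only this CPU.
         */
        cpumask_clear(mm_cpumask(mm));
        cpumask_set_cpu(cpu, mm_cpumask(mm));
}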
--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/