On Wed, Jun 19, 2019 at 8:39 PM Will Deacon <will.deacon@xxxxxxx> wrote: > > On Wed, Jun 19, 2019 at 08:18:04PM +0800, Guo Ren wrote: > > On Wed, Jun 19, 2019 at 5:12 PM Will Deacon <will.deacon@xxxxxxx> wrote: > > > This is one place where I'd actually prefer not to go down the route of > > > making the code generic. Context-switching and low-level TLB management > > > is deeply architecture-specific and I worry that by trying to make this > > > code common, we run the real risk of introducing subtle bugs on some > > > architecture every time it is changed. > > "Add generic asid code" and "move arm's into generic" are two things. > > We could do > > first and let architecture's maintainer to choose. > > If I understand the proposal being discussed, it involves basing that > generic ASID allocation code around the arm64 implementation which I don't > necessarily think is a good starting point. ... > > > > Furthermore, the algorithm we use > > > on arm64 is designed to scale to large systems using DVM and may well be > > > too complex and/or sub-optimal for architectures with different system > > > topologies or TLB invalidation mechanisms. > > It's just a asid algorithm not very complex and there is a callback > > for architecture to define their > > own local hart tlb flush. Seems it has nothing with DVM or tlb > > broadcast mechanism. > > I'm pleased that you think the algorithm is not very complex, but I'm also > worried that you might not have fully understood some of its finer details. I understand your concern about my less understanding of asid technology. Here is my short-description of arm64 asid allocator: (If you find anything wrong, please correct me directly, thx :) The asid allocator use five level check to reduce the cost of switch_mm. 1. Check if the asid version is the same (it's general) 2. Check reserved_asid which is set in rollover flush_context() and the key point is keep the same bit position with the current asid version instead of input version. 3. Check if the position of bitmap is free then it could be set & used directly. 4. find_next_zero_bit() (a little performance cost) 5. flush_context (this is the worst cost with increase current asid version) Check is level by level and cost is also higher with the next level. The design that impressed me the most was reserved_asid and bitmap and the 2th level and 3th level will prevent unnecessary find_next_zero_bit(). The atomic 64 bit asid is also ok for 32-bit system and it won't cost a lot in 1th 2th 3th level check. The operation of set/clear mm_cpumask was removed in arm64 compared to arm32. It seems no side effect on current arm64 system, but from software meaning it's wrong. So I think it should be reserved in generic version. > > The reason I mention DVM and TLB broadcasting is because, depending on > the mechanisms in your architecture relating to those, it may be strictly > required that all concurrently running threads of a process have the same > ASID at any given point in time, or it may be that you really don't care. > > If you don't care, then the arm64 allocator is over-engineered and likely > inefficient for your system. If you do care, then it's worth considering > whether a lock is sufficient around the allocator if you don't expect high > core counts. Another possibility is that you end up using only one ASID and > invalidating the local TLB on every context switch. Yet another design > would be to manage per-cpu ASID pools. I'll keep my system use the same ASID for SMP + IOMMU :P Yes, there are two styles of asid allocator: per-cpu ASID (MIPS) or same ASID (ARM). If the CPU couldn't support cache/tlb coherency maintian in hardware, it should use per-cpu ASID style because IPI is expensive and per-cpu ASID style need more software mechanism to improve performance (eg: delay cache flush). From software view the same ASID is clearer and easier to build bigger system with more TLB caches. I think the same ASID style is a more sensible choice for modern processor and let it be one of generic is reasonable. > > So rather than blindly copying the arm64 code, I suggest sitting down and > designing something that fits to your architecture instead. You may end up > with something that is both simpler and more efficient. In fact, riscv folks have discussed a lot about arm's asid allocator and I learned a lot from the discussion: https://lore.kernel.org/linux-riscv/20190327100201.32220-1-anup.patel@xxxxxxx/ We are developing C-SKY and RISC-V ISA cpu cores and make it generic is good for us. Best Regards Guo Ren _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm