On 06/23/22 at 03:07pm, Catalin Marinas wrote:
> On Wed, Jun 22, 2022 at 04:35:16PM +0800, Baoquan He wrote:
> > On 06/21/22 at 07:04pm, Catalin Marinas wrote:
> > > The problem with splitting is that you can end up with two entries in
> > > the TLB for the same VA->PA mapping (e.g. one for a 4KB page and another
> > > for a 2MB block). In the lucky case, the CPU will trigger a TLB conflict
> > > abort (but it can be worse, like loss of coherency).
> >
> > Thanks for this explanation. Is this a drawback of the arm64 design? X86
> > code does the same thing w/o issue; is there a way to overcome this on
> > arm64 from the hardware or software side?
>
> It is a drawback of the arm64 implementations. Having multiple TLB
> entries for the same VA would need additional logic in hardware to
> detect, so the microarchitects have pushed back. In ARMv8.4, some
> balance was reached with FEAT_BBM so that the only visible side-effect
> is a potential TLB conflict abort that could be resolved by software.

I see, thx.

>
> > I once had an arm64 server with huge memory; with or w/o the crashkernel
> > setting it had different bootup times. And the more frequent TLB misses
> > and flushes will cause a performance cost. It is really a pity if we have
> > a very powerful arm64 cpu and system capacity, but are bottlenecked by
> > this drawback.
>
> Is it only the boot time affected or the runtime performance as well?

Sorry for the late reply. What I observed is serious boot-time latency
with huge memory. Since timestamps are not available at that point, I
can't give an exact number. I didn't notice any effect on runtime
performance.