On 2022/6/27 10:52, Baoquan He wrote:
> On 06/23/22 at 03:07pm, Catalin Marinas wrote:
>> On Wed, Jun 22, 2022 at 04:35:16PM +0800, Baoquan He wrote:
>>> On 06/21/22 at 07:04pm, Catalin Marinas wrote:
>>>> The problem with splitting is that you can end up with two entries in
>>>> the TLB for the same VA->PA mapping (e.g. one for a 4KB page and another
>>>> for a 2MB block). In the lucky case, the CPU will trigger a TLB conflict
>>>> abort (but can be worse like loss of coherency).
>>>
>>> Thanks for this explanation. Is this a drawback of arm64 design? X86
>>> code do the same thing w/o issue, is there way to overcome this on
>>> arm64 from hardware or software side?
>>
>> It is a drawback of the arm64 implementations. Having multiple TLB
>> entries for the same VA would need additional logic in hardware to
>> detect, so the microarchitects have pushed back. In ARMv8.4, some
>> balance was reached with FEAT_BBM so that the only visible side-effect
>> is a potential TLB conflict abort that could be resolved by software.
>
> I see, thx.
>
>>
>>> I ever got an arm64 server with huge memory, w or w/o crashkernel setting
>>> have different bootup time. And the more often TLB miss and flush will
>>> cause performance cost. It is really a pity if we have very powerful
>>> arm64 cpu and system capacity, but bottlenecked by this drawback.
>>
>> Is it only the boot time affected or the runtime performance as well?
>
> Sorry for late reply. What I observed is the boot time serious latency
> with huge memory. Since the timestamp is not available at that time,
> we can't tell the number. I didn't notice the runtime performance.

There's some data here, and I see you're not on the cc list.

https://lore.kernel.org/linux-mm/1656241815-28494-1-git-send-email-guanghuifeng@xxxxxxxxxxxxxxxxx/T/

>
> .
>

-- 
Regards,
  Zhen Lei