One feature of page tables is that each page table entry can have the
Global ('G') flag set on it. When this is done (and global pages are
enabled via CR4.PGE), those entries are not flushed from the TLB even
when CR3 is reloaded. I think the kernel mappings can be marked with
this flag so that they survive context switches. Also, the kernel code
is placed in two 4MB pages on x86 processors that support PSE (Page
Size Extensions). This reduces TLB misses for kernel code and also
reduces contention for the 4K-page TLB (since 4MB pages have a
separate TLB).
IIRC, loading CR3 has to flush the TLBs. Or is it that the cache lines carry this G bit, so that the page is recognised as one not to be flushed?
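
To make the flags discussed above a bit more concrete, here is a rough sketch in C. The bit positions (P, R/W, PS, G in the page directory entry; PSE and PGE in CR4) are the architectural ones for 32-bit non-PAE paging. Everything else is made up for illustration: make_kernel_pde_4mb(), struct tlb_entry and tlb_flush_on_cr3_load() are hypothetical names, and the "TLB" is only a toy array, not how any real kernel or CPU is written. As far as I understand it, the G bit is copied into the TLB entry when the translation is cached, and a CR3 load skips entries that have it set, which is what the toy flush models.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural bits in a 32-bit (non-PAE) page directory entry. */
#define PDE_PRESENT   (1u << 0)  /* P: entry is valid */
#define PDE_WRITABLE  (1u << 1)  /* R/W: writes allowed */
#define PDE_PS        (1u << 7)  /* PS: entry maps a 4MB page (needs CR4.PSE) */
#define PDE_GLOBAL    (1u << 8)  /* G: translation survives CR3 loads (needs CR4.PGE) */

/* Architectural bits in CR4. */
#define CR4_PSE       (1u << 4)  /* enable 4MB pages */
#define CR4_PGE       (1u << 7)  /* enable global pages */

#define PAGE_4MB_MASK 0x003FFFFFu /* low 22 bits: offset within a 4MB page */

/* Hypothetical helper: build a PDE mapping one global, writable 4MB page. */
static uint32_t make_kernel_pde_4mb(uint32_t phys_base)
{
    assert((phys_base & PAGE_4MB_MASK) == 0); /* base must be 4MB aligned */
    return phys_base | PDE_PRESENT | PDE_WRITABLE | PDE_PS | PDE_GLOBAL;
}

/* Toy model of one cached translation; real TLBs are not software visible. */
struct tlb_entry {
    uint32_t vpage;  /* virtual page number (illustrative only) */
    int      valid;  /* entry currently holds a translation */
    int      global; /* copy of the G bit taken when the entry was filled */
};

/*
 * Toy model of what a CR3 load does to the TLB: every valid entry is
 * dropped unless it is global AND CR4.PGE is set. Returns the number of
 * entries that were invalidated.
 */
static size_t tlb_flush_on_cr3_load(struct tlb_entry *tlb, size_t n, uint32_t cr4)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        int keep = tlb[i].global && (cr4 & CR4_PGE);
        if (tlb[i].valid && !keep) {
            tlb[i].valid = 0;
            dropped++;
        }
    }
    return dropped;
}
```

With CR4.PGE set, a flush in this model drops only the non-global (user) entries; with CR4.PGE clear, the G bit is ignored and everything is dropped.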