On Thu, Feb 17, 2022 at 7:48 PM Jakob Koschel <jakobkoschel@xxxxxxxxx> wrote: > list_for_each_entry() selects either the correct value (pos) or a safe > value for the additional mispredicted iteration (NULL) for the list > iterator. > list_for_each_entry() calls select_nospec(), which performs > a branch-less select. > > On x86, this select is performed via a cmov. Otherwise, it's performed > via various shift/mask/etc. operations. [...] > #define list_for_each_entry(pos, head, member) \ > for (pos = list_first_entry(head, typeof(*pos), member); \ > - !list_entry_is_head(pos, head, member); \ > + ({ bool _cond = !list_entry_is_head(pos, head, member); \ > + pos = select_nospec(_cond, pos, NULL); _cond; }); \ > pos = list_next_entry(pos, member)) Actually I do have one ugly question about this: Is NULL a good/safe choice here? We already know that CPUs try very aggressively to do store-to-load forwarding. Until now, my mental model of store-to-load forwarding was basically: "The CPU has to guess whether the linear addresses will be the same, and once it knows the linear addresses, it can verify whether that guess was correct." But of course that can't really be the whole mechanism, because many architectures guarantee that if you access the same physical page through multiple linear addresses, everything stays coherent. So I'm wondering: Are there basically two stages of speculation based on address guesses? A first stage where the CPU guesses whether the linear addresses are the same, and a second stage where it assumes that different linear addresses also map to different physical addresses, or something like that? And so, if we don't have a TLB entry for NULL, and we misspeculate through a speculative write to an object of type A at NULL and then a speculative read (at the same offset) from an object of type B at NULL, will we get speculative type confusion through the nonexistent object at NULL that lasts until either the branches are resolved or the page walk for NULL reports back that there is no page at NULL? (Also, it's been known for a long time that speculative accesses to NULL can be a performance problem, too: https://lwn.net/Articles/444336/) So I'm wondering whether, on 64-bit architectures that have canonical address bits, it would be safer and also reduce the amount of useless pagetable walks to try to butcher up the canonical bits of the address somehow so that the CPU can quickly see that the access is bogus, without potentially having to do a pagetable walk first.