On 10/16/19 8:33 AM, Jiong Wang wrote: > > Jiong Wang writes: > >> Yonghong Song writes: >> >>> On 10/12/19 12:18 AM, Jiong Wang wrote: >>>> Currently, BPF backend is doing truncation elimination. If one truncation >>>> is performed on a value defined by narrow loads, then it could be redundant >>>> given BPF loads zero extend the destination register implicitly. >>>> >>>> When the definition of the truncated value is a merging value (PHI node) >>>> that could come from different code paths, then checks need to be done on >>>> all possible code paths. >>>> >>>> Above described optimization was introduced as r306685, however it doesn't >>>> work when there is back-edge, for example when loop is used inside BPF >>>> code. >>>> >>>> For example for the following code, a zero-extended value should be stored >>>> into b[i], but the "and reg, 0xffff" is wrongly eliminated which then >>>> generates corrupted data. >>>> >>>> void cal1(unsigned short *a, unsigned long *b, unsigned int k) >>>> { >>>> unsigned short e; >>>> >>>> e = *a; >>>> for (unsigned int i = 0; i < k; i++) { >>>> b[i] = e; >>>> e = ~e; >>>> } >>>> } >>>> >>>> The reason is r306685 was trying to do the PHI node checks inside isel >>>> DAG2DAG phase, and the checks are done on MachineInstr. This is actually >>>> wrong, because MachineInstr is being built during isel phase and the >>>> associated information is not completed yet. A quick search shows none >>>> target other than BPF is access MachineInstr info during isel phase. >>>> >>>> For an PHI node, when you reached it during isel phase, it may have all >>>> predecessors linked, but not successors. It seems successors are linked to >>>> PHI node only when doing SelectionDAGISel::FinishBasicBlock and this >>>> happens later than PreprocessISelDAG hook. >>>> >>>> Previously, BPF program doesn't allow loop, there is probably the reason >>>> why this bug was not exposed. >>>> >>>> This patch therefore fixes the bug by the following approach: >>>> - The existing truncation elimination code and the associated >>>> "load_to_vreg_" records are removed. >>>> - Instead, implement truncation elimination using MachineSSA pass, this >>>> is where all information are built, and keep the pass together with other >>>> similar peephole optimizations inside BPFMIPeephole.cpp. Redundant move >>>> elimination logic is updated accordingly. >>>> - Unit testcase included + no compilation errors for kernel BPF selftest. >>> >>> Thanks for the fix. The code looks good. Just two minor comments. >> >> Thanks Yonghong. Your comments make sense to me, will fix them. >> >>> After the fix, could you directly push to the llvm repo? >> >> Sure will do. >> >> (And I will update my llvm account email first, should be quick, if it takes >> too long will come back to you for committing help) > > Fix pushed after two minor comments addressed and re-unit-tested: > > https://github.com/llvm/llvm-project/commit/ec51851026a55e1cfc7f006f0e75f0a19acb32d3 Thanks!