Jiong Wang writes: > Yonghong Song writes: > >> On 10/12/19 12:18 AM, Jiong Wang wrote: >>> Currently, BPF backend is doing truncation elimination. If one truncation >>> is performed on a value defined by narrow loads, then it could be redundant >>> given BPF loads zero extend the destination register implicitly. >>> >>> When the definition of the truncated value is a merging value (PHI node) >>> that could come from different code paths, then checks need to be done on >>> all possible code paths. >>> >>> Above described optimization was introduced as r306685, however it doesn't >>> work when there is back-edge, for example when loop is used inside BPF >>> code. >>> >>> For example for the following code, a zero-extended value should be stored >>> into b[i], but the "and reg, 0xffff" is wrongly eliminated which then >>> generates corrupted data. >>> >>> void cal1(unsigned short *a, unsigned long *b, unsigned int k) >>> { >>> unsigned short e; >>> >>> e = *a; >>> for (unsigned int i = 0; i < k; i++) { >>> b[i] = e; >>> e = ~e; >>> } >>> } >>> >>> The reason is r306685 was trying to do the PHI node checks inside isel >>> DAG2DAG phase, and the checks are done on MachineInstr. This is actually >>> wrong, because MachineInstr is being built during isel phase and the >>> associated information is not completed yet. A quick search shows none >>> target other than BPF is access MachineInstr info during isel phase. >>> >>> For an PHI node, when you reached it during isel phase, it may have all >>> predecessors linked, but not successors. It seems successors are linked to >>> PHI node only when doing SelectionDAGISel::FinishBasicBlock and this >>> happens later than PreprocessISelDAG hook. >>> >>> Previously, BPF program doesn't allow loop, there is probably the reason >>> why this bug was not exposed. >>> >>> This patch therefore fixes the bug by the following approach: >>> - The existing truncation elimination code and the associated >>> "load_to_vreg_" records are removed. >>> - Instead, implement truncation elimination using MachineSSA pass, this >>> is where all information are built, and keep the pass together with other >>> similar peephole optimizations inside BPFMIPeephole.cpp. Redundant move >>> elimination logic is updated accordingly. >>> - Unit testcase included + no compilation errors for kernel BPF selftest. >> >> Thanks for the fix. The code looks good. Just two minor comments. > > Thanks Yonghong. Your comments make sense to me, will fix them. > >> After the fix, could you directly push to the llvm repo? > > Sure will do. > > (And I will update my llvm account email first, should be quick, if it takes > too long will come back to you for committing help) Fix pushed after two minor comments addressed and re-unit-tested: https://github.com/llvm/llvm-project/commit/ec51851026a55e1cfc7f006f0e75f0a19acb32d3 Regards, Jiong