Local Value Numbering

January 07, 2026

I’ve been writing a JIT C compiler for the past 8 months and I thought it might be good to start documenting some of my experiments. I’m in no way suitable for talking about compilers or optimizations, so take the following with a large grain of salt.

Intermediate Representation

The IR I’ll be using in this post is a 3-address code (TAC) IR. I call it LIR in the compiler (Low-level IR) but it can be considered a middle-level IR. It includes high-level constructs to hide ABI-specific details (va_start/va_end/etc instructions and specific function argument operands), but it also supports complex addressing modes like x64 (i.e. [%vr0 + %vr1 * 8 + 16]). There are also a High-level IR (similar to LLVM) and a Machine-level IR (identical to x64 assembly with both hardware and virtual registers).

There are 8 kinds of operands in LIR:

Register
- Only virtual registers. There is no concept of hardware registers at this level
- There are 2 classes of registers: Integer and Floating point, each with their own types.
Constant (either integer or floating point)
Basic Block
- Used only by branch and phi instructions
Generic Memory Reference ([base_reg + index_reg * scale + displacement])
- Both base_reg and index_reg can be set to None
- Both base_reg and index_reg are expected to be integer registers
External Symbol
- RIP-relative memory references used to point to global variables or functions.
Stack Object
- Stack-relative memory references used for function-local variables.
Comparison Predicate
- Used only by comparison and conditional branch instructions.
Function Argument
- Used only by the copy instruction to move function arguments to a virtual register.

It’s a load/store IR, meaning that most instructions accept only registers as operands, some accept constants and only specific instructions accept memory operands.

call is a multi-operand instruction, breaking the strict 3-address code concept. It’s the only instruction with an optional (i.e. NULL) destination operand (all other instructions either always have a valid destination operand or don’t have any destination operand at all). All call arguments are always registers.

There is a phi instruction to support code in SSA form, which is mandatory for some transformation passes, but the IR API does not impose any restriction on that. In other words, a transformation pass is free to define a virtual register multiple times if it needs to. phi is also a multi-operand instruction, with a single destination operand and 1 pair of (register, basic block) operands for each of the basic block’s predecessors.

High-level algorithm

Assuming the code is in SSA form, the basic high-level algorithm for hash-based Local Value Numbering (LVN) is:

for each basic block BB {
    clear hash table T
    clear value number to register map

    for each instruction INSTR in BB {
        if INSTR is in the form DST = opcode SRC1, SRC2, ... {
            HASH = calculate instruction hash
            if HASH is in table T {
                VN = get value number of HASH from T
                map VN to DST
                mark INSTR as dead
            } else {
                VN = generate new value number
                insert (VN, HASH) to T
                map VN to DST
            }
        }
    }
}

The output of the above algorithm is a mapping between virtual registers for the whole function. This mapping is used to rewrite instructions in a second pass. Finally all dead instructions are removed from the function.

Calls

As mentioned above call is a special instruction from the IR’s perspective but it’s also a special instruction from the LVN’s perspective. Even though its form is DST = opcode SRC1, SRC2, ... as expected by the algorithm above, DST might be NULL but more importantly we don’t want to hash and eliminate calls, because this will change the behavior of the program.

E.g. multiple %vr_i = call myFunc %vr0 should not be replaced by the first call’s result because the algorithm does not know if myFunc has side-effects or touches global state. If the algorithm knew more about myFunc (e.g. leaf function without side effects) it might be possible to value number those calls and eliminate all but the first, but since that info is not (currently) available, call and call-like instructions are handled separately in the inner loop.

    for each instruction INSTR in BB {
        if isCall(INSTR) {
            if destination operand DST is not NULL {
                VN = generate new value number
                map VN to DST
            }
        } else if INSTR is in the form DST = opcode SRC1, SRC2, ... {
            // Same as before...
        }
    }

isCall checks if the instruction’s opcode is call-like. Currently it checks for call, va_start, va_copy and va_end. All those are treated as function calls.

Phis

The other special instruction is phi. phis always have a destination operand (register). There is no point in hashing phis because they (by definition) appear at the start of a basic block and as a result all used register operands would get a new VN either way. In other words, trivial/redundant phis will remain redundant and it’s another pass’ responsibility to remove those. As a result, the destination register always get a new VN.

    for each instruction INSTR in BB {
        if isCall(INSTR) {
            // Same as before...
        } else if INSTR is DST = phi [val1, BB1], [val2, BB2], ... {
            VN = generate new value number
            map VN to DST
        } else if INSTR is in the form DST = opcode SRC1, SRC2, ... {
            // Same as before...
        }
    }

Terminators

Terminator instructions (branchs and ret) are ignored by the algorithm. They are by definition not in the expected form to be hashed and there is no point to, somehow, value-numbering them.

Loads and stores

Instructions which load data from memory should be value numbered because theoretically there is no benefit in loading the same value from memory multiple times if it hasn’t changed. They are also in the correct form to be hashed. The issue with loads is stores. Hashing a load instruction based only on its operands will give wrong results if there is an intervening store between 2 identical loads.

%vr1 = load dword [%vr0 + 8];
store dword [%vr2], %vr1;
%vr3 = load dword [%vr0 + 8];
; Other arithmetic instructions and loads
%vr4 = load dword [%vr0 + 8];

Without alias analysis the algorithm cannot know if writing 4 bytes to [%vr2] overwrites the value loaded from [%vr0 + 8]. The algorithm needs a way to identify that the 2nd load is not redundant and it should assign %vr3 a different value number than %vr1. But it also needs to figure out that the 3rd load is redundant because there is no intervening instruction which affects the already loaded value into %vr3.

The easiest way to handle this is to introduce a new global value number for memory (mem_vn). Whenever the algorithm encounters an instruction which might affect the memory (store, call, etc), mem_vn is assigned a new value number. When hashing load instructions mem_vn is included in the hashed data and as a result a different hash is produced for each load, from the same memory location, if mem_vn changed at some point.

Memory objects

As mentioned in the beginning there are 3 kinds of memory-related operands. Generic memory references, which follow x64 addressing modes, stack objects and external symbols. Stack objects are disjoint, even though they actually end up sharing the same stack. The same is true for external symbols, even though they most probably end up sharing the same section in the final image.

Assuming the initial program code was correct (e.g. all accesses to a specific stack object are within its allocated space), we can take the value numbering of stack objects and external symbols one step further by assigning different value numbers to different segments based on how they are accessed. E.g.

%vr1 = load dword [stack_obj.0];
store dword [stack_obj.1 + 4], %vr1;
%vr2 = load dword [stack_obj.0];
%vr3 = load dword [stack_obj.1 + 4];

In this case, the 2nd load is redundant because the store to [stack_obj.1 + 4] cannot overwrite the value loaded into %vr1 from [stack_obj.0]. In other words, the store does not affect the value number assigned to the [0, 4) segment of stack_obj.0. Additionally, it might help if the 3rd load was eliminated by forwarding %vr1 (i.e. the last stored value number to [stack_obj.1 + 4]) to %vr3.

This is implemented by creating a memory object for each stack object and external symbol. This memory object is split into segments, based on how it’s accessed, and each segment keeps its own “memory VN” and its “last stored VN”. Initially all memory objects get a single continuous segment [0, UINT32_MAX], its memory VN gets a new value and last stored VN is set to an invalid value. Even though the code knows the true size of stack objects are this point, this might not be true for external symbols defined in separate translation units.

typedef struct mem_segment_t
{
    mem_segment_t* m_Next;
    mem_segment_t* m_Prev;
    uint32_t m_Offset;       // Segment's offset
    uint32_t m_Size;         // Segment's size
    uint32_t m_MemVN;        // mem_vn of the segment used when hashing loads
    uint32_t m_LastStoredVN; // VN of the last stored reg to this segment; might be invalid
} mem_segment_t;

typedef struct mem_object_t
{
    mem_segment_t* m_SegmentListHead;
    mem_segment_t* m_SegmentListTail;
} mem_object_t;

All memory objects are kept into a hashmap. Every time a store to a generic memory reference is encountered, the hashmap is cleared and all information about memory objects is discarded. Store-to-load forwarding is implemented in case of loads just before hashing the instruction. If the forwarding succeeds, hashing is skipped and the algorithm moves on to the next instruction. Otherwise, the load is hashed as a regular instruction, including all the VNs of the segments covering the accessed part.

Let’s see how each instruction from the above snippet is processed.

%vr1 = load dword [stack_obj.0]
- This is a load. Before hashing, check if there is a valid VN stored to dword [stack_obj.0].
- Stack object #0 has no segment at offset 0 with size 4 yet. It is split into 2 segments: [0, 4) + [4, UINT32_MAX].
- Both new segments get the same VNs as the parent segment (i.e. invalid last stored VN).
- Since the new segment’s last stored VN is invalid, there is nothing to do at this point.
- The load is hashed using segment’s [0, 4) VN.
- Nothing is found in the hashtable for this hash. A new VN is assigned to %vr1 and the load is inserted into the table.
store dword [stack_obj.1 + 4], %vr1
- This is a store. The source register’s VN (%vr1) should be stored to the corresponding memory segment.
- Stack object #1 is split into 3 segments: [0, 4) + [4, 8) + [8, UINT32_MAX].
- All three get the same value numbers as the initial segment.
- Segment’s [4, 8) last stored VN is set to %vr1’s VN and its VN gets a new value.
- Stack object #0 VNs are not affected.
- No hashing is performed because store does not have a destination register.
%vr2 = load dword [stack_obj.0]
- This is a load. Before hashing, check if there is a valid VN stored to dword [stack_obj.0].
- A segment [0, 4) for stack object #0 already exists so no splitting is necessary.
- Its last stored VN is invalid since we haven’t encountered a store to this segment yet.
- The load is hashed using the segment’s [0, 4) VN. Since the VN hasn’t changed, the hash ends up being the same as the one calculated for the first instruction and as a result %vr2 is mapped to %vr1 and the load is marked as dead.
%vr3 = load dword [stack_obj.1 + 4]
- This is a load. Before hashing, check if there is a valid VN stored to dword [stack_obj.1 + 4].
- A segment [4, 8) for stack object #1 already exists. Its last stored VN is equal to %vr1’s VN.
- As a result %vr3 is mapped to %vr1 and the load is marked as dead.
- No hashing is performed.

Stores and loads to/from external symbols are handled in the same way. Doing store-to-load forwarding this way for generic memory references works only if the global mem_vn hasn’t changed and only for the last generic memory store (i.e. keep track of the base/index/scale/displacement of the last generic store and if a load matches those, use the stored VN).

Commutative operations

Commutative operations (add, mul, fadd, etc) should be hashed using some predefined ordering of their operands. Currently, all arithmetic instructions support either (register, register) or (register, constant) operands. When both operands are registers they are first ordered based on their VNs and hashed in that order. This allows calculating the same hash for both add %vr1, %vr2 and add %vr2, %vr1.

Comparisons

Comparisons can be handled in the same way as the commutative operations. The only change required is to invert the comparison predicate operand. E.g. a signed less-equal comparison icmp sle, %vr2, %vr1 might be hashed as a signed greater-equal icmp sge, %vr1, %vr2 due to the order of register VNs.

The end

The complete algorithm is shown below.

for each basic block BB {
    clear hash table T
    clear value number to register map

    for each instruction INSTR in BB {
        if isCall(INSTR) {
            if destination operand DST is not NULL {
                VN = generate new value number
                map VN to DST
            }
        } else if INSTR is DST = phi [val1, BB1], [val2, BB2], ... {
            VN = generate new value number
            map VN to DST
        } else if INSTR is a store [MEM], SRC {
            VN = get value number of SRC
            assign VN to the [MEM]'s last stored VN
        } else if INSTR is in the form DST = opcode SRC1, SRC2, ... {
            if INSTR is a DST = load [MEM] {
                VN = get [MEM]'s last stored VN
                if VN is valid {
                    map VN to DST
                    mark INSTR as dead
                    continue;
                }
            }

            HASH = calculate instruction hash
            if HASH is in table T {
                VN = get value number of HASH from T
                map VN to DST
                mark INSTR as dead
            } else {
                VN = generate new value number
                insert (VN, HASH) to T
                map VN to DST
            }
        }
    }
}

That’s it. Unless I forgot some major detail, that’s how LVN is currently implemented in my SSA TAC IR. Thanks for reading.