Minimize the cycle count. The baseline runs in ~147,000 cycles. Top solutions achieve <1,500 cycles.
A VLIW SIMD processor that can execute multiple operations per cycle:
The baseline uses ~1 slot per cycle. Pack independent operations together!
// ALU kb.add("alu", ["+", dest, a, b]) kb.add("alu", ["^", dest, a, b]) kb.add("alu", ["%", dest, a, b]) // Load kb.add("load", ["const", dest, value]) kb.add("load", ["load", dest, addr]) kb.add("load", ["vload", dest, addr]) // Store kb.add("store", ["store", addr, src]) // Flow kb.add("flow", ["select", d, cond, a, b])
Instead of kb.add() one at a time, build instruction bundles:
kb.add()
// One cycle, multiple ops: kb.instrs.push({ "alu": [ ["+", a, b, c], ["*", d, e, f] ], "load": [ ["load", g, h] ] });
Process 8 items at once with VLEN=8:
// Vector add (8 elements) kb.add("valu", ["+", vDest, vA, vB]) // Load 8 consecutive values kb.add("load", ["vload", vDest, addr])