~50× less silicon vs published softmax SOTA.
~858× less silicon vs the textbook baseline.
One ~100-cell standard-cell block replaces a hardware softmax block. The result has been independently re-validated across 3 open-source PDKs (Sky130, NanGate45, IHP130). At Sky130 8-bit, the block is ~50× smaller than published academic SOTA softmax hardware (ConSmax DAC 2024-class shared-exp implementations) and ~858× smaller than the textbook N-exp + N-divide baseline taught in introductory hardware courses. Both ratios come from real arithmetic; the ~50× comparison against the modern baseline is the conservative one for state-of-the-art chip vendors. Why we report both: marketing has historically led with the textbook number, so we disclose both and let customers pick the comparison that matches their internal baseline.
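To make the textbook baseline concrete, here is a minimal Python reference model of the N-exp + N-divide softmax with a tally of its expensive operations. It assumes a naive datapath that spends one exponential and one division per element; it is a software sketch for intuition only, says nothing about the cell counts quoted above, and is not the chip's method.

```python
import math


def textbook_softmax(x):
    """Textbook softmax over a length-N vector.

    Returns the probabilities plus a tally of the operations a naive
    datapath performs: N exponentials and N divisions.
    """
    m = max(x)  # max subtraction keeps exp() in range (N-1 compares, no extra exps)
    exps = [math.exp(v - m) for v in x]  # N exponentials
    total = sum(exps)
    probs = [e / total for e in exps]    # N divisions
    ops = {"exp": len(exps), "div": len(probs)}
    return probs, ops


if __name__ == "__main__":
    p, ops = textbook_softmax([1.0, 2.0, 3.0, 4.0])
    print(p, ops)
```

A fully parallel hardware version of this model needs one exponential unit and one divider per element, which is the cost the textbook comparison is measured against.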
Energy: ~10,000× less energy per solution on Stanford G-set MaxCut vs a reference classical annealer (dwave-neal; independently confirmed at adversarial settings).
Quantum: ~10¹⁰–10¹²× less full-system energy than IBM Quantum Heron-R2 hardware at K≥30 (cryostat included; the only defensible quantum framing).
Same job. One block. Two honest ratios.
Same physical block whether the workload is softmax, GELU, layer-norm, RMS-norm, rotary embedding, sigmoid, tanh, or SiLU. Cell counts are independently re-synthesized (Yosys + Sky130 + open PDKs).
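For readers who want to check a nonlinear block against these workloads, the standard floating-point definitions make handy golden models. The Python sketch that follows uses the textbook formulas (exact erf-form GELU, norms without learnable gain or bias, rotary embedding with the common base of 10000); the function names and default epsilons are our own choices, tanh is simply `math.tanh`, and softmax is omitted for brevity.

```python
import math


def sigmoid(x):
    # Branch keeps exp() from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x):
    return x * sigmoid(x)


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(v, eps=1e-5):
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def rms_norm(v, eps=1e-6):
    rms = math.sqrt(sum(a * a for a in v) / len(v) + eps)
    return [a / rms for a in v]


def rotary(v, pos, base=10000.0):
    """Rotate each pair (v[2k], v[2k+1]) by angle pos * base**(-2k/d)."""
    d = len(v)
    assert d % 2 == 0, "rotary embedding needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [v[i] * c - v[i + 1] * s, v[i] * s + v[i + 1] * c]
    return out
```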
Combinatorial Optimization. ~10,000× less energy on MaxCut.
Same physical block. New workload. Stanford G-set MaxCut benchmark vs reference classical annealer (dwave-neal) and real IBM Quantum hardware. Independently re-validated at adversarial CPU settings.
- Benchmark: Stanford G-set MaxCut, instance G1 (800 nodes, published best-known cut = 11,624)
- Cut quality: chip and dwave-neal both hit 100% of the best-known cut at iso-quality compute (independently re-verified: the chip ties dwave-neal on quality)
- Chip energy per solution: hundreds of nanojoules to hundreds of microjoules depending on activity assumptions (A6 conservative: ~776 nJ at 30% activity, 5 fJ/gate; A7 adversarial-stacked: ~446 µJ at 50% activity, 10 fJ/gate)
- vs dwave-neal CPU baseline: 1.5–5 J per solution depending on CPU TDP + dwave config (A7 measured 4.84 J at 15 W TDP, exact BKS) → ~10,000× less energy at adversarial-on-both-sides settings (A7 STRONG CONFIRM: 10,839× at 15 W TDP, 14,080× at median CPU time)
- vs real IBM Quantum Heron-R2 (NISQ-era, K≥30) full-system energy including cryostat: ~10¹⁰–10¹²× less. The full-system framing (which includes the 25 kW dilution refrigerator) is the only quantum comparison that survives adversarial probing; "marginal-per-shot" framings are fragile and we no longer cite them.
- Honest framing: the chip ties dwave-neal on quality at iso-quality compute. We don't have a better algorithm; we have radically more efficient hardware for the same algorithm.
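dwave-neal is a simulated-annealing sampler, so "the same algorithm" here means simulated annealing on the MaxCut objective. The plain-Python sketch that follows shows that class of algorithm in general terms: flip one node at a time, always accept moves that do not shrink the cut, accept worse moves with probability exp(beta * delta), and keep the best cut seen. It does not use dwave-neal's API; the geometric beta ramp and the parameters `sweeps`, `beta_start`, and `beta_end` are our assumptions, it runs on a toy 4-node ring rather than G1, and it is not the chip's implementation.

```python
import math
import random


def cut_value(edges, spins):
    """Total weight of edges whose endpoints sit on opposite sides (spins are +1/-1)."""
    return sum(w for u, v, w in edges if spins[u] != spins[v])


def anneal_maxcut(n, edges, sweeps=1000, beta_start=0.1, beta_end=5.0, seed=0):
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]  # neighbors[i] = [(j, weight), ...]
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    current = cut_value(edges, spins)
    best, best_spins = current, spins[:]
    for sweep in range(sweeps):
        # Geometric inverse-temperature ramp from beta_start to beta_end.
        frac = sweep / max(sweeps - 1, 1)
        beta = beta_start * (beta_end / beta_start) ** frac
        for i in range(n):
            # Cut change if node i switches sides: same-side edges become cut (+w),
            # currently cut edges become uncut (-w).
            delta = spins[i] * sum(w * spins[j] for j, w in neighbors[i])
            if delta >= 0 or rng.random() < math.exp(beta * delta):
                spins[i] = -spins[i]
                current += delta
                if current > best:
                    best, best_spins = current, spins[:]
    return best, best_spins


if __name__ == "__main__":
    ring = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
    print(anneal_maxcut(4, ring, sweeps=200, seed=1))
```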
Cut quality: 100% of BKS · ties dwave-neal at iso-quality compute (A6 confirmed) · A7 measured 4.84 J at 15 W
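The headline ratio is plain division of the quoted figures. This short Python check assumes, as the "at 15 W TDP" wording suggests, that CPU energy is TDP times runtime; the helper names are ours.

```python
def cpu_energy_joules(tdp_watts: float, runtime_s: float) -> float:
    """CPU energy under a TDP x runtime model (assumes the package runs at TDP)."""
    return tdp_watts * runtime_s


def energy_ratio(baseline_joules: float, chip_joules: float) -> float:
    """How many times less energy the chip uses than the baseline per solution."""
    return baseline_joules / chip_joules


cpu = 4.84        # J, A7 measured dwave-neal run at 15 W TDP
chip_a7 = 446e-6  # J, A7 adversarial-stacked chip estimate (~446 µJ)
chip_a6 = 776e-9  # J, A6 chip estimate (~776 nJ)

print(f"implied CPU runtime at 15 W: {4.84 / 15.0:.3f} s")
print(f"A7 ratio: {energy_ratio(cpu, chip_a7):,.0f}x")
print(f"A6 ratio: {energy_ratio(cpu, chip_a6):,.0f}x")
```

Against the rounded ~446 µJ figure the A7 ratio comes out near 10,850×, consistent with the reported 10,839×. The same CPU run against the A6 estimate gives roughly 6 million×, which is why ~10,000× is the floor reported at adversarial-on-both-sides settings.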
This energy advantage is independent of, and in addition to, the silicon-area advantage above (~50× vs modern softmax baselines / ~858× vs textbook).
Methodology: Sky130 standard-cell counts (Yosys + open PDK) × sky130_fd_sc_hd power model. CPU baseline: dwave-neal measured directly on a commodity laptop. The chip's energy advantage holds at the order-of-magnitude level under adversarial assumption stacks.
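A cell-count energy estimate of this kind typically multiplies cells, switching activity, energy per toggle, and cycles. The Python sketch that follows is that generic dynamic-energy model, not the exact model behind the A6/A7 numbers; `CELLS` and `CYCLES` are hypothetical placeholders, not the chip's figures.

```python
def switching_energy_joules(cells: int, activity: float,
                            joules_per_toggle: float, cycles: int) -> float:
    """Dynamic-energy estimate: each cycle, an `activity` fraction of cells toggles once."""
    return cells * activity * joules_per_toggle * cycles


# Hypothetical placeholders for illustration only (not the chip's numbers).
CELLS, CYCLES = 1_000, 1_000

a6 = switching_energy_joules(CELLS, 0.30, 5e-15, CYCLES)   # 30% activity, 5 fJ/gate
a7 = switching_energy_joules(CELLS, 0.50, 10e-15, CYCLES)  # 50% activity, 10 fJ/gate
print(f"A6-style {a6:.3e} J, A7-style {a7:.3e} J, ratio {a7 / a6:.2f}")
```

Cells and cycles cancel, so under this simple model the activity and per-gate changes alone move the estimate by (0.5 × 10)/(0.3 × 5) ≈ 3.3×. The quoted A6 and A7 figures differ by about 446 µJ / 776 nJ ≈ 575×, so the rest of that spread presumably comes from the other assumptions stacked into A7.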
The same chip computes activation functions not yet invented.
A small amount of on-die configuration, loaded at boot, adapts the block to any nonlinear operation a future architect specifies. No new mask set. No new tape-out. No 18-month chip cycle. Future-proof transformer hardware that adapts in firmware.
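The page does not say how the boot-time configuration works. One generic, well-known way to make a nonlinear unit reprogrammable is a piecewise-linear lookup table whose breakpoint values are computed in firmware and loaded at boot. The Python model that follows sketches that general technique under our own assumptions (uniform segments, range [-8, 8], 64 segments, clamping outside the range); it is an illustration only, not the patented mechanism.

```python
import math


def build_pwl_table(fn, lo=-8.0, hi=8.0, segments=64):
    """Sample fn at segments+1 evenly spaced breakpoints: the 'configuration' to load."""
    step = (hi - lo) / segments
    ys = [fn(lo + k * step) for k in range(segments + 1)]
    return lo, step, ys


def eval_pwl(table, x):
    """Evaluate the table like a uniform-index LUT with linear interpolation."""
    lo, step, ys = table
    segments = len(ys) - 1
    pos = (x - lo) / step
    if pos <= 0:
        return ys[0]          # clamp below the table range
    if pos >= segments:
        return ys[-1]         # clamp above the table range
    k = int(pos)              # segment index (floor, since pos > 0)
    t = pos - k               # fractional position inside the segment
    return ys[k] + t * (ys[k + 1] - ys[k])


if __name__ == "__main__":
    sig = build_pwl_table(lambda x: 1.0 / (1.0 + math.exp(-x)))
    tanh = build_pwl_table(math.tanh)
    print(eval_pwl(sig, 0.3), eval_pwl(tanh, 0.3))
```

Swapping the function passed to `build_pwl_table` is the whole "reprogramming" step in this model. Clamping suits saturating functions such as sigmoid and tanh; unbounded ones such as SiLU or GELU would need a linear tail or a wider range.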
14 USPTO Provisional Patents
Filed across the architecture, programmability, cross-foundry portability, and firmware portability.
Let's build something.
Licensing, partnerships, due diligence, technical questions — all welcome. Verilog source files available under NDA after mutual due diligence.
ryan@quantxverse.com