AI inference has a classic bottleneck: moving data from memory to the processor. Untether AI introduces Boqueria, an accelerator designed to break this dynamic. Its massively parallel architecture computes at memory, right where the data is stored, which cuts the energy spent on data movement and raises performance per watt. It's not magic; it's well-thought-out engineering.
How Boqueria's at-memory architecture works 🚀
Boqueria integrates thousands of compute cores directly into SRAM, eliminating the need to move data across external buses. Each core executes only simple operations, but all of them run in parallel, which lets neural network models be processed with high efficiency. By minimizing latency and the energy cost of data movement, the chip sustains its performance in inference tasks without relying on expensive HBM or extreme cooling.
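To see why data movement matters so much, here is a minimal back-of-the-envelope sketch in Python. It compares the energy of a run of multiply-accumulate (MAC) operations when each weight comes from adjacent SRAM versus over an external bus. The names `EnergyCosts` and `inference_energy_pj` are hypothetical, and the per-operation costs are assumed placeholders chosen only to show the shape of the argument; they are not measured Boqueria figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyCosts:
    """Per-operation energy in picojoules (illustrative assumptions only)."""
    mac_pj: float            # one multiply-accumulate
    local_read_pj: float     # one operand read from adjacent SRAM
    external_read_pj: float  # one operand fetched over an external bus


def inference_energy_pj(num_macs: int, costs: EnergyCosts, at_memory: bool) -> float:
    """Total energy for num_macs MACs, assuming one weight read per MAC."""
    read_pj = costs.local_read_pj if at_memory else costs.external_read_pj
    return num_macs * (costs.mac_pj + read_pj)


# Assumed placeholder values, not vendor data.
ASSUMED = EnergyCosts(mac_pj=1.0, local_read_pj=5.0, external_read_pj=500.0)

macs = 1_000_000
far = inference_energy_pj(macs, ASSUMED, at_memory=False)
near = inference_energy_pj(macs, ASSUMED, at_memory=True)
ratio = far / near

print(f"external fetch: {far / 1e6:.1f} uJ")
print(f"at-memory:      {near / 1e6:.1f} uJ")
print(f"ratio:          {ratio:.1f}x")
```

In this toy model the arithmetic costs the same in both cases; the whole difference comes from where the operand is read. Real chips reuse data through caches and batching, so the actual gap depends on the workload, but the direction of the effect is the point.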
The smart cousin who doesn't need to commute to work 🏠
While other accelerators put on a logistical circus to bring data closer to the processor, Boqueria is that colleague who works from home. Literally, it processes information where it lives. So if your GPU sounds like a noisy, hot vacuum cleaner, maybe you should consider a change. After all, you don't need to travel to the other side of the chip to do the math.