ARUZ: FPGA Computing at Extraordinary Scale

ARUZ is one of the defining projects in Brightelligence’s engineering history. It represents the kind of FPGA work that exists far beyond ordinary board design, RTL implementation, or subsystem integration. ARUZ was conceived as a massively parallel, low-latency FPGA computing machine built to solve classes of scientific problems that do not fit well into conventional CPU-centric systems. At its full scale, the platform was built around approximately 26,000 interconnected FPGA devices, making it one of the most ambitious single-system FPGA machines ever realized.

What makes ARUZ important is not just its size, but what that size demanded from the engineering process. Projects of this kind cannot be approached as “larger versions” of normal FPGA development. Once a system grows to tens of thousands of programmable devices, the core challenges change completely. Architecture, communication topology, synchronization, configuration strategy, observability, failure handling, serviceability, and repeatable verification all become central design problems. ARUZ is therefore a strong example of the kind of deep systems engineering capability Brightelligence brings to advanced FPGA work.

A Machine Built at a Different Order of Magnitude

ARUZ was designed as a single, coherent computing installation rather than a loose collection of accelerator boards. The system is composed of 25,920 programmable devices in total: 23,040 Artix XC7A200T FPGAs used for computation and 2,880 Zynq XC7Z015 devices used for management and control. The platform is organized into 20 panels, each containing 12 rows of 12 daughter boards (DBoards), for a total of 2,880 DBoards across the machine. Each DBoard carries nine devices: eight computational FPGAs and one management device. In day-to-day operation, 18 panels are typically used for active simulation while two remain redundant, ready to take over in the event of technical issues.
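The device counts above follow directly from the panel layout. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative only and do not come from the project.

```python
# Topology figures quoted in the text; all names here are illustrative.
PANELS = 20
ROWS_PER_PANEL = 12
DBOARDS_PER_ROW = 12
COMPUTE_FPGAS_PER_DBOARD = 8     # Artix XC7A200T
MANAGEMENT_PER_DBOARD = 1        # Zynq XC7Z015
ACTIVE_PANELS = 18               # typical; the remaining two are redundant


def machine_totals(panels: int = PANELS) -> dict:
    """Return DBoard and device counts for the given number of panels."""
    dboards = panels * ROWS_PER_PANEL * DBOARDS_PER_ROW
    compute = dboards * COMPUTE_FPGAS_PER_DBOARD
    management = dboards * MANAGEMENT_PER_DBOARD
    return {
        "dboards": dboards,
        "compute": compute,
        "management": management,
        "devices": compute + management,
    }


full = machine_totals()
active = machine_totals(ACTIVE_PANELS)
print("full machine:", full)
print("18 active panels:", active)
```

With all 20 panels this gives 2,880 DBoards and 25,920 devices, matching the figures in the text; with 18 active panels, 20,736 computational FPGAs take part in a simulation.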

The physical scale matches the architectural scale. ARUZ weighs roughly 50 tons, stands around 4.5 meters high, spans about 14 meters in diameter, and relies on tens of thousands of signal connections with a total cable length of many tens of kilometers. It is housed in a shielded environment, cooled by water, and draws around 100 kilowatts of power. These are not the characteristics of a typical FPGA prototype. They are the characteristics of a full-scale computing machine built to execute highly specialized workloads with exceptional parallel efficiency.

Why ARUZ Came to Life

ARUZ was constructed for simulations built around huge numbers of relatively simple, locally interacting elements. That is an important distinction. Many workloads do not benefit most from generic instruction-based computation. Instead, they benefit from deterministic communication, very low latency, and the ability to update a vast number of states in parallel. ARUZ was designed precisely for that space. The architecture was inspired by the Dynamic Lattice Liquid model, and the platform was positioned as a fully parallel data-processing system for large-scale simulations of complex physical systems. Later work also demonstrated implementations beyond the original target use case, including Lattice Boltzmann and diffusion-related models.

This is one of the reasons ARUZ matters so much as a Brightelligence case study. It shows experience not only in FPGA implementation, but in application-driven architecture. The machine was not built around a fashionable technology choice or a generic compute narrative. It was built because the problem class demanded a custom, deeply parallel, highly flexible, communication-aware hardware architecture. That is exactly the level on which the most advanced FPGA projects need to be approached: not from “how do we code this,” but from “what kind of machine should exist to solve this efficiently.”

Architecture Designed Around Local Communication

A key feature of ARUZ is its communication structure. The system separates global, control, and local communication, because each class of traffic has different requirements and priorities. Global communication handles configuration and result collection over standard Ethernet. Control communication handles synchronization, which demands speed, so a proprietary asynchronous protocol was developed for it. Local communication is the performance-critical path, optimized for the rapid exchange of neighboring-state data between compute elements. It uses a purpose-engineered protocol over direct, dedicated links designed for low latency and reliability, with LVDS-based signaling, source-synchronous clocking, CRC protection, and retransmission on error. This architecture was essential because the value of the machine depended not only on how many FPGAs it contained, but on how efficiently those FPGAs could behave as one computational fabric.
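The idea of CRC protection with retransmission on error can be illustrated in software. The Python sketch below is a conceptual model only: the frame layout, the use of CRC-32 from zlib, the single-bit error model, and all function names are assumptions made for illustration, and none of it describes the actual ARUZ link protocol, which is implemented in FPGA logic.

```python
import random
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (assumed layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the trailing CRC matches, otherwise None."""
    if len(frame) < 4:
        return None
    payload, received_crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != received_crc:
        return None
    return payload


def noisy_link(frame: bytes, rng: random.Random, error_rate: float) -> bytes:
    """Hypothetical channel: flip one random bit with probability error_rate."""
    if rng.random() >= error_rate:
        return frame
    corrupted = bytearray(frame)
    bit = rng.randrange(len(corrupted) * 8)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


def send_with_retransmit(payload, rng, error_rate=0.3, max_attempts=16):
    """Resend the same frame until the receiver's CRC check passes."""
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        received = check_frame(noisy_link(frame, rng, error_rate))
        if received is not None:
            return received, attempt
    raise RuntimeError("link failed after max_attempts")


if __name__ == "__main__":
    data, attempts = send_with_retransmit(b"neighbor state", random.Random(0))
    print(data, "delivered after", attempts, "attempt(s)")
```

The design point this illustrates is that the receiver never accepts a frame whose checksum fails, so a transient link error costs one extra transfer rather than corrupting the simulation state.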

That architectural thinking reflects a broader Brightelligence strength: understanding that in advanced FPGA systems, interconnect is often as important as computation. The most difficult performance bottlenecks frequently arise not inside a single processing element, but at the boundaries between them. ARUZ is a strong example of designing around that reality from the start. It combines compute density with communication awareness, allowing the overall machine to support massively parallel simulation patterns that would otherwise be limited by latency and coordination overhead.

Gateware Development at Massive Scale

At ARUZ scale, gateware development could not rely on traditional manual HDL workflows alone. One of the most valuable outcomes of the project was therefore methodological, not only architectural. The published ARUZ gateware methodology shows how generic, parameterized VHDL descriptions of processing elements were combined with preprocessing and dedicated software tooling to generate target-specific HDL automatically. This approach enabled automatic generation of up to 81% of the code, while still preserving low-level optimization and project-specific specialization.
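To make the idea of generating target-specific HDL from a generic description concrete, here is a toy Python sketch of template-based generation. It is not the published ARUZ tooling: the entity name `processing_element`, the generics `DATA_WIDTH` and `NEIGHBORS`, the port names, and the helper `generate_instances` are all hypothetical and chosen only for illustration.

```python
from string import Template

# Hypothetical VHDL instantiation template; the entity, generics, and ports
# are invented for this example and do not come from the ARUZ code base.
PE_TEMPLATE = Template("""\
pe_${index} : entity work.processing_element
  generic map (
    DATA_WIDTH => ${data_width},
    NEIGHBORS  => ${neighbors}
  )
  port map (
    clk => clk,
    rst => rst
  );
""")


def generate_instances(count: int, data_width: int, neighbors: int) -> str:
    """Emit one instantiation per processing element for a given variant."""
    return "\n".join(
        PE_TEMPLATE.substitute(index=i, data_width=data_width, neighbors=neighbors)
        for i in range(count)
    )


if __name__ == "__main__":
    # One simulation variant: four elements, 16-bit data, six neighbors each.
    print(generate_instances(count=4, data_width=16, neighbors=6))
```

Changing the parameters regenerates every instance consistently, which is the property that makes generated code attractive when a design is replicated across thousands of devices.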

That methodology mattered because, without it, iteration would have become painfully slow and operationally expensive. The synthesis runtime for the required simulation variants was reduced from months to days, while configuring all DSlaves in the machine took only around two minutes. In other words, ARUZ is not just a very large FPGA installation. It is a very large FPGA installation made practical through disciplined engineering automation, code generation, scalable configuration strategy, and a development process designed for reuse and repeatability.

Verification, Bring-Up, and Debugging Beyond Normal FPGA Practice

Verification of a platform like ARUZ had to operate at a completely different level from standard project flows. The documented methodology includes emulator-based reference model checking, HDL simulation, reduced-scale hardware verification on a smaller “nanoARUZ” platform, automatic configuration testing, and dedicated in-system debug mechanisms. The project also used custom logic-analysis infrastructure so that internal behavior could be observed across many chips simultaneously. At this scale, debugging cannot depend on ad hoc probing or isolated board-level checks. It requires built-in observability, structured test flows, and automated comparison between expected and measured results.
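Automated comparison between expected and measured results can be sketched as a simple trace check. In the Python example below, the trace format (one list of integer states per node), the `Mismatch` record, and the function `compare_traces` are assumptions made for illustration; the actual ARUZ verification tooling is not described at this level in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Mismatch:
    node: str
    step: int
    expected: Optional[int]   # None means the reference trace ended early
    measured: Optional[int]   # None means the hardware trace ended early


def compare_traces(
    expected: Dict[str, List[int]], measured: Dict[str, List[int]]
) -> List[Mismatch]:
    """Compare per-node state traces from a reference model and from hardware."""
    mismatches = []
    for node in sorted(expected):
        exp = expected[node]
        got = measured.get(node, [])   # a missing node is treated as an empty trace
        for step in range(max(len(exp), len(got))):
            e = exp[step] if step < len(exp) else None
            m = got[step] if step < len(got) else None
            if e != m:
                mismatches.append(Mismatch(node, step, e, m))
    return mismatches


if __name__ == "__main__":
    reference = {"fpga_a": [1, 2, 3], "fpga_b": [4, 5]}
    captured = {"fpga_a": [1, 9, 3], "fpga_b": [4]}
    for mismatch in compare_traces(reference, captured):
        print(mismatch)
```

Reporting the node and step of every divergence, rather than a single pass or fail, is what lets a fault be localized without manual probing.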

This is one of the clearest reasons ARUZ is such a strong proof point for Brightelligence. It demonstrates experience in the full reality of advanced FPGA engineering: not only RTL design, but also machine-scale verification strategy, configuration management, system bring-up, diagnostics, and operational stability. Those are exactly the capabilities that matter most in demanding commercial programmes, especially where FPGA systems become central to product performance and system behavior.

Scientific Output and Measurable Results

ARUZ also stands out because it produced substantial scientific and technical output. The machine and its methodology were documented in peer-reviewed publications, including the 2018 Computer Physics Communications paper describing the overall architecture and the 2020 Electronics paper describing the gateware-development methodology. Additional conference and journal publications covered application implementations, including Lattice Boltzmann, molecular diffusion, and reaction modeling. Project materials also list multiple patents connected with the broader technology.

The published results show why this mattered. In one Lattice Boltzmann implementation, the optimized approach achieved a 46% performance improvement over earlier work and projected 302 × 10^3 MLUPS on 18 ARUZ panels, with significantly better power efficiency than the referenced classical supercomputer comparison. Other project materials highlight similarly dramatic gains for suitable workloads, where simulation times that would be impractical on conventional systems became feasible on ARUZ. This combination of architectural originality, implementation depth, and published performance evidence is what elevates the project from “large FPGA system” to genuine world-class engineering achievement.
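MLUPS stands for million lattice updates per second. As a rough back-of-the-envelope reading of the projected figure, the Python sketch below divides it across the 18 active panels. The even split across all computational FPGAs is an assumption made here for illustration, not a statement from the published results.

```python
# Figures quoted in the text.
PROJECTED_MLUPS = 302e3           # million lattice updates per second
ACTIVE_PANELS = 18
DBOARDS_PER_PANEL = 12 * 12
COMPUTE_FPGAS_PER_DBOARD = 8

# Assumption: every computational FPGA in the 18 panels contributes equally.
compute_fpgas = ACTIVE_PANELS * DBOARDS_PER_PANEL * COMPUTE_FPGAS_PER_DBOARD
mlups_per_panel = PROJECTED_MLUPS / ACTIVE_PANELS
mlups_per_fpga = PROJECTED_MLUPS / compute_fpgas
updates_per_second = PROJECTED_MLUPS * 1e6

print(f"computational FPGAs in use: {compute_fpgas}")
print(f"MLUPS per panel: {mlups_per_panel:.0f}")
print(f"MLUPS per FPGA:  {mlups_per_fpga:.2f}")
print(f"lattice updates per second: {updates_per_second:.3g}")
```

Under that assumption the projection corresponds to about 3 × 10^11 lattice updates per second in total, or roughly 14.6 MLUPS per computational FPGA.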

What ARUZ Says About Brightelligence

ARUZ is more than an impressive historical project. It is a clear signal of the level at which Brightelligence operates. It shows experience with large-scale FPGA architecture, communication-centric design, custom compute fabrics, embedded control, gateware methodology, automation, verification, debugging, and scientific-grade technical rigor. It also shows that Brightelligence has worked on FPGA systems where success depends on solving problems that most teams never encounter at all.

For clients, that matters. Companies looking for an FPGA partner are often not looking merely for extra implementation capacity. They are looking for confidence: confidence that the partner understands architecture, scale, integration, performance, verification, and the hidden risks inside technically ambitious systems. ARUZ is one of the strongest examples in the Brightelligence portfolio because it demonstrates exactly that. It reflects the ability to contribute to FPGA projects at a level where the engineering challenge is not just writing logic, but designing and delivering machines that push the practical limits of what FPGA technology can do.

For more details, check the publication section.
