





## **Open-Source Hardware** in the Post Moore Era

## Dr. George Michelogiannakis

Research Scientist, Computer Architecture Group, Lawrence Berkeley National Laboratory

These are not DOE's or LBNL's official views



## **Technology Scaling Trends**
















Peter Bright, "Intel retires 'tick-tock' development model, extending the life of each process", 2016





# Performance

## Now – 2025

Moore's Law continues through ~5 nm, beyond which diminishing returns are expected.

Timeline: 2016, then 2016-2025; end of Moore's Law 2025-2030?

## Post Moore Scaling

New materials and devices introduced to enable continued scaling of electronics performance and efficiency.

Performance

2025+



## **Some Paths Forward in Post Moore**









# **Enabled by Emerging Nanotechnologies**

## Massive Sensing



Shulaker, "Transforming Emerging Technologies into Working Systems"





## Let's Get the Most Out of CMOS Before We Jump Ship



## General-Purpose Architectures Trade Overhead for Programmability





"Accelerator-Rich Architectures: Opportunities and Progresses", DAC 2014



## Architectures Trade Overhead for Programmability








### Accelerators Have Been Growing in HPC



#### Top 500 systems



Strohmaier, "Top 500", SC17





- Hardware that is better suited to specific kinds of computation
  - Can also have accelerators for data transfer















Wadler, Intel's Response to ARM Servers: Xeon D Processors with a Twist, 2016

- FPGA accelerators used as a programmable array of soft cores, more like a GPU model
- Parallels early days of GPGPU computing
  - Capable hardware
  - New languages raising abstraction levels
  - Tools lacking



Broadwell + Arria 10 GX MCP





- How fine-grained should accelerators be?
- How to schedule and transfer data?



Yakun S et al., "Aladdin"



## Hardware Development Effort Is a Challenge





- Typically used to prune design space
- No substitute for real hardware
- Let's make hardware development faster!
  - High-level synthesis languages
  - Open-source hardware





## 12-18 month cycle

Zipcpu, "FPGAs vs ASICs", 2017





## Reduce Hardware Development Effort to Explore the Specialization Spectrum With Open-Source Hardware





#### The Rise of Open Source Software: Will Hardware Follow Suit?



- Rapid growth in the adoption and number of open source software projects
- More than 95% of web servers run Linux variants, approximately 85% of smartphones run Android variants
- Will open source hardware ignite the semiconductor industry?





## A complete set of tools





## **OpenSoC System Architect**









- Shockingly, but accidentally, similar to the Sunway node architecture
- Four Z-Scale processors connected on a 4x4 mesh, with Micron HMC memory
- Created by two people in two months
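To give a feel for what a 4x4 mesh implies for communication distance, here is a minimal sketch that counts router-to-router hops. It assumes dimension-order (XY) routing, which the slide does not state, and it does not model where the processors or the HMC memory attach. The function names `xy_route` and `average_hops` are illustrative only.

```python
from itertools import product


def xy_route(src, dst):
    """Return the routers visited from src to dst under XY routing.

    Assumption: packets travel fully along X first, then along Y.
    Coordinates are (x, y) tuples on the mesh.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def average_hops(width, height):
    """Average hop count over all ordered pairs of distinct routers."""
    nodes = list(product(range(width), range(height)))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    total = sum(len(xy_route(s, d)) - 1 for s, d in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    print("corner to corner hops:", len(xy_route((0, 0), (3, 3))) - 1)
    print("average hops on 4x4 mesh:", average_hops(4, 4))
```

Under these assumptions the worst case on a 4x4 mesh is six hops (corner to corner), and the average over distinct router pairs is 8/3.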







## **Use Open-Source Hardware: Specialization Opportunities**



## **A Specialization Opportunity**



- On-detector processing
  - Future detectors have data rates exceeding 1 Tb/s
  - Proposed solution: process data before it leaves the sensor
  - Application-tailored, programmable processing
  - Programmability allows processing to be tailored to the experiment







| 7 Giants of Data (NRC) | 7 Motifs of Simulation |
|------------------------|------------------------|
| Basic statistics       | Monte Carlo methods    |
| Generalized N-Body     | Particle methods       |
| Graph-theory           | Unstructured meshes    |
| Linear algebra         | Dense Linear Algebra   |
| Optimizations          | Sparse Linear Algebra  |
| Integrations           | Spectral methods       |
| Alignment              | Structured Meshes      |





## Architecture to match data set shape to help communication

**PDEcell / PICcell:** An ultra-simple compute engine (50k gates) calculates finite-difference updates and particle forces from neighbors. Microinstructions specify the PDE, stencil, and PIC operators. *Novel features:* variable-length streaming integer arithmetic and a novel PIC particle virtualization scheme.
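The idea of each cell computing a finite-difference update from its neighbors can be sketched in software. The example below is not the PDEcell microarchitecture. It is a hypothetical 5-point stencil step on a small integer grid, with an arbitrarily chosen diffusion coefficient of 1/8 implemented as a right shift and boundary cells held fixed.

```python
def stencil_step(grid):
    """Apply one explicit 5-point finite-difference update to interior cells.

    Assumptions (illustrative only): integer cell values, a coefficient of
    1/8 implemented as an arithmetic right shift by 3, fixed boundary cells.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are copied unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Each cell reads only its four nearest neighbors.
            neighbours = (grid[r - 1][c] + grid[r + 1][c]
                          + grid[r][c - 1] + grid[r][c + 1])
            laplacian = neighbours - 4 * grid[r][c]
            new[r][c] = grid[r][c] + (laplacian >> 3)
    return new


if __name__ == "__main__":
    grid = [[0] * 5 for _ in range(5)]
    grid[2][2] = 64  # a single hot cell in the middle
    for row in stencil_step(grid):
        print(row)
```

Because every update depends only on nearest neighbors, mapping one grid cell to one hardware tile keeps all communication local, which is the property the slide points to.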

#### **Computational Lattice:**

PDEcells are tiles in a lattice/array on each 2D planar chip layer. Target: 120x120 tiles per mm<sup>2</sup> at 28 nm lithography. *Novel features:* each tile represents a single cell of the computational domain (pushing to the limit of strong scaling).

#### **Monolithic 3D Integration:**

Integrate layers of compute elements using emerging monolithic 3D chip stacking. *Novel features:* 1000-layer stacking (20x more than current practice), area-efficient inter-layer connectivity, and new energy-efficient transistor logic (ncFET).

1 petaflop equivalent performance in 300 mm<sup>2</sup> for < 200 W.
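The targets above can be combined into a quick back-of-the-envelope check. The sketch below assumes that the 300 mm<sup>2</sup> figure is the footprint of each layer and that all 1000 layers are fully tiled. Neither assumption is stated on the slide, and the constant and function names are illustrative.

```python
TILES_PER_MM2 = 120 * 120      # target tile density at 28 nm (from the slide)
FOOTPRINT_MM2 = 300            # assumed to be the per-layer area
LAYERS = 1000                  # monolithic 3D stacking target
PEAK_FLOPS = 1e15              # 1 petaflop equivalent
POWER_BUDGET_W = 200.0         # upper bound on power


def total_tiles(tiles_per_mm2, area_mm2, layers):
    """Number of tiles if every layer is fully populated (an assumption)."""
    return tiles_per_mm2 * area_mm2 * layers


def energy_per_flop_pj(flops, watts):
    """Energy per operation in picojoules at the given power budget."""
    return watts / flops * 1e12


if __name__ == "__main__":
    print("tiles:", total_tiles(TILES_PER_MM2, FOOTPRINT_MM2, LAYERS))
    print("pJ per flop (upper bound):",
          energy_per_flop_pj(PEAK_FLOPS, POWER_BUDGET_W))
```

Under these assumptions the design holds about 4.3 billion tiles, and the power budget corresponds to at most roughly 0.2 pJ per floating-point-equivalent operation.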









## Quantum Computer = Quantum PU + Control Hardware



1000 qubits, gate time 10 ns, 3 ops/qubit: **300 billion ops per second**
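The 300 billion figure follows directly from the three numbers on the slide. The sketch below reads "3 ops/qubit" as three control operations per qubit in every gate time, which is one plausible interpretation rather than something the slide spells out. The function name is illustrative.

```python
def control_ops_per_second(qubits, gate_time_ns, ops_per_qubit):
    """Control operations the classical hardware must issue per second.

    Assumption: every qubit needs ops_per_qubit operations in each gate
    time. Integer arithmetic avoids floating-point rounding.
    """
    ops_per_gate_time = qubits * ops_per_qubit
    return ops_per_gate_time * 1_000_000_000 // gate_time_ns


if __name__ == "__main__":
    rate = control_ops_per_second(qubits=1000, gate_time_ns=10, ops_per_qubit=3)
    print(f"{rate:,} ops per second")
```

With 1000 qubits, a 10 ns gate time, and 3 operations per qubit, this gives 300,000,000,000 operations per second, which is the load the control hardware would have to sustain.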





- Open-source projects rely on community
  - Need a collection of accelerators
- Open-source hardware may be the key to ubiquitous specialization
- Programmability and compilers must not be neglected
- It is an exciting time to be an architect







