University of Doha for Science & Technology, Doha, Qatar | December 03-05, 2024

Computation Accelerators for AI / ML and IoT: Domain Specific Architectures


Shaftab Ahmed and M. Yasin Akhtar Raja
Center for Optoelectronics & Optical Communications,
UNC Charlotte, NC, USA
Email: raja (at)

M. Ilyas
EE&CS Dept., College of Engineering & Computer Science,
Florida Atlantic University,
Boca Raton, FL
Email: ilyas (at)

Homogeneous multicore architectures integrate multiple identical cores on the same die to provide higher computational capability within a similar area budget [1]. This opened new avenues for parallel processing, delivering high performance with only a modest increase in power consumption and thereby enabling substantial gains in energy efficiency. However, homogeneous cores cannot simultaneously satisfy competing application requirements, such as low power and high performance.

Quantitative Comparison of CPU, GPU, FPGA, and ASIC:

CPU implementations require the least design effort but provide low energy efficiency. GPUs and FPGAs improve energy efficiency and performance by exploiting single instruction multiple data (SIMD) execution and hardware parallelism, respectively. To run on GPUs, application code must be converted to GPU-compatible code; to run on FPGAs, it must be expressed in a hardware description language or compiled through high-level synthesis. ASICs provide the highest energy efficiency because they are designed specifically for the target application. However, the ASIC effort, which spans design, development, fabrication, and software development, can take several months to years.

Therefore, there is a critical need to continue the evolution of computing architectures to provide ASIC-like energy efficiency with the shortest possible time-to-market.

Domain-specific architectures (DSAs) are an emerging class of heterogeneous architectures that optimize data flow for applications in a target domain through hardware acceleration while still providing programming flexibility. The inclusion of chiplets in these designs has helped designers overcome some of these difficulties and develop innovative designs.

Machine learning (ML) and artificial intelligence (AI) are two rapidly growing domains. For instance, ML and AI are used extensively in image processing, scheduling, recommendation systems, spam filtering, stock market analysis, and medical applications [2]. There is a strong need for computing architectures that enable seamless, high-performance, and energy-efficient execution of applications in these domains.

DSAs aim to combine programmability, by including general-purpose cores, with the highest energy efficiency, by integrating special-purpose processors and hardware accelerators [3]. DSAs are domain-specific because their hardware accelerators and data flows are highly tailored to the types of computation found in the applications of a particular domain. Broadly speaking, DSAs encompass any computing architecture that provides the following:

Superior energy efficiency through specialized processing: Specialized processors accelerate frequently occurring domain-specific computations in hardware, thereby boosting energy efficiency. For example, a custom fast Fourier transform (FFT) accelerator speeds up forward and inverse FFT operations, whereas a systolic matrix multiplication processor accelerates ML and AI applications.


Programming flexibility: DSAs aim to improve programming flexibility for both domain and non-domain applications. For example, DSAs that target neural network inference must be programmable to execute multilayer perceptrons, convolutional neural networks, and recurrent neural networks. In addition, they must be able to execute other neural network inference operations that cannot easily be implemented in specialized hardware. Finally, they should be able to execute non-domain applications to improve flexibility and enable broader use, for example in Internet of Things (IoT) applications.

Heterogeneous Processing Elements: The diverse types of processing elements (PEs) in DSAs cater to contrasting application requirements such as low power, high performance, energy efficiency, and programmability.
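The systolic matrix multiplication processor mentioned above can be illustrated with a small cycle-level software model. The sketch below is a minimal illustration, assuming an output-stationary n x m grid of processing elements in which row i of A and column j of B are injected with a skew of i and j cycles; this is only one of several systolic dataflows (weight-stationary designs are also common). The function name `systolic_matmul` is hypothetical, and the model ignores the timing, precision, and memory effects of real hardware.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def systolic_matmul(A: Matrix, B: Matrix) -> Tuple[Matrix, int]:
    """Hypothetical cycle-level model of an output-stationary systolic array for A @ B.

    PE(i, j) keeps a running sum for C[i][j]. Every cycle it receives one value of A
    from its left neighbour and one value of B from its upper neighbour, performs a
    multiply-accumulate, and forwards A to the right and B downward.
    """
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions of A and B must match")
    m = len(B[0])

    acc = [[0.0] * m for _ in range(n)]    # stationary partial sums, one per PE
    a_reg = [[0.0] * m for _ in range(n)]  # operand each PE forwards to the right
    b_reg = [[0.0] * m for _ in range(n)]  # operand each PE forwards downward

    # A[i][s] and B[s][j] meet in PE(i, j) at cycle s + i + j, so the last pair
    # reaches PE(n-1, m-1) at cycle n + m + k - 3.
    cycles = n + m + k - 2
    for t in range(cycles):
        new_a = [[0.0] * m for _ in range(n)]
        new_b = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge: row i of A enters skewed by i cycles; inner PEs take the neighbour's value.
                if j == 0:
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0.0
                else:
                    a_in = a_reg[i][j - 1]
                # Top edge: column j of B enters skewed by j cycles.
                if i == 0:
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0.0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # one multiply-accumulate per PE per cycle
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b  # all PEs latch their outputs simultaneously
    return acc, cycles


if __name__ == "__main__":
    C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, cycles)
```

For the 2 x 2 example, the model prints `[[19.0, 22.0], [43.0, 50.0]] 4`: every PE performs at most one multiply-accumulate per cycle and exchanges operands only with its immediate neighbours, which is the property that lets such arrays reuse each fetched operand many times instead of reloading it from memory.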

Paper Submissions:

Papers presenting original work validated by experimentation, simulation, analysis, testbeds, field trials, or real deployments are welcome. All papers should be submitted through the HONET main website using the link below.
Paper Length: Full papers can be up to 6 pages, and short papers up to 3 pages. The page limit includes the bibliography and well-marked appendices.
Format: See the author instructions.
Originality: All submissions should be original, unpublished, and not under review elsewhere.

Submit your paper.
Select the topic "Symposium on Computation Accelerators for AI / ML and IoT: Domain Specific Architectures" when submitting a paper to this symposium.

Symposium Program:

To be announced.