GitPedia

Acceltran

[TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers

From jha-lab · Updated May 29, 2026

AccelTran is a tool to simulate a design space of accelerators on diverse *flexible* and *heterogeneous* transformer architectures supported by the FlexiBERT 2.0 framework at [jha-lab/txf_design-space](https://github.com/JHA-Lab/txf_design-space). The project is written primarily in Python, is distributed under the BSD 3-Clause "New" or "Revised" License, and was first published in 2022. Key topics include accelerators and transformers.

AccelTran: A Sparsity-Aware Monolithic 3D Accelerator for Transformer Architectures at Scale

The figure below shows the utilization of different modules in an AccelTran architecture for the BERT-Tiny transformer model.

AccelTran GIF

Environment setup

Clone this repository and initialize sub-modules

```shell
git clone https://github.com/JHA-Lab/acceltran.git
cd ./acceltran/
git submodule init
git submodule update
```

Set up Python environment

The Python environment setup is based on conda. The script below creates a new environment named txf_design-space:

```shell
source env_setup.sh
```

For pip installation, we are creating a requirements.txt file. Stay tuned!

Run synthesis

Synthesis scripts use Synopsys Design Compiler. All hardware modules are implemented in SystemVerilog in the directory synthesis/top.

To get area and power consumption reports for each module, use the following command:

```shell
cd ./synthesis/
dc_shell -f 14nm_sg.tcl -x "set top_module <M>"
cd ..
```

Here, <M> is the module to be synthesized, chosen from mac_lane, ln_forward_<T> (for layer normalization), softmax_<T>, etc., where <T> is the tile size: 8, 16, or 32.

All output reports are stored in synthesis/reports.
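
To collect reports for several modules in one pass, the command above can be generated once per module. The following is a minimal Python sketch and not part of the repository: the helper name `synthesis_commands` is hypothetical, and the module list covers only the names mentioned above (mac_lane, ln_forward_<T>, softmax_<T>). It only builds the argument lists; running them assumes `dc_shell` is on the PATH and that the working directory is synthesis/.

```python
import shlex

TILE_SIZES = (8, 16, 32)  # tile sizes listed in the text above


def synthesis_commands(tile_sizes=TILE_SIZES):
    """Return one dc_shell argument list per module (hypothetical helper)."""
    modules = ["mac_lane"]
    for t in tile_sizes:
        modules += [f"ln_forward_{t}", f"softmax_{t}"]
    return [
        ["dc_shell", "-f", "14nm_sg.tcl", "-x", f"set top_module {m}"]
        for m in modules
    ]


if __name__ == "__main__":
    for cmd in synthesis_commands():
        # Print only; pass each list to subprocess.run(cmd, cwd="synthesis") to execute.
        print(shlex.join(cmd))
```

Building argument lists rather than a single shell string keeps the quoted `set top_module ...` expression intact when it is handed to `subprocess.run`.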

To run the synthesis for the DMA module, run the following command instead:

```shell
cd ./synthesis/
dc_shell -f dma.tcl
```

Run pruning

To get the sparsity in activations and weights in an input transformer model and its corresponding performance on the GLUE benchmark, use the dynamic pruning model: DP-BERT.

To test the effect of different sparsity ratios on the model performance on the SST-2 benchmark, use the following script:

```shell
cd ./pruning/
python3 run_evaluation.py --task sst2 --max_pruning_threshold 0.1
cd ..
```

The script uses a weight-pruned model, so the weights are not pruned further. To also prune the weights with a pruning_threshold, use the --prune_weights flag.
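
As an illustration of what a pruning threshold does, the sketch below zeroes every value whose magnitude falls below the threshold and reports the resulting sparsity. It is a pure-Python toy that assumes pruning is magnitude-based; the function names `prune` and `sparsity` and the sample activations are hypothetical, and this is not the DP-BERT implementation in pruning/.

```python
def prune(values, threshold):
    """Zero out entries with magnitude below `threshold` (assumed pruning rule)."""
    return [0.0 if abs(v) < threshold else v for v in values]


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


if __name__ == "__main__":
    # Made-up activation values, for illustration only.
    activations = [0.02, -0.5, 0.08, 1.3, -0.01, 0.0, 0.25, -0.09]
    for threshold in (0.0, 0.05, 0.1):
        pruned = prune(activations, threshold)
        print(f"threshold={threshold}: sparsity={sparsity(pruned):.3f}")
```

A larger threshold removes more small-magnitude values, so sparsity can only grow as the threshold increases; the evaluation script measures how task performance changes along that sweep.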

Run simulator

AccelTran supports a diverse range of accelerator hyperparameters. It also supports all ~10<sup>88</sup> models in the FlexiBERT 2.0 design space.

To specify the configuration of an accelerator's architecture, use a configuration file in the simulator/config directory. Example configuration files are provided for accelerators optimized for BERT-Nano and BERT-Tiny. Accelerator hardware configuration files should conform to the design space specified in the simulator/design_space/design_space.yaml file.
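
Conformance here amounts to every chosen hyperparameter taking a value that the design space allows. A minimal sketch of such a check follows, using plain dictionaries in place of the parsed YAML files. The keys `tile_size` and `batch_size`, their allowed values, and the flat key-to-list layout are hypothetical placeholders rather than the actual schema of design_space.yaml, and `check_config` is not a function provided by the repository.

```python
def check_config(config, design_space):
    """Return a list of human-readable violations (empty if config conforms)."""
    problems = []
    for key, value in config.items():
        if key not in design_space:
            problems.append(f"unknown hyperparameter: {key}")
        elif value not in design_space[key]:
            problems.append(
                f"{key}={value} not in allowed values {design_space[key]}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical design space and configuration, for illustration only.
    design_space = {"tile_size": [8, 16, 32], "batch_size": [1, 4]}
    config = {"tile_size": 16, "batch_size": 2}
    for problem in check_config(config, design_space):
        print(problem)
```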

To specify the transformer model parameters, use a model dictionary file in simulator/model_dicts. Model dictionaries for BERT-Nano and BERT-Tiny have already been provided for convenience.

To run AccelTran on the BERT-Tiny model, while plotting utilization and metric curves every 1000 cycles, use the following command:

```shell
cd ./simulator/
python3 run_simulator.py --model_dict_path ./model_dicts/bert_tiny.json --config_path ./config/config_tiny.yaml --plot_steps 1000 --debug
cd ..
```

This will output the accelerator state for every cycle. For more information on the possible inputs to the simulation script, use:

```shell
cd ./simulator/
python3 run_simulator.py --help
cd ..
```
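
The simulator can also be driven from Python, for example to sweep over several configuration files. The sketch below only assembles the command line from the flags used in the BERT-Tiny example (`--model_dict_path`, `--config_path`, `--plot_steps`, `--debug`); `simulator_command` is a hypothetical helper, and executing the result assumes the working directory is simulator/.

```python
import shlex


def simulator_command(model_dict_path, config_path, plot_steps=None, debug=False):
    """Build the run_simulator.py argument list (hypothetical helper)."""
    cmd = [
        "python3", "run_simulator.py",
        "--model_dict_path", model_dict_path,
        "--config_path", config_path,
    ]
    if plot_steps is not None:
        cmd += ["--plot_steps", str(plot_steps)]
    if debug:
        cmd.append("--debug")
    return cmd


if __name__ == "__main__":
    cmd = simulator_command(
        "./model_dicts/bert_tiny.json",
        "./config/config_tiny.yaml",
        plot_steps=1000,
        debug=True,
    )
    # Print only; use subprocess.run(cmd, cwd="simulator", check=True) to execute.
    print(shlex.join(cmd))
```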

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at stuli@princeton.edu.

Cite this work

Cite our work using the following BibTeX entry:

```bibtex
@article{tuli2023acceltran,
  title={{AccelTran}: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2302.14705},
  year={2023}
}
```

If you use the AccelTran design space to implement transformer-accelerator co-design, please also cite:

```bibtex
@article{tuli2023transcode,
  title={{TransCODE}: Co-design of Transformers and Accelerators for Efficient Training and Inference},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2303.14882},
  year={2023}
}
```

License

BSD-3-Clause.
Copyright (c) 2022, Shikhar Tuli and Jha Lab.
All rights reserved.

See License file for more details.

This article is auto-generated from jha-lab/acceltran via the GitHub API. Last fetched: 6/24/2026