Acceltran
[TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers
AccelTran is a tool to simulate a design space of accelerators on diverse *flexible* and *heterogeneous* transformer architectures supported by the FlexiBERT 2.0 framework at [jha-lab/txf_design-space](https://github.com/JHA-Lab/txf_design-space). The project is written primarily in Python, distributed under the BSD 3-Clause "New" or "Revised" License, and was first published in 2022. Key topics include accelerators and transformers.
AccelTran: A Sparsity-Aware Monolithic 3D Accelerator for Transformer Architectures at Scale
The figure below shows the utilization of different modules in an AccelTran architecture for the BERT-Tiny transformer model.

Environment setup
Clone this repository and initialize sub-modules
```shell
git clone https://github.com/JHA-Lab/acceltran.git
cd ./acceltran/
git submodule init
git submodule update
```
Set up the Python environment
The Python environment setup is based on conda. The script below creates a new environment named txf_design-space:
```shell
source env_setup.sh
```
For pip installation, we are creating a requirements.txt file. Stay tuned!
Run synthesis
Synthesis scripts use Synopsys Design Compiler. All hardware modules are implemented in SystemVerilog in the directory synthesis/top.
To get area and power consumption reports for each module, use the following command:
```shell
cd ./synthesis/
dc_shell -f 14nm_sg.tcl -x "set top_module <M>"
cd ..
```
Here, <M> is the module to be synthesized: mac_lane, ln_forward_<T> (for layer normalization), softmax_<T>, etc., where <T> is the tile size (8, 16, or 32).
All output reports are stored in synthesis/reports.
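To cover several modules and tile sizes in one pass, the command above can be generated in a loop. The following is a minimal sketch and not part of the repository: the helper names `module_names` and `synthesis_commands` are hypothetical, and the module list contains only the modules named in this README, so the actual set of modules is larger. The code only builds the command lines and does not invoke Design Compiler.

```python
import shlex
from itertools import product

TILE_SIZES = (8, 16, 32)                   # tile sizes listed in this README
TILED_MODULES = ("ln_forward", "softmax")  # modules that take a _<T> suffix
UNTILED_MODULES = ("mac_lane",)            # modules without a tile-size suffix


def module_names():
    """Return the module names to synthesize (only those named in this README)."""
    names = list(UNTILED_MODULES)
    names += [f"{base}_{tile}" for base, tile in product(TILED_MODULES, TILE_SIZES)]
    return names


def synthesis_commands(script="14nm_sg.tcl"):
    """Build one dc_shell argument list per module; nothing is executed here."""
    return [
        ["dc_shell", "-f", script, "-x", f"set top_module {name}"]
        for name in module_names()
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for inspection before running them.
    for cmd in synthesis_commands():
        print(shlex.join(cmd))
```

Each argument list could then be passed to `subprocess.run(cmd, check=True, cwd="synthesis")` on a machine where `dc_shell` is available.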
To run the synthesis for the DMA module, run the following command instead:
```shell
cd ./synthesis/
dc_shell -f dma.tcl
```
Run pruning
To get the sparsity in activations and weights in an input transformer model and its corresponding performance on the GLUE benchmark, use the dynamic pruning model: DP-BERT.
To test the effect of different sparsity ratios on the model performance on the SST-2 benchmark, use the following script:
```shell
cd ./pruning/
python3 run_evaluation.py --task sst2 --max_pruning_threshold 0.1
cd ..
```
The script uses a weight-pruned model, so the weights are not pruned further. To prune the weights with a pruning_threshold as well, use the flag: --prune_weights.
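To compare several sparsity settings, the evaluation command can be repeated over a list of thresholds. The sketch below is an illustration under stated assumptions rather than a script from the repository: the helper `evaluation_command` and the threshold values are hypothetical, `--prune_weights` is assumed to be a boolean switch that takes no value, and only the flags shown above are used. It builds the argument lists without running them.

```python
import shlex


def evaluation_command(threshold, task="sst2", prune_weights=False):
    """Build the run_evaluation.py argument list for one pruning threshold."""
    cmd = [
        "python3", "run_evaluation.py",
        "--task", task,
        "--max_pruning_threshold", str(threshold),
    ]
    if prune_weights:
        # Assumption: --prune_weights is a switch with no argument.
        cmd.append("--prune_weights")
    return cmd


if __name__ == "__main__":
    # Hypothetical thresholds, chosen only for illustration.
    for threshold in (0.01, 0.05, 0.1):
        print(shlex.join(evaluation_command(threshold)))
```

Each list could be run from the repository root with `subprocess.run(cmd, check=True, cwd="pruning")`.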
Run simulator
AccelTran supports a diverse range of accelerator hyperparameters. It also supports all ~10<sup>88</sup> models in the FlexiBERT 2.0 design space.
To specify the configuration of an accelerator's architecture, use a configuration file in the simulator/config directory. Example configuration files are given for accelerators optimized for BERT-Nano and BERT-Tiny. Accelerator hardware configuration files should conform to the design space specified in the simulator/design_space/design_space.yaml file.
To specify the transformer model parameters, use a model dictionary file in simulator/model_dicts. Model dictionaries for BERT-Nano and BERT-Tiny have already been provided for convenience.
To run AccelTran on the BERT-Tiny model, while plotting utilization and metric curves every 1000 cycles, use the following command:
```shell
cd ./simulator/
python3 run_simulator.py --model_dict_path ./model_dicts/bert_tiny.json --config_path ./config/config_tiny.yaml --plot_steps 1000 --debug
cd ..
```
This will output the accelerator state for every cycle. For more information on the possible inputs to the simulation script, use:
```shell
cd ./simulator/
python3 run_simulator.py --help
cd ..
```
Developer
Shikhar Tuli. For any questions, comments or suggestions, please reach me at stuli@princeton.edu.
Cite this work
Cite our work using the following BibTeX entry:
```bibtex
@article{tuli2023acceltran,
  title={{AccelTran}: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2302.14705},
  year={2023}
}
```
If you use the AccelTran design space to implement transformer-accelerator co-design, please also cite:
```bibtex
@article{tuli2023transcode,
  title={{TransCODE}: Co-design of Transformers and Accelerators for Efficient Training and Inference},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2303.14882},
  year={2023}
}
```
License
BSD-3-Clause.
Copyright (c) 2022, Shikhar Tuli and Jha Lab.
All rights reserved.
See License file for more details.
