AccelTran: A Sparsity-Aware Monolithic 3D Accelerator for Transformer Architectures at Scale

AccelTran is a tool for simulating a design space of accelerators on the diverse, flexible, and heterogeneous transformer architectures supported by the FlexiBERT 2.0 framework at jha-lab/txf_design-space.

The figure below shows the utilization of different modules in an AccelTran architecture for the BERT-Tiny transformer model.

AccelTran GIF

Environment Setup
- Clone this repository and initialize sub-modules
- Setup python environment
Run synthesis
Run pruning
Run simulator
Developer
Cite this work
License

Environment setup

Clone this repository and initialize sub-modules

shell
git clone https://github.com/JHA-Lab/acceltran.git
cd ./acceltran/
git submodule init
git submodule update
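
To confirm that the sub-modules were actually populated, one option is to inspect `git submodule status`, which prefixes every uninitialized sub-module with `-`. The helper below is a minimal sketch of that check; the function name `uninitialized_submodules` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): reads the output of
# `git submodule status` on stdin and prints the path of every sub-module
# that is still uninitialized (git marks those lines with a leading "-").
uninitialized_submodules() {
    awk '/^-/ { print $2 }'
}

# Usage, from inside the cloned repository:
#   git submodule status | uninitialized_submodules
# No output means every sub-module has been initialized.
```

Alternatively, `git clone --recurse-submodules <url>` performs the clone and the sub-module initialization in one step.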

Setup python environment

The Python environment setup is based on conda. The script below creates a new environment named txf_design-space:

shell
source env_setup.sh

For pip installation, we are creating a requirements.txt file. Stay tuned!

Run synthesis

Synthesis scripts use Synopsys Design Compiler. All hardware modules are implemented in SystemVerilog in the directory synthesis/top.

To get area and power consumption reports for each module, use the following command:

shell
cd ./synthesis/
dc_shell -f 14nm_sg.tcl -x "set top_module <M>"
cd ..

Here, <M> is the module to be synthesized, for example mac_lane, ln_forward_<T> (for layer normalization), softmax_<T>, etc., where <T> is the tile size: 8, 16, or 32.

All output reports are stored in synthesis/reports.
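
To synthesize several modules in one go, the command above can be wrapped in a loop over module names and tile sizes. The sketch below covers only the modules named in this section (mac_lane, ln_forward_<T>, softmax_<T>) and is a dry run by default: the `RUN` variable and the `synth_all` function are hypothetical names introduced for illustration, and with `RUN` left unset the commands are only printed.

```shell
# Dry-run sketch: RUN defaults to `echo`, so each command is printed, not run.
# Set RUN to an empty string (RUN= synth_all) inside ./synthesis/ to actually
# invoke dc_shell; that requires Synopsys Design Compiler on the PATH.
RUN="${RUN-echo}"

synth_all() {
    # mac_lane carries no tile-size suffix
    $RUN dc_shell -f 14nm_sg.tcl -x "set top_module mac_lane"
    # Tile-dependent modules named in this README, for each supported tile size
    for tile in 8 16 32; do
        for module in ln_forward softmax; do
            $RUN dc_shell -f 14nm_sg.tcl -x "set top_module ${module}_${tile}"
        done
    done
}

synth_all
```

Note that `echo` drops the quotes around the `-x` argument in the printed output; the real invocation passes `set top_module <M>` as a single argument.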

To run the synthesis for the DMA module, run the following command instead:

shell
cd ./synthesis/
dc_shell -f dma.tcl

Run pruning

To get the sparsity in activations and weights in an input transformer model and its corresponding performance on the GLUE benchmark, use the dynamic pruning model: DP-BERT.

To test the effect of different sparsity ratios on the model performance on the SST-2 benchmark, use the following script:

shell
cd ./pruning/
python3 run_evaluation.py --task sst2 --max_pruning_threshold 0.1
cd ..

The script uses a weight-pruned model, so the weights are not pruned further. To prune the weights with a pruning_threshold as well, use the flag: --prune_weights.
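
To compare the two modes side by side, the evaluation can be launched once without and once with --prune_weights. The following is a minimal dry-run sketch using only the flags shown in this section; `RUN` and `compare_weight_pruning` are hypothetical names, and 0.1 is simply the value used in the example above.

```shell
# Dry-run sketch: RUN defaults to `echo`, so the commands are printed only.
# Set RUN to an empty string and run from ./pruning/ to execute them.
RUN="${RUN-echo}"

compare_weight_pruning() {
    max_threshold="${1:-0.1}"
    # Baseline: the weights of the weight-pruned model are left as they are
    $RUN python3 run_evaluation.py --task sst2 --max_pruning_threshold "$max_threshold"
    # Same evaluation, but with the weights pruned as well
    $RUN python3 run_evaluation.py --task sst2 --max_pruning_threshold "$max_threshold" --prune_weights
}

compare_weight_pruning 0.1
```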

Run simulator

AccelTran supports a diverse range of accelerator hyperparameters. It also supports all ~10^88 models in the FlexiBERT 2.0 design space.

To specify the configuration of an accelerator's architecture, use a configuration file in the simulator/config directory. Example configuration files are given for accelerators optimized for BERT-Nano and BERT-Tiny. Accelerator hardware configuration files should conform to the design space specified in the simulator/design_space/design_space.yaml file.

To specify the transformer model parameters, use a model dictionary file in simulator/model_dicts. Model dictionaries for BERT-Nano and BERT-Tiny have already been provided for convenience.

To run AccelTran on the BERT-Tiny model, while plotting utilization and metric curves every 1000 cycles, use the following command:

shell
cd ./simulator/
python3 run_simulator.py --model_dict_path ./model_dicts/bert_tiny.json --config_path ./config/config_tiny.yaml --plot_steps 1000 --debug
cd ..

This will output the accelerator state for every cycle. For more information on the possible inputs to the simulation script, use:

shell
cd ./simulator/
python3 run_simulator.py --help
cd ..
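
Since every simulation pairs one model dictionary with one accelerator configuration, a small wrapper can make that pairing explicit. The sketch below is an assumption-laden convenience, not part of the repository: `run_acceltran` and `RUN` are hypothetical names, only flags shown above are used, and the default is a dry run that prints the command.

```shell
# Dry-run sketch: RUN defaults to `echo`, so the command is printed only.
# Set RUN to an empty string and run from ./simulator/ to launch the simulator.
RUN="${RUN-echo}"

# Hypothetical wrapper: run_acceltran <model_dict> <config> [extra flags...]
run_acceltran() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_acceltran <model_dict> <config> [extra flags...]" >&2
        return 1
    fi
    model_dict="$1"
    config="$2"
    shift 2
    # Any remaining arguments (e.g. --plot_steps, --debug) are passed through
    $RUN python3 run_simulator.py --model_dict_path "$model_dict" --config_path "$config" "$@"
}

run_acceltran ./model_dicts/bert_tiny.json ./config/config_tiny.yaml --plot_steps 1000 --debug
```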

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at stuli@princeton.edu.

Cite this work

Cite our work using the following BibTeX entry:

bibtex
@article{tuli2023acceltran,
  title={{AccelTran}: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2302.14705},
  year={2023}
}

If you use the AccelTran design space to implement transformer-accelerator co-design, please also cite:

bibtex
@article{tuli2023transcode,
  title={{TransCODE}: Co-design of Transformers and Accelerators for Efficient Training and Inference},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2303.14882},
  year={2023}
}