
CIF PyTorch

[ICASSP 2020] CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition (a PyTorch implementation of the Continuous Integrate-and-Fire mechanism).

From MingLunHan · Updated June 6, 2026

:rocket: **Attention! Please refer to https://github.com/MingLunHan/CIF-HieraDist for our latest and complete implementation of the CIF-based speech recognition model!** The project is written primarily in Python, is distributed under the Apache License 2.0, and was first published in 2021. Key topics include alignment, asr, automatic-speech-recognition, cif, and continuous-integrate-and-fire.


A PyTorch implementation of the Continuous Integrate-and-Fire (CIF) module for end-to-end (E2E) automatic speech recognition (ASR), originally proposed in "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" (https://ieeexplore.ieee.org/document/9054250).

If you have any questions, please contact me through hanminglun1996@foxmail.com.

1. A Feasible Configuration for CIF Module

encoder_embed_dim: 256 # should be the innermost dimension of inputs
produce_weight_type: "conv"
cif_threshold: 0.99
conv_cif_layer_num: 1
conv_cif_width: 3 # or 5
conv_cif_output_channels_num: 256
conv_cif_dropout: 0.1
dense_cif_units_num: 256
apply_scaling: True
apply_tail_handling: True
tail_handling_firing_threshold: 0.5
add_cif_ctxt_layers: False
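
To make the roles of `cif_threshold`, `apply_tail_handling`, and `tail_handling_firing_threshold` more concrete, here is a minimal plain-Python sketch of the integrate-and-fire step. It is not the repository's PyTorch code: the function name `cif_fire`, the list-based inputs, and in particular the way the tail output is normalized are assumptions made for illustration only.

```python
from typing import List, Sequence


def cif_fire(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    cif_threshold: float = 0.99,
    apply_tail_handling: bool = True,
    tail_handling_firing_threshold: float = 0.5,
) -> List[List[float]]:
    """Integrate weighted frames and fire one output each time the
    accumulated weight reaches cif_threshold (illustrative sketch)."""
    dim = len(features[0]) if features else 0
    fired: List[List[float]] = []
    acc_weight = 0.0          # weight integrated since the last firing
    acc_state = [0.0] * dim   # weighted sum of frames since the last firing

    for frame, weight in zip(features, weights):
        remaining = weight
        # A frame whose weight crosses the threshold is split: one part
        # completes the current output, the rest starts the next one.
        while acc_weight + remaining >= cif_threshold:
            used = cif_threshold - acc_weight
            fired.append([s + used * x for s, x in zip(acc_state, frame)])
            remaining = max(remaining - used, 0.0)
            acc_weight = 0.0
            acc_state = [0.0] * dim
        acc_weight += remaining
        acc_state = [s + remaining * x for s, x in zip(acc_state, frame)]

    # Assumed tail handling: emit the leftover state, normalized by its
    # weight, only if enough weight has accumulated at the end.
    if apply_tail_handling and acc_weight > tail_handling_firing_threshold:
        fired.append([s / acc_weight for s in acc_state])
    return fired
```

For example, `cif_fire([[1.0], [3.0], [5.0], [7.0]], [0.5, 0.5, 0.5, 0.5], cif_threshold=1.0)` fires twice, once after every two frames. In this sketch, a leftover weight at or below `tail_handling_firing_threshold` is simply dropped.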

2. Tips

  1. For speech recognition, we usually down-sample the input frame sequence to 1/8 of its original length on the encoder side to keep training of the CIF module efficient. For other tasks, the length difference between the input and output of the CIF module should likewise be kept within a reasonable range.
  2. During training, when the scaled sum of the weights differs from the length of the reference transcription, you can truncate the reference and the model output to the same length.
  3. The scaling strategy used during training may cause exploding gradients, because computing the normalization scalar involves a division. You can add a small value (1e-8) to the denominator to avoid this problem.
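
Tips 2 and 3 can be sketched in a few lines of plain Python. This assumes the scaling rule multiplies each weight by the target length divided by the sum of the weights; the helper names `scale_weights` and `truncate_to_common_length` are hypothetical and do not come from the repository.

```python
from typing import List, Sequence, Tuple


def scale_weights(
    weights: Sequence[float], target_length: int, eps: float = 1e-8
) -> List[float]:
    """Rescale weights so that they sum (approximately) to target_length."""
    total = sum(weights)
    # eps keeps the denominator away from zero, as suggested in tip 3.
    factor = target_length / (total + eps)
    return [w * factor for w in weights]


def truncate_to_common_length(
    outputs: Sequence, reference: Sequence
) -> Tuple[list, list]:
    """Cut both sequences to the shorter of the two lengths (tip 2)."""
    n = min(len(outputs), len(reference))
    return list(outputs[:n]), list(reference[:n])
```

With `eps` in place, an all-zero weight sequence yields all-zero scaled weights instead of raising a division error. The scaled sum is then only approximately equal to the target length, which is one way a mismatch like the one in tip 2 can still arise.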

3. Other CIF Research Work and Resources

a. Papers:

LLM+CIF:

ASR:

ASR Context Biasing:

Low-resource Speech Recognition:

Non-Autoregressive ASR:

Non-Autoregressive Lip Reading:

Speech Translation:

Spiking Neural Networks:

Multimodal ASR:

Keyword Spotting:

b. Repositories:


This article is auto-generated from MingLunHan/CIF-PyTorch via the GitHub API. Last fetched: 6/25/2026