Vision Language Transformer
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
The project is written primarily in Python, distributed under the MIT License, and first published in 2021. Key topics include: iccv2021, keras, referring-segmentation, tensorflow, tpami.
Vision-Language Transformer and Query Generation for Referring Segmentation
Please consider citing our paper in your publications if the project helps your research.
@inproceedings{vision-language-transformer,
title={Vision-Language Transformer and Query Generation for Referring Segmentation},
author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
year={2021}
}
Introduction
Vision-Language Transformer (VLT) is a framework for the referring segmentation task. Our method produces multiple query vectors for one input language expression and uses each of them to “query” the input image, generating a set of responses. The network then selectively aggregates these responses, spotlighting the queries that provide better comprehension.
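The query-then-aggregate idea can be illustrated with a toy sketch. This is not the code in this repository: the function names, the scaled dot product used as the "query" operation, and the softmax weighting over per-query confidences are all assumptions chosen only to show the data flow.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def query_image(query, pixels):
    """Scaled dot product between one query vector and every pixel feature."""
    scale = math.sqrt(len(query))
    return [sum(q * p for q, p in zip(query, pixel)) / scale for pixel in pixels]

def aggregate(queries, pixels, confidences):
    """Fuse the per-query responses, weighting each by its softmax confidence."""
    # One response map per query vector.
    responses = [query_image(q, pixels) for q in queries]
    # Hypothetical confidence scores become weights that sum to 1.
    weights = softmax(confidences)
    # Weighted sum: responses of higher-confidence queries dominate.
    return [
        sum(w * r[i] for w, r in zip(weights, responses))
        for i in range(len(pixels))
    ]

# Toy input: 2 queries, 3 "pixels", 2 feature channels (all values made up).
queries = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused = aggregate(queries, pixels, confidences=[0.0, 0.0])
print(fused)
```

With equal confidences the two responses are simply averaged; raising one confidence shifts the fused map toward that query's response.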
<p align="center"> <img src="fig0.png" width="500px"> </p>

Installation
- Environment:
  - Python 3.6
  - tensorflow 1.15
  - Other dependencies in requirements.txt
  - SpaCy model for embedding:
      python -m spacy download en_vectors_web_lg
Dataset preparation
- Put the folder of COCO training set ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract them to data/. Then run the script for data preparation under data/:
    cd data
    python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
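To prepare several datasets in one go, the command above can be generated per dataset/split pair. The sketch below only builds the argument lists. The pairing table is an assumption based on the usual RefCOCO conventions (refcoco and refcoco+ with unc, refcocog with umd or google); whether data_process_v2.py accepts every pairing is not confirmed here. The helper name build_commands is hypothetical.

```python
# Assumed dataset/split pairings; adjust to the ones you actually need.
DATASET_SPLITS = {
    "refcoco": ["unc"],
    "refcoco+": ["unc"],
    "refcocog": ["umd", "google"],
}

def build_commands(data_root=".", output_dir="data_v2"):
    """Return one argument list per dataset/split pair, using the flags shown above."""
    commands = []
    for dataset, splits in DATASET_SPLITS.items():
        for split in splits:
            commands.append([
                "python", "data_process_v2.py",
                "--data_root", data_root,
                "--output_dir", output_dir,
                "--dataset", dataset,
                "--split", split,
                "--generate_mask",
            ])
    return commands

# Print the commands instead of running them, so nothing is executed by accident.
for cmd in build_commands():
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from inside data/ once the printed commands look right.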
Evaluating
- Download pretrained models & config files from here.
- In the config file, set:
  - evaluate_model: path to the pretrained weights
  - evaluate_set: path to the dataset for evaluation
- Run
    python vlt.py test [PATH_TO_CONFIG_FILE]
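Before launching evaluation, it can help to check that both settings point at something that exists. The sketch below is format-agnostic: it assumes the config has already been loaded into a plain dict by whatever loader matches the config file, and that both values are filesystem paths. The helper name missing_paths and the placeholder paths are hypothetical.

```python
import os

REQUIRED_KEYS = ("evaluate_model", "evaluate_set")

def missing_paths(settings, keys=REQUIRED_KEYS):
    """Return the keys that are absent, empty, or point to a non-existent path."""
    problems = []
    for key in keys:
        value = settings.get(key)
        if not value or not os.path.exists(value):
            problems.append(key)
    return problems

# Placeholder values; replace them with the entries from your own config file.
settings = {
    "evaluate_model": "path/to/pretrained/weights",
    "evaluate_set": "path/to/evaluation/dataset",
}
print(missing_paths(settings))
```

An empty list means both entries resolve to existing paths; any key in the list should be fixed in the config file before running the test command.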
Training
- Pretrained Backbones: We use the backbone weights provided by MCN.
  Note: we use the backbone that excludes all images that appear in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.
- Specify hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config, or the config files of our pretrained models.
- Run
    python vlt.py train [PATH_TO_CONFIG_FILE]
Acknowledgement
We borrowed a lot of code from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent work!
