Stog
AMR Parsing as Sequence-to-Graph Transduction
Code for the AMR Parser in our ACL 2019 paper "[AMR Parsing as Sequence-to-Graph Transduction](https://arxiv.org/pdf/1905.08704.pdf)". The project is written primarily in Python, distributed under the MIT License, and first published in 2019. Key topics include: abstract-meaning-representation, acl2019, amr, nlp, pytorch.
If you find our code useful, please cite:
@inproceedings{zhang-etal-2018-stog,
title = "{AMR Parsing as Sequence-to-Graph Transduction}",
author = "Zhang, Sheng and
Ma, Xutai and
Duh, Kevin and
Van Durme, Benjamin",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics"
}
1. Environment Setup
The code has been tested on Python 3.6 and PyTorch 0.4.1.
All other dependencies are listed in requirements.txt.
Via conda:
```bash
conda create -n stog python=3.6
source activate stog
pip install -r requirements.txt
```
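Once the environment is active, it can help to confirm that it matches the tested versions. The sketch below is not part of the repository: `matches_tested_version` and `report_environment` are hypothetical helpers, and the prefix comparison is an assumption about what counts as a matching version.

```python
import sys

# Versions the README reports as tested.
TESTED_PYTHON = "3.6"
TESTED_TORCH = "0.4.1"


def matches_tested_version(found: str, tested: str) -> bool:
    """Return True if `found` equals `tested` or extends it with a further
    dot- or plus-separated component (e.g. "3.6.9" matches "3.6")."""
    if found == tested:
        return True
    return found.startswith(tested + ".") or found.startswith(tested + "+")


def report_environment() -> None:
    """Print whether Python and PyTorch match the tested versions."""
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    status = "ok" if matches_tested_version(python_version, TESTED_PYTHON) else "untested"
    print("Python", python_version, status)
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        print("PyTorch is not installed")
        return
    torch_version = str(torch.__version__)
    status = "ok" if matches_tested_version(torch_version, TESTED_TORCH) else "untested"
    print("PyTorch", torch_version, status)


if __name__ == "__main__":
    report_environment()
```

An "untested" result does not necessarily mean the code will fail, only that the combination differs from the one reported above.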
2. Data Preparation
Download Artifacts:
```bash
./scripts/download_artifacts.sh
```
Assuming that you're working on AMR 2.0 (LDC2017T10),
unzip the corpus to data/AMR/LDC2017T10, and make sure it has the following structure:
```bash
(stog)$ tree data/AMR/LDC2017T10 -L 2
data/AMR/LDC2017T10
├── data
│   ├── alignments
│   ├── amrs
│   └── frames
├── docs
│   ├── AMR-alignment-format.txt
│   ├── amr-guidelines-v1.2.pdf
│   ├── file.tbl
│   ├── frameset.dtd
│   ├── PropBank-unification-notes.txt
│   └── README.txt
└── index.html
```
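If `tree` is not installed, the same layout can be checked from Python. This is a minimal sketch, not a script shipped with the repository; `missing_corpus_dirs` is a hypothetical helper, and it checks only the directories shown in the listing above, not the individual files.

```python
from pathlib import Path
from typing import List

# Sub-directories shown in the tree listing for LDC2017T10.
EXPECTED_DIRS = ["data/alignments", "data/amrs", "data/frames", "docs"]


def missing_corpus_dirs(root: str) -> List[str]:
    """Return the expected sub-directories that are absent under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_corpus_dirs("data/AMR/LDC2017T10")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Corpus layout looks complete.")
```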
Prepare training/dev/test data:
```bash
./scripts/prepare_data.sh -v 2 -p data/AMR/LDC2017T10
```
3. Feature Annotation
We use Stanford CoreNLP (version 3.9.2) for lemmatizing, POS tagging, etc.
First, start a CoreNLP server following the API documentation.
Then, annotate AMR sentences:
```bash
./scripts/annotate_features.sh data/AMR/amr_2.0
```
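To check that the CoreNLP server is reachable before annotating the whole corpus, a single request can be sent by hand. The sketch below is illustrative only: it assumes the server listens on `http://localhost:9000` (the CoreNLP server's default port), and the annotator list is a guess at a minimal set, not necessarily what `annotate_features.sh` requests. `build_request_url` and `annotate` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumed server address; 9000 is the CoreNLP server's default port.
SERVER_URL = "http://localhost:9000"


def build_request_url(server_url: str, annotators: str) -> str:
    """Encode the annotator list as the `properties` query parameter."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return "{}/?{}".format(server_url.rstrip("/"), query)


def annotate(text: str, annotators: str = "tokenize,ssplit,pos,lemma") -> dict:
    """POST `text` to the server and return the parsed JSON response."""
    url = build_request_url(SERVER_URL, annotators)
    request = urllib.request.Request(url, data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `annotate("The boy wants to go.")` should return a dictionary whose `sentences` entry holds one token list per sentence. A connection error at this point usually means the server is not up or is listening on a different port.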
4. Data Preprocessing
```bash
./scripts/preprocess_2.0.sh
```
5. Training
Make sure that you have at least two GeForce GTX TITAN X GPUs to train the full model.
```bash
python -u -m stog.commands.train params/stog_amr_2.0.yaml
```
6. Prediction
```bash
python -u -m stog.commands.predict \
    --archive-file ckpt-amr-2.0 \
    --weights-file ckpt-amr-2.0/best.th \
    --input-file data/AMR/amr_2.0/test.txt.features.preproc \
    --batch-size 32 \
    --use-dataset-reader \
    --cuda-device 0 \
    --output-file test.pred.txt \
    --silent \
    --beam-size 5 \
    --predictor STOG
```
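When prediction has to be launched from a script (for example, to try several beam sizes), the same command can be assembled in Python. This is a minimal sketch: `build_predict_command` and `run_prediction` are hypothetical helpers, not part of the repository, and the flags are copied verbatim from the command above.

```python
import subprocess
import sys
from typing import List


def build_predict_command(archive_dir: str, input_file: str, output_file: str,
                          batch_size: int = 32, beam_size: int = 5,
                          cuda_device: int = 0) -> List[str]:
    """Assemble the argument list for `python -u -m stog.commands.predict`."""
    return [
        sys.executable, "-u", "-m", "stog.commands.predict",
        "--archive-file", archive_dir,
        "--weights-file", archive_dir + "/best.th",
        "--input-file", input_file,
        "--batch-size", str(batch_size),
        "--use-dataset-reader",
        "--cuda-device", str(cuda_device),
        "--output-file", output_file,
        "--silent",
        "--beam-size", str(beam_size),
        "--predictor", "STOG",
    ]


def run_prediction(command: List[str]) -> None:
    """Run the assembled command and raise if it exits with an error."""
    subprocess.run(command, check=True)
```

Calling `run_prediction(build_predict_command("ckpt-amr-2.0", "data/AMR/amr_2.0/test.txt.features.preproc", "test.pred.txt"))` from the repository root is intended to be equivalent to the shell command; passing the arguments as a list avoids shell quoting issues.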
7. Data Postprocessing
```bash
./scripts/postprocess_2.0.sh test.pred.txt
```
8. Evaluation
Note that the evaluation tool runs on Python 2, so make sure python2 is visible in your $PATH.
```bash
./scripts/compute_smatch.sh test.pred.txt data/AMR/amr_2.0/test.txt
```
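If the evaluation script fails immediately, a quick check is whether a `python2` executable can be found at all. The sketch below uses only the standard library; `find_python2` is a hypothetical helper, not something the repository provides.

```python
import shutil
from typing import Optional


def find_python2() -> Optional[str]:
    """Return the path of the `python2` executable on $PATH, or None."""
    return shutil.which("python2")


if __name__ == "__main__":
    path = find_python2()
    if path is None:
        print("python2 was not found on $PATH")
    else:
        print("python2 found at", path)
```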
Pre-trained Models
Here are pre-trained models:
ckpt-amr-2.0.tar.gz
and ckpt-amr-1.0.tar.gz.
To use them for prediction, download and unzip them, then run Steps 6-8.
If you only need the pre-trained model's predictions (i.e., test.pred.txt), they are included in the download.
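Unpacking can also be done from Python, which is convenient inside a setup script. This is a minimal sketch using the standard `tarfile` module; `extract_checkpoint` is a hypothetical helper, and the directory layout inside the archives is not assumed here, so the function simply returns the member names it finds.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_checkpoint(archive_path: str, target_dir: str = ".") -> List[str]:
    """Unpack a .tar.gz checkpoint into `target_dir` and return member names."""
    target = Path(target_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Refuse members that would be written outside the target directory.
        for name in names:
            resolved = (target / name).resolve()
            if resolved != target and target not in resolved.parents:
                raise ValueError("unsafe path in archive: " + name)
        archive.extractall(str(target))
    return names
```

For example, `extract_checkpoint("ckpt-amr-2.0.tar.gz")` unpacks the archive into the current directory; the returned names show where the weights file ended up, which is the path to pass to `--archive-file` and `--weights-file` in Step 6.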
Acknowledgements
We adopted some modules or code snippets from AllenNLP,
OpenNMT-py
and NeuroNLP2.
Thanks to these open-source projects!
License
This project is distributed under the MIT License.
