Gitpedia

Graphvite

GraphVite: A General and High-performance Graph Embedding System

From DeepGraphLearning·Updated May 29, 2026·View on GitHub·

GraphVite - graph embedding at high speed and large scale ========================================================= The project is written primarily in C++, distributed under the Apache License 2.0 license, first published in 2019. It has gained significant community traction with 1,269 stars and 156 forks on GitHub. Key topics include: cuda, data-visualization, gpu, knowledge-graph, machine-learning.

GraphVite logo

GraphVite - graph embedding at high speed and large scale

Install with conda
License
Downloads

Docs | Tutorials | Benchmarks | Pre-trained Models

GraphVite is a general graph embedding engine, dedicated to high-speed and
large-scale embedding learning in various applications.

GraphVite provides complete training and evaluation pipelines for 3 applications:
node embedding, knowledge graph embedding and
graph & high-dimensional data visualization. Besides, it also includes 9 popular
models, along with their benchmarks on a bunch of standard datasets.

<table align="center" style="text-align:center"> <tr> <th>Node Embedding</th> <th>Knowledge Graph Embedding</th> <th>Graph & High-dimensional Data Visualization</th> </tr> <tr> <td><img src="asset/graph.png" height="240" /></td> <td><img src="asset/knowledge_graph.png" height="240" /></td> <td><img src="asset/visualization.png" height="240" /></td> </tr> </table>

Here is a summary of the training time of GraphVite along with the best open-source
implementations on 3 applications. All the time is reported based on a server with
24 CPU threads and 4 V100 GPUs.

Training time of node embedding on Youtube dataset.

ModelExisting ImplementationGraphViteSpeedup
DeepWalk1.64 hrs (CPU parallel)1.19 mins82.9x
LINE1.39 hrs (CPU parallel)1.17 mins71.4x
node2vec24.4 hrs (CPU parallel)4.39 mins334x

Training / evaluation time of knowledge graph embedding on FB15k dataset.

ModelExisting ImplementationGraphViteSpeedup
TransE1.31 hrs / 1.75 mins (1 GPU)13.5 mins / 54.3 s5.82x / 1.93x
RotatE3.69 hrs / 4.19 mins (1 GPU)28.1 mins / 55.8 s7.88x / 4.50x

Training time of high-dimensional data visualization on MNIST dataset.

ModelExisting ImplementationGraphViteSpeedup
LargeVis15.3 mins (CPU parallel)13.9 s66.8x

Requirements

Generally, GraphVite works on any Linux distribution with CUDA >= 9.2.

The library is compatible with Python 2.7 and 3.6/3.7.

Installation

From Conda

bash
conda install -c milagraph -c conda-forge graphvite cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+.\d+")

If you only need embedding training without evaluation, you can use the following
alternative with minimal dependencies.

bash
conda install -c milagraph -c conda-forge graphvite-mini cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+.\d+")

From Source

Before installation, make sure you have conda installed.

bash
git clone https://github.com/DeepGraphLearning/graphvite cd graphvite conda install -y --file conda/requirements.txt mkdir build cd build && cmake .. && make && cd - cd python && python setup.py install && cd -

On Colab

bash
!wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh !chmod +x Miniconda3-latest-Linux-x86_64.sh !./Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local -f !conda install -y -c milagraph -c conda-forge graphvite \ python=3.6 cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+") !conda install -y wurlitzer ipykernel
python
import site site.addsitedir("/usr/local/lib/python3.6/site-packages") %reload_ext wurlitzer

Quick Start

Here is a quick-start example of the node embedding application.

bash
graphvite baseline quick start

Typically, the example takes no more than 1 minute. You will obtain some output like

Batch id: 6000
loss = 0.371041

------------- link prediction --------------
AUC: 0.899933

----------- node classification ------------
macro-F1@20%: 0.242114
micro-F1@20%: 0.391342

Baseline Benchmark

To reproduce a baseline benchmark, you only need to specify the keywords of the
experiment. e.g. model and dataset.

bash
graphvite baseline [keyword ...] [--no-eval] [--gpu n] [--cpu m] [--epoch e]

You may also set the number of GPUs and the number of CPUs per GPU.

Use graphvite list to get a list of available baselines.

Custom Experiment

Create a yaml configuration scaffold for graph, knowledge graph, visualization or
word graph.

bash
graphvite new [application ...] [--file f]

Fill some necessary entries in the configuration following the instructions. You
can run the configuration by

bash
graphvite run [config] [--no-eval] [--gpu n] [--cpu m] [--epoch e]

High-dimensional Data Visualization

You can visualize your high-dimensional vectors with a simple command line in
GraphVite.

bash
graphvite visualize [file] [--label label_file] [--save save_file] [--perplexity n] [--3d]

The file can be either a numpy dump *.npy or a text matrix *.txt. For the save
file, we recommend to use png format, while pdf is also supported.

Contributing

We welcome all contributions from bug fixs to new features. Please let us know if you
have any suggestion to our library.

Development Team

GraphVite is developed by MilaGraph, led by Prof. Jian Tang.

Authors of this project are Zhaocheng Zhu, Shizhen Xu, Meng Qu and Jian Tang.
Contributors include Kunpeng Wang and Zhijian Duan.

Citation

If you find GraphVite useful for your research or development, please cite the
following paper.

@inproceedings{zhu2019graphvite,
    title={GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding},
     author={Zhu, Zhaocheng and Xu, Shizhen and Qu, Meng and Tang, Jian},
     booktitle={The World Wide Web Conference},
     pages={2494--2504},
     year={2019},
     organization={ACM}
 }

Acknowledgements

We would like to thank Compute Canada for supporting GPU servers. We specially thank
Wenbin Hou for useful discussions on C++ and GPU programming techniques.

Contributors

Showing top 1 contributor by commit count.

View all contributors on GitHub →

This article is auto-generated from DeepGraphLearning/graphvite via the GitHub API.Last fetched: 6/1/2026