
Pdf annotation extractor alfred

Alfred Workflow to extract annotations from PDF files.

From chrisgrieser · Updated April 13, 2026

A [Workflow for Alfred](https://www.alfredapp.com/) to extract annotations as a Markdown file. Primarily for scientific papers, but can also be used for non-academic PDF files. The project is written primarily in JavaScript, is distributed under the MIT License, and was first published in 2021. Key topics include: alfred-workflow, pandoc, pdf, pdf-annotation.

Latest release: 9.2.2
October 30, 2024

PDF annotation extractor


A Workflow for Alfred to extract annotations as a
Markdown file. Primarily for scientific papers, but can also be used for
non-academic PDF files.

Automatically determines correct page numbers, inserts them as Pandoc citations,
merges highlights across page breaks, prepends a YAML header with bibliographic
information, and more.


Installation

  • Requirement: Alfred 5 with Powerpack
  • Install Homebrew
  • Install pdfannots2json by running the following command in your terminal:
    brew install mgmeyers/pdfannots2json/pdfannots2json
  • Download the latest release.
  • Set the hotkey by double-clicking the sky-blue field at the top left.
  • Set up the workflow configuration inside the app.

Requirements for the PDF

PDF Annotation Extractor works on any PDF that has valid annotations
saved in the PDF file. Some PDF readers like Skim or Zotero 6 do not
store annotations in the PDF itself by default.

This workflow automatically determines the citekey based on the filename of
your PDF file.

  • If the citekey is found, the PDF Annotation Extractor
    prepends a YAML header to the annotations and automatically
    inserts the citekey with the correct page numbers using the Pandoc
    citations syntax.
  • If your filename does not contain a citekey that can be found in
    your library, the PDF Annotation Extractor extracts the annotations without
    a YAML header and uses the PDF page numbers as page numbers.
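As a rough illustration of the Pandoc citation syntax mentioned above, the following sketch formats a citekey and page number as `[@citekey, p. N]`. The function name and the fallback format used when no citekey is available are assumptions made for this example; the workflow's actual output layout may differ.

```python
from typing import Optional


def page_reference(page: int, citekey: Optional[str] = None) -> str:
    """Build a page reference for one annotation.

    With a citekey, this uses Pandoc's citation syntax with a page locator.
    The plain "(p. N)" fallback is a hypothetical format for this sketch only.
    """
    if citekey:
        return f"[@{citekey}, p. {page}]"
    return f"(p. {page})"


print(page_reference(42, "Grieser2023"))
print(page_reference(3))
```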

Automatic citekey identification

  • The filename of the PDF file MUST begin with the citekey (without @).
  • The citekey MUST NOT contain any underscores (_).
  • The name of the file MAY be followed by an underscore and some
    text, such as {citekey}_{title}.pdf. It MUST NOT be followed by anything
    else, since then the citekey would not be found.
  • Example: With the filename, Grieser2023_Interdependent Technologies.pdf, the
    identified citekey is Grieser2023.
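The naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated (the citekey is everything before the first underscore of the filename), not the workflow's own code; the function name `citekey_from_filename` is made up for this example.

```python
from pathlib import Path


def citekey_from_filename(filename: str) -> str:
    """Return the part of the filename before the first underscore.

    Assumes the rules listed above: the name starts with the citekey,
    and the citekey itself contains no underscore.
    """
    stem = Path(filename).stem  # drop the ".pdf" extension
    return stem.split("_", 1)[0]


print(citekey_from_filename("Grieser2023_Interdependent Technologies.pdf"))
```

A filename without an underscore, such as `Grieser2023.pdf`, yields the whole name without extension as the citekey candidate. Whether that candidate exists in your library is a separate lookup that this sketch does not perform.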

[!TIP]
You can achieve such a filename pattern with automatic renaming rules of most
reference managers, for example with the ZotFile plugin for Zotero or the
AutoFile feature of BibDesk.

Usage

Basics

Use the hotkey to
trigger the Annotation Extraction on the PDF file currently selected in Finder.
The hotkey also works when triggered from PDF Expert
or Highlights. Alternatively, use the
anno keyword to search for PDFs and select one.

Annotation Types extracted

Automatic page number identification

Instead of the PDF page numbers, this workflow retrieves information about the
real page numbers from the BibTeX library and inserts them. If there is no
page data in the BibTeX entry (for example, monographs), you are prompted to
enter the page number manually.

  • In that case, enter the real page number of your first PDF page.
  • In case there is content before the actual text (for example, a foreword or
    table of contents), the real page number 1 often occurs later in the PDF. If
    that is the case, you must enter a negative page number, reflecting the
    true page number the first PDF page would have. Example: Your PDF is a book
    that has a foreword numbered with Roman numerals; real page number 1 is PDF
    page number 12. If you continued the numbering backwards, the first PDF page
    would have page number -10, so you enter the value -10 when prompted for a
    page number.
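The arithmetic behind the manual page number can be written out as a small sketch. It assumes that numbering simply continues backwards through 0 and negative values, as in the example above; the function and parameter names are illustrative only.

```python
def real_page(pdf_page: int, first_page_number: int) -> int:
    """Map a 1-based PDF page index to the real page number.

    first_page_number is the value entered at the prompt, that is, the
    real page number the first PDF page has (or would have).
    """
    return first_page_number + (pdf_page - 1)


# Book example from above: the first PDF page is entered as -10.
print(real_page(12, -10))  # real page of PDF page 12
print(real_page(1, -10))   # real page of the first PDF page
```

With the entered value -10, PDF page 12 maps to real page 1, which matches the example.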

Annotation codes

Insert the following codes at the beginning of an annotation to invoke
special actions on that annotation. Annotation codes do not apply to
strikethroughs.

  • +: Merge this highlight with the previous highlight or underline. Works for
    annotations on the same PDF-page (= skipping text in between) and for
    annotations across two pages.
  • ? foo (free comments): Turns "foo" into a Question
    Callout (> [!QUESTION]) and moves it up. (Callouts are Obsidian-specific
    syntax.)
  • ##: Turns highlighted text into a heading that is added at that
    location. The number of # determines the heading level. If the annotation is
    a free comment, the text following the # is used as heading instead. (The
    space after the # is required.)
  • =: Adds highlighted text as tags to the YAML frontmatter. If the
    annotation is a free comment, uses the text
    after the =. In both cases, the annotation is removed afterward.
  • _: A copy of the annotation is sent to Reminders.app as a task due today
    (default list).
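To make the prefix rules above concrete, here is a minimal sketch of how a comment could be classified by its leading code. It is not the workflow's implementation: the `ParsedCode` type, the action labels, and the function name are assumptions made for illustration, and a caller would skip strikethroughs before using it.

```python
import re
from typing import NamedTuple


class ParsedCode(NamedTuple):
    action: str     # "merge", "question", "heading", "tags", "reminder", or "none"
    text: str       # comment text with the code stripped
    level: int = 0  # heading level; 0 for every other action


# Single-character codes from the list above; the labels are illustrative.
PREFIX_ACTIONS = {"+": "merge", "?": "question", "=": "tags", "_": "reminder"}


def parse_annotation_code(comment: str) -> ParsedCode:
    text = comment.strip()
    # One or more "#", then either nothing or whitespace plus the heading text.
    heading = re.fullmatch(r"(#+)(?:\s+(.*))?", text, re.DOTALL)
    if heading:
        return ParsedCode("heading", heading.group(2) or "", len(heading.group(1)))
    if text[:1] in PREFIX_ACTIONS:
        return ParsedCode(PREFIX_ACTIONS[text[0]], text[1:].strip())
    return ParsedCode("none", comment)


print(parse_annotation_code("## Methods"))
print(parse_annotation_code("? foo"))
```

Because the heading pattern requires whitespace (or nothing) after the `#` characters, a comment such as `#hashtag` is left untouched, which mirrors the rule that the space after the `#` is required.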

[!TIP]
You can run the Alfred command acode to display a cheat sheet of all
annotation codes.

Extracting images

  • Any rectangle type annotation in the PDF is extracted as an image.
  • If the rectangle annotation has any comment, it is used as the alt-text for
    the image. (Note that some PDF readers like PDF Expert do not allow you to add
    a comment to rectangular annotations.)
  • The extracted images are saved in the attachments subfolder of the output
    folder, and named {citekey}_image{n}.png.
  • The images are embedded in the Markdown file with the ![[ ]] syntax, for
    example ![[filename.png|foobar]].
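The naming pattern and embed syntax for images can be sketched as follows. The helper names are hypothetical, and the sketch only builds strings; it does not extract images or write files.

```python
def image_filename(citekey: str, n: int) -> str:
    """Name of the n-th extracted image, following {citekey}_image{n}.png."""
    return f"{citekey}_image{n}.png"


def image_embed(filename: str, alt_text: str = "") -> str:
    """Embed an image with the ![[ ]] syntax, adding the alt-text if present."""
    if alt_text:
        return f"![[{filename}|{alt_text}]]"
    return f"![[{filename}]]"


name = image_filename("Grieser2023", 1)
print(name)
print(image_embed(name, "foobar"))
```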

Troubleshooting

  • Update to the latest version of pdfannots2json by running
    brew upgrade pdfannots2json in your terminal.
  • This workflow does not work with annotations that are not actually saved in
    the PDF file. Some PDF readers like Skim or Zotero 6 store annotations
    outside the PDF by default, but you can tell those PDF readers to save the
    notes in the actual PDF.

[!NOTE]
As a fallback, you can use pdfannots as the extraction engine, as a different
PDF engine sometimes fixes issues. This requires installing
pdfannots via pip3 install pdfannots, and switching the fallback engine in the settings. Note
that pdfannots does not support image extraction and the extraction quality
is slightly worse, so generally you want to use pdfannots2json.

Cite this software project

If you want to mention this software project in an academic publication, please
cite it as:

Grieser, C. (2023). PDF Annotation Extractor [Computer software]. https://github.com/chrisgrieser/pdf-annotation-extractor-alfred

For other citation styles, use the following metadata: Citation File Format.


Credits

About the developer

In my day job, I am a sociologist studying the social mechanisms underlying the
digital economy. For my PhD project, I investigate the governance of the app
economy and how software ecosystems manage the tension between innovation and
compatibility. If you are interested in this subject, feel free to get in touch.

If you find this project helpful, you can support me via 🩷 GitHub Sponsors.


This article is auto-generated from chrisgrieser/pdf-annotation-extractor-alfred via the GitHub API. Last fetched: 6/21/2026