Help: API

The DocumentCloud API

DocumentCloud's API provides resources to search, upload, edit, and organize documents, as well as to work with projects. In addition, an oEmbed service makes it easy to embed documents, pages, and notes.

Use of the DocumentCloud API indicates you have read and agree to our API Guidelines and Terms of Service.

Document Methods

GET /api/search.json

Search the catalog of public documents. This method can be used to scrape the public documents from your account for embedding purposes or to enable searches of your archive of uploaded documents directly from your own website. See our search documentation for help with search queries.

Parameter | Description | Example
q | the search query | group:nytimes title:nuclear
page | response page number | 3 (defaults to 1)
per_page | the number of documents to return per page | 100 (defaults to 10, max is 1,000)
sections | include document sections in the results | true (not present by default)
annotations | include document annotations in the results | true (not present by default)
data | include key/value data in the results | true (not present by default)
mentions | include highlighted mentions of the search phrase | 3 (not present by default, max is 10)
order | the order by which documents are listed | title (default is "created_at"; other choices are "score", "title", "page_count", "source")

Example

/api/search.json?q=obama&page=2

Tips

  • If you'd like to get back search results with more than 10 documents on a page, pass the per_page parameter. A maximum of 1,000 documents will be returned at a time.
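
As a minimal sketch, the query string for a search request could be assembled with Ruby's standard library. The helper name search_url and the SEARCH_ENDPOINT constant are illustrative and not part of the API; the host is assumed to be www.documentcloud.org, and the parameter names come from the table above.

```ruby
require 'uri'

# Assumed base URL for the search method (host taken from the upload example).
SEARCH_ENDPOINT = 'https://www.documentcloud.org/api/search.json'

# Hypothetical helper: builds a search URL from a query and optional parameters.
def search_url(query, options = {})
  params = { 'q' => query }.merge(options)
  # encode_www_form percent-encodes values such as "group:nytimes title:nuclear"
  "#{SEARCH_ENDPOINT}?#{URI.encode_www_form(params)}"
end

puts search_url('obama', 'page' => 2)
puts search_url('group:nytimes title:nuclear', 'per_page' => 100, 'order' => 'title')
```

Building the query string this way keeps spaces and colons in the search query from breaking the URL.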

POST /api/upload.json

Our API for bulk uploads exposes the same method that we use internally, but wraps it in basic authentication over HTTPS. Documents will be uploaded into the authenticated account.

You can either upload a local file using a standard multi-part upload, or tell DocumentCloud to download the file from a public server by passing a URL.

Parameter | Description | Example
file (required) | either the contents of a local file, or the URL where the document can be found | --
title (required) | the document's canonical title | 2008 Blagojevich Tax Return
source (optional) | the source who produced the document | U.S. Attorney's Office
description (optional) | a paragraph of detailed description | This prosecution exhibit is the 2008 joint tax return for Rod and Patti Blagojevich. It shows their total income for the year was $284,000.
language (optional) | the language of the document, used to choose the OCR package for files that require OCR processing (defaults to "eng") | eng
related_article (optional) | the URL of the article associated with the document | http://example.com/news/blago/2010-5-3.html
published_url (optional) | the URL of the page on which the document will be embedded | http://documents.example.com/blago-transcript.html
access (optional) | one of "public", "private", "organization" (defaults to "private") | public
project (optional) | a numeric Project id, to upload the document into an existing project | 1012
data (optional) | a hash of arbitrary key/value data pairs | {"data": {"status": "active"}} (json) or data[status]=active (query string)
secure (optional) | if you're dealing with a truly sensitive document, pass the "secure" parameter to prevent the document from being sent to OpenCalais for entity extraction | true
force_ocr (optional) | specify that a document should be OCR'd even if it has text in it (defaults to "false") | true

Tips

  • Please ensure that you send the request properly encoded as "multipart/form-data".
  • Review your uploaded files and add a source and description if you didn't provide them at upload.

Example

Using Ruby's RestClient library you could do:

require 'rest-client'

# Credentials go in the URL as basic auth; the "@" in the email address is escaped as %40.
RestClient.post('https://ME%40TEST.COM:SECRET@www.documentcloud.org/api/upload.json',
  :file   => File.new('/full/path/to/document/document.pdf','rb'),
  :title  => "2008 Blagojevich Tax Return",
  :source => "U.S. Attorney's Office",
  :access => 'private',
  :data   => {"date" => "2009-04-01", "exhibit" => "E1146"}
)
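
If your HTTP client does not nest hashes for you, the query-string form of the data parameter (data[status]=active) can be produced from a plain hash. This is a sketch using only the standard library; flatten_data is a hypothetical helper name, not something the API provides.

```ruby
require 'uri'

# Hypothetical helper: turns {"status" => "active"} into {"data[status]" => "active"},
# the bracketed form shown for the data parameter.
def flatten_data(data)
  data.each_with_object({}) do |(key, value), fields|
    fields["data[#{key}]"] = value.to_s
  end
end

fields = flatten_data('date' => '2009-04-01', 'exhibit' => 'E1146')
puts fields.inspect
# Percent-encoded form body built from the flattened fields
puts URI.encode_www_form(fields)
```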

GET /api/documents/[id].json

Retrieve the canonical JSON representation of a particular document, as specified by the document id (usually something like: 1659580-economic-analysis-of-the-south-pole-traverse).

Example Response

{"document":{
  "id":"1659580-economic-analysis-of-the-south-pole-traverse",
  "title":"Economic Analysis of the South Pole Traverse",
  "access":"public",
  "pages":38,
  "description":"The South Pole Traverse is a highway of compacted snow built to provide an overland supply route between McMurdo Station on the Antarctic coast and the Amundsen–Scott South Pole Station.  This report provides an account of the logistical costs associated with transport across the Traverse compared with air transport via LC-130 Hercules aircraft.",
  "source":"http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA602402",
  "created_at":"Wed, 11 Feb 2015 18:30:58 +0000",
  "updated_at":"Sun, 08 Mar 2015 15:23:02 +0000",
  "canonical_url":"https://www.documentcloud.org/documents/1659580-economic-analysis-of-the-south-pole-traverse.html",
  "language":"eng",
  "file_hash":"c07f7b640c4df2132bacb8dbfac1dcb65f978418",
  "contributor":"Ted Han",
  "contributor_organization":"DocumentCloud",
  "display_language":"eng",
  "resources":{
    "pdf":"https://assets.documentcloud.org/documents/1659580/economic-analysis-of-the-south-pole-traverse.pdf",
    "text":"https://assets.documentcloud.org/documents/1659580/economic-analysis-of-the-south-pole-traverse.txt",
    "thumbnail":"https://assets.documentcloud.org/documents/1659580/pages/economic-analysis-of-the-south-pole-traverse-p1-thumbnail.gif",
    "search":"https://www.documentcloud.org/documents/1659580/search.json?q={query}",
    "print_annotations":"https://www.documentcloud.org/notes/print?docs[]=1659580",
    "translations_url":"https://www.documentcloud.org/translations/{realm}/{language}",
    "page":{
      "image":"https://assets.documentcloud.org/documents/1659580/pages/economic-analysis-of-the-south-pole-traverse-p{page}-{size}.gif",
      "text":"https://www.documentcloud.org/documents/1659580/pages/economic-analysis-of-the-south-pole-traverse-p{page}.txt"
      },
    "annotations_url":"https://www.documentcloud.org/documents/1659580/annotations"
  },
  "sections":[],
  "data":{},
  "annotations":[]
}}
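
Several entries under resources are URL templates with placeholders such as {page} and {size}. The following sketch shows one way a client might fill them in. expand_template is a hypothetical helper, and "thumbnail" is used as the size only because it appears in the thumbnail URL above; other size names are not listed here.

```ruby
# Hypothetical helper: replaces {name} placeholders in a resource URL template.
# Placeholders with no matching value are left untouched.
def expand_template(template, values)
  template.gsub(/\{(\w+)\}/) { |placeholder| values.fetch($1, placeholder).to_s }
end

image_template = 'https://assets.documentcloud.org/documents/1659580/pages/' \
                 'economic-analysis-of-the-south-pole-traverse-p{page}-{size}.gif'

puts expand_template(image_template, 'page' => 1, 'size' => 'thumbnail')
```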

Tips

  • Security note: For fidelity with the source document, the extracted text (available via the URLs provided in document.resources.text and the document.resources.page.text page iteration pattern) is not sanitized. You should always escape document and page text before insertion into the DOM.
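
To illustrate the security note, here is a sketch of escaping fetched text before placing it in HTML, using CGI.escapeHTML from Ruby's standard library. The sample string and the wrapping pre element are made up for the example; safe_page_html is a hypothetical helper.

```ruby
require 'cgi'

# Hypothetical helper: wraps untrusted document text in a <pre> element,
# escaping &, <, >, and quotes so the text cannot inject markup.
def safe_page_html(raw_text)
  "<pre>#{CGI.escapeHTML(raw_text)}</pre>"
end

untrusted = 'Total < $284,000 <script>alert("x")</script>'
puts safe_page_html(untrusted)
```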

PUT /api/documents/[id].json

Update a document's title, source, description, related article, access level, or data with this method. Reference your document by its id (usually something like: 1659580-economic-analysis-of-the-south-pole-traverse).

Parameter | Description | Example
title (optional) | the document's canonical title | 2008 Blagojevich Tax Return
source (optional) | the source who produced the document | U.S. Attorney's Office
description (optional) | a paragraph of detailed description | This prosecution exhibit is the 2008 joint tax return for Rod and Patti Blagojevich. It shows their total income for the year was $284,000.
related_article (optional) | the URL of the article associated with the document | http://example.com/news/blago/2010-5-3.html
published_url (optional) | the URL of the page on which the document will be embedded | http://documents.example.com/blago-transcript.html
access (optional) | one of "public", "private", "organization" | public
data (optional) | a hash of arbitrary key/value data pairs | {"data": {"status": "active"}} (json) or data[status]=active (query string)

The response value of this method will be the JSON representation of your document (as seen in the GET method above), with all updates applied.

Tips

  • If your HTTP client is unable to create a PUT request, you can send it as a POST and add an extra parameter: _method=put
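
The _method fallback can be sketched with Net::HTTP from the standard library. The example only builds the request and does not send it. It assumes the update method accepts the same basic authentication as the upload method, and put_via_post is a hypothetical helper name.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a POST that carries the
# _method=put parameter, so the server can treat it as a PUT.
def put_via_post(document_id, email, password, fields)
  uri = URI("https://www.documentcloud.org/api/documents/#{document_id}.json")
  request = Net::HTTP::Post.new(uri)
  request.basic_auth(email, password) # assumption: same basic auth as uploads
  request.set_form_data(fields.merge('_method' => 'put'))
  request
end

request = put_via_post('1659580-economic-analysis-of-the-south-pole-traverse',
                       'me@test.com', 'SECRET', 'access' => 'public')
puts request.method
puts request.body

# Sending it would look like this (not run here, since it needs real credentials):
# Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) { |http| http.request(request) }
```

The same pattern should apply to deletes by sending _method=delete instead.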

DELETE /api/documents/[id].json

Delete a document from DocumentCloud. You must be authenticated as the owner of the document for this method to work.

Tips

  • If your HTTP client is unable to create a DELETE request, you can send it as a POST, and add an extra parameter: _method=delete

GET /api/documents/[id]/entities.json

Retrieve the JSON for all of the entities that a particular document contains, specified by the document id (usually something like: 1659580-economic-analysis-of-the-south-pole-traverse). Entities are ordered by their relevance to the document as determined by OpenCalais.

Example Response

{
  "entities":{
    "person":[
      { "value":"Ramadan Aff", "relevance":0.72 },
      { "value":"Sarah Normand", "relevance":0.612 },
      ...
    ],
    "organization":[
      { "value":"Supreme Court", "relevance":0.619 },
      { "value":"Hamas", "relevance":0.581 },
      ...
    ]
    ...
  }
}
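
A response of this shape can be reduced to a short list of names with the standard json library. The sketch below uses a trimmed copy of the example response; top_entities is a hypothetical helper, and it re-sorts by relevance defensively even though the API already returns entities in relevance order.

```ruby
require 'json'

# Hypothetical helper: returns the values of the n most relevant entities of one kind.
def top_entities(response_body, kind, n = 5)
  entities = JSON.parse(response_body).fetch('entities', {}).fetch(kind, [])
  entities.sort_by { |entity| -entity['relevance'] }.first(n).map { |entity| entity['value'] }
end

sample = <<~JSON
  {"entities": {
    "person": [
      {"value": "Ramadan Aff", "relevance": 0.72},
      {"value": "Sarah Normand", "relevance": 0.612}
    ],
    "organization": [
      {"value": "Supreme Court", "relevance": 0.619},
      {"value": "Hamas", "relevance": 0.581}
    ]
  }}
JSON

puts top_entities(sample, 'person', 1)
```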

Project Methods

POST /api/projects.json

Create a new project for the authenticated account, with a title, optional description, and optional document ids.

Parameter | Description | Example
title (required) | the project's title | Drywall Complaints
description (optional) | a paragraph of detailed description | A collection of documents from 2007-2009 relating to reports of tainted drywall in Florida.
document_ids (optional) | a list of documents that the project contains, by id | 28-rammussen, 207-petersen

Tips

  • Note that you have to use the convention for passing an array of strings: ?document_ids[]=28-boumediene&document_ids[]=207-academy&document_ids[]=30-insider-trading
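
The array convention can be generated rather than typed by hand. This is a sketch using URI.encode_www_form from the standard library; project_form is a hypothetical helper name.

```ruby
require 'uri'

# Hypothetical helper: encodes project fields, repeating document_ids[] once per id.
def project_form(title, document_ids = [])
  pairs = [['title', title]]
  document_ids.each { |id| pairs << ['document_ids[]', id] }
  URI.encode_www_form(pairs)
end

puts project_form('Drywall Complaints', ['28-boumediene', '207-academy'])
```

The brackets come out percent-encoded (%5B%5D), which decodes to the same document_ids[] parameter name.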

GET /api/projects.json

Retrieve a list of project names and document ids. You must use basic authentication over HTTPS in order to make this request. The projects listed belong to the authenticated account.

Example Response

{"projects": [
  {
    "id": 5,
    "title": "Literate Programming",
    "document_ids":[
      "103-literate-programming-a-practioners-view",
      "104-reverse-literate-programming"
    ]
  },
  ...
]}
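
Once fetched, the project list is easy to index by title. The sketch below parses a trimmed copy of the example response; documents_by_project is a hypothetical helper, and it assumes project titles are unique within the account.

```ruby
require 'json'

# Hypothetical helper: maps each project title to its list of document ids.
def documents_by_project(response_body)
  JSON.parse(response_body).fetch('projects', []).each_with_object({}) do |project, index|
    index[project['title']] = project['document_ids']
  end
end

sample = '{"projects": [{"id": 5, "title": "Literate Programming", ' \
         '"document_ids": ["103-literate-programming-a-practioners-view", ' \
         '"104-reverse-literate-programming"]}]}'

documents_by_project(sample).each do |title, ids|
  puts "#{title}: #{ids.length} documents"
end
```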

PUT /api/projects/[id].json

Update an existing project for the current authenticated account. You can set the title, description, or list of documents, using the same parameters as POST /api/projects.json, above.

DELETE /api/projects/[id].json

Delete a project that belongs to the current authenticated account.

oEmbed

GET /api/oembed.json

Generate an embed code for a resource (a document, a page, or a note) using our oEmbed service. Returns a rich JSON response.

Response format

{
  "type": "rich",
  "version": "1.0",
  "provider_name": "DocumentCloud",
  "provider_url": "https://www.documentcloud.org/",
  "cache_age": 300,
  "height": 750,
  "width": 600,
  "html": "<script>...</script>"
}

Example document request

/api/oembed.json?url=https%3A%2F%2Fwww.documentcloud.org%2Fdocuments%2Fdoc-name.html

Parameters for documents

Parameter | Description | Example
url (required) | URL-escaped document to embed | https%3A%2F%2Fwww.documentcloud.org%2Fdocuments%2Fdoc-name.html
maxheight (optional) | The viewer's height (pixels) | 750
maxwidth (optional) | The viewer's width (pixels) | 600
container (optional) | Specify the DOM container in which to embed the viewer | #my-document-div
notes (optional) | Enable the notes tab | true (default)
text (optional) | Enable the text tab | true (default)
zoom (optional) | Show the zoom slider | true (default)
search (optional) | Show the search box | true (default)
sidebar (optional) | Show the sidebar | true (default)
pdf (optional) | Include a link to the original PDF | true (default)
responsive (optional) | Make the viewer responsive | false (default)
responsive_offset (optional) | Specify header height (pixels) | 4
note (optional) | Open the document to a specific note. An integer representing the note ID | 214279
page (optional) | Open the document to a specific page | 3
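
The url parameter must be URL-escaped, which is easy to get wrong by hand. As a sketch, URI.encode_www_form from the standard library does the escaping for you. oembed_url and OEMBED_ENDPOINT are illustrative names, and the host is assumed to be www.documentcloud.org.

```ruby
require 'uri'

# Assumed base URL for the oEmbed service.
OEMBED_ENDPOINT = 'https://www.documentcloud.org/api/oembed.json'

# Hypothetical helper: builds an oEmbed request URL. Pass the resource URL
# unescaped; encode_www_form escapes it along with any other options.
def oembed_url(resource_url, options = {})
  params = { 'url' => resource_url }.merge(options)
  "#{OEMBED_ENDPOINT}?#{URI.encode_www_form(params)}"
end

puts oembed_url('https://www.documentcloud.org/documents/doc-name.html',
                'maxwidth' => 600, 'sidebar' => false)
```

The same helper should work for page and note URLs, since only the value of url changes.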

Example pages request

/api/oembed.json?url=https%3A%2F%2Fwww.documentcloud.org%2Fdocuments%2Fdoc-name%2Fpages%2F5.html

Parameters for pages

Parameter | Description | Example
url (required) | URL-escaped document page to embed | https%3A%2F%2Fwww.documentcloud.org%2Fdocuments%2Fdoc-name%2Fpages%2F5.html

Example note request

/api/oembed.json?url=https%3A%2F%2Fwww.documentcloud.org%2Fdocuments%2Fdoc-name%2Fannotations%2F220666.html

Parameters for notes

Parameter | Description | Example
url (required) | URL-escaped note to embed | https%3A%2F%2Fwww.documentcloud.org%2Fdocuments%2Fdoc-name%2Fannotations%2F220666.html
container (optional) | Specify the DOM container in which to embed the viewer | #my-document-div

API Wrappers and Utilities

The open-source community has contributed several helpful libraries for interacting with DocumentCloud's API. See their documentation for examples and more information:

Node.js:

Python:

Ruby:

  • Documentcloud: RubyGem for interacting with the DocumentCloud API.

Questions?

Still have questions about the API? Don't hesitate to contact us.