This document proposes a standardized way of querying (heterogeneous) databases over HTTP. The need to support querying over HTTP is what makes this a protocol rather than just a language, though it will build on or require a data query language of some form.

The use cases we have in mind include:

  • Data viewers calling databases to get data to display.
  • Visualisation tools calling databases or data scraping tools.
  • Crowd sourcing tools augmenting information dynamically pulled from a data catalogue.

Introduction

Query support would cover features such as:

  • size (limit)
  • from (offset)
  • sorting (ordering by)
  • filtering
  • aggregation (sum, count, distinct)
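To make the list above concrete, here is a minimal sketch of what each feature means when applied to an in-memory list of rows. The function names, the sample rows and the `offset` parameter (used because `from` is a reserved word in Python) are illustrative assumptions, not part of the proposal.

```python
# Hypothetical sample data, used only for illustration.
ROWS = [
    {"owner": "jones", "amount": 10},
    {"owner": "smith", "amount": 25},
    {"owner": "jones", "amount": 5},
    {"owner": "patel", "amount": 40},
]


def run_query(rows, size=None, offset=0, sort=None, filters=None):
    """Apply filtering, then sorting, then offset/limit to a list of dicts."""
    filters = filters or {}
    # filtering: keep rows whose fields equal the requested values
    result = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    # sorting: order by a single field, ascending
    if sort is not None:
        result = sorted(result, key=lambda r: r[sort])
    # from (offset) and size (limit)
    end = None if size is None else offset + size
    return result[offset:end]


def aggregate(rows, field):
    """Compute sum, count and distinct values for one field."""
    values = [r[field] for r in rows]
    return {"sum": sum(values), "count": len(values), "distinct": sorted(set(values))}
```

A real backend would push these operations down to the database rather than evaluate them in memory; the sketch only pins down the intended semantics.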

Proposal

The proposal divides into two parts: first, the definition of a JSON-serializable query object; second, the presentation of that object to a web-accessible query endpoint.
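The two parts can be sketched as follows: a query object is serialized to JSON and wrapped in an HTTP request. This text does not fix how the object is presented to the endpoint, so sending it as a POST body is an assumption made for illustration, as are the endpoint URL and the `build_request` helper. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_request(endpoint, query):
    """Serialize a query object to JSON and wrap it in an HTTP request."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the proposal does not mandate POST
    )


# Hypothetical endpoint URL, for illustration only.
request = build_request(
    "http://example.org/api/query",
    {"q": "quick brown fox", "filters": {"owner": "jones"}},
)
```

Passing the serialized object as a URL parameter on a GET request would be an equally plausible reading.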

Query Object

The proposal is heavily based on the ElasticSearch query language.

The query object has the following key attributes:

Additions:

  • q: either plain text or a hash; it maps directly onto a query_string query in the backend.

    • Different backends can reinterpret this. Some may pass it straight through; for an SQL backend, for example, it could be the full SQL query.
  • filters: a dict keyed by field name, where each value specifies a filter such as term, terms, prefix or range. This provides a quick way to do filtering.

    • The value for a field can be plain text, in which case it becomes a term query on that field.

      • E.g. my-field: ‘abc’ would only match results with abc in that field.
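One way an ElasticSearch-style backend might expand these two shortcuts is sketched below. The helper names are hypothetical, and the shape assumed for a non-text filter value (`{filter_type: arguments}`) is a guess, since the text above does not spell it out.

```python
def expand_q(q):
    """Map the q shortcut onto a query_string query."""
    if isinstance(q, str):
        # plain text becomes the query text
        return {"query_string": {"query": q}}
    # a hash is passed through as the query_string body
    return {"query_string": dict(q)}


def expand_filters(filters):
    """Map the filters shortcut onto a list of per-field filters."""
    expanded = []
    for field, spec in filters.items():
        if isinstance(spec, dict):
            # assumed shape: {"range": {"gte": 18}} and similar
            for filter_type, args in spec.items():
                expanded.append({filter_type: {field: args}})
        else:
            # plain value: becomes a term filter on that field
            expanded.append({"term": {field: spec}})
    return expanded
```

For instance, `expand_filters({"owner": "jones"})` yields a single term filter on the `owner` field.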

Examples

{
   "q": "quick brown fox",
   "filters": {
     "owner": "jones"
   }
}

Existing Work

ElasticSearch

JSON oriented document store and search index.

Webstore

Designed to expose RDBMS over RESTful HTTP.

SQL

Raw SQL over HTTP.

This is done in ScraperWiki and the Webstore.

DAP

DAP is a data transmission protocol designed specifically for science data. The protocol relies on the widely used and stable HTTP and MIME standards, and provides data types to accommodate gridded data, relational data, and time series, as well as allowing users to define their own data types.

Unstructured Query Language

UnQL is a query language, not a query protocol, so it provides no information on how clients and servers interact.

HTSQL

  • http://htsql.org/
  • A database query language based on SQL

    • HTSQL is a URI-based high-level query language for relational databases. HTSQL wraps your database with a web service layer, translating HTTP requests into SQL and returning results as HTML, JSON, etc.

URI Fragment Identifiers for the text/csv Media Type

Google Visualization API Query Language

Another restricted SQL dialect. It has the advantage of an existing implementation, so it would presumably work immediately with Google Spreadsheets and Fusion Tables.

JSONiq

JSONiq extends XQuery, a mature W3C standard, with native JSON support. Like XQuery and SQL, JSONiq is declarative: Expressions can nest with full composability.

Note: as of autumn 2012, JSONiq lacks implementations in any mainstream language.