Common REST interface documentation

Purpose

This document describes the web service interfaces that are common to every VESTA service. It shows how a service can be used, the standard response types, and how to handle exceptions.

Overview

Workflow

A workflow summary can be seen below.

[Figure: workflow summary (_images/workflow.png)]

Worker Services Features

  • A worker receives its task directives from the REST Gateway through Celery/AMQP
    • Arbitrary arguments can be added
  • Resulting messages are temporarily stored on AMQP
  • Optional directive to send resulting annotations directly to the Annotations Storage System
  • Process acquisition, start and stop times are encoded in resulting messages
  • Worker process version string is communicated in resulting messages, permitting version and results coherence checking
  • Ability to keep a trace of the workflow of every processed document

API specifications

The API of this service was designed to meet the requirements of the CANARIE API methods set specification for online services.

Calling Patterns

The services will be invoked using a subset of REST and HTTP methods. In particular, all services are called using the HTTP GET, POST and PUT methods.

Response Formats

All responses are given using the JSON format.

Methods

There are three sets of methods available to access services. The first two are specific to their respective front ends, the Load Balancer and the Multimedia File Storage. The third is the set of methods required by CANARIE and is therefore available on both front ends.

To cancel a given processing request

This method uses HTTP GET.

Parameters:

uuid: The identifier of a previous processing request.

Return value:

Returns the status of the request after the cancel request has been submitted. Once the request is cancelled, any subsequent status request returns the REVOKED state.

Examples:

URL form:

<Base URI>/cancel?uuid=6547137e-cc2f-4008-b1eb-4ae8e898ce83
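
A minimal Python sketch of building and issuing this request with the standard library only. The base URI `http://localhost:5000/transcoder` is a placeholder, and the helper names `build_cancel_url` and `cancel_task` are illustrative, not part of VESTA:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_cancel_url(base_uri: str, uuid: str) -> str:
    """Return the cancel URL for a previously submitted request."""
    return f"{base_uri.rstrip('/')}/cancel?{urlencode({'uuid': uuid})}"


def cancel_task(base_uri: str, uuid: str) -> dict:
    """Issue the HTTP GET and decode the JSON status document."""
    with urlopen(build_cancel_url(base_uri, uuid)) as response:
        return json.load(response)


# Placeholder base URI; substitute the one documented for your service.
url = build_cancel_url("http://localhost:5000/transcoder",
                       "6547137e-cc2f-4008-b1eb-4ae8e898ce83")
print(url)
```

Only `build_cancel_url` runs here; `cancel_task` needs a reachable deployment.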

Information route

You can obtain information on the current configuration of a deployed instance by issuing an HTTP GET at the /info route. This will give you information on the configured services with their expected version, the list of route names and so on.

Parameters:

None.

Return value:

A JSON object with the list of configured routes and associated services such as:

{
  "services": {
    "transcoder": {
      "category": "Data Storage",
      "celery_queue_name": "transcoder_0.2.7",
      "celery_task_name": "transcoder",
      "doc": "http://some_server/doc.html",
      "home": "http://some_server/doc.html",
      "institution": "CRIM",
      "licence": "http://some_server/doc.html",
      "name": "Transcoder service",
      "provenance": "http://some_server/doc.html",
      "releaseTime": "2015-01-01T00:00:00Z",
      "releasenotes": "http://some_server/doc.html",
      "researchSubject": "Multimedia file transcoding",
      "route_keyword": "transcoder",
      "source": ",204",
      "support": "http://some_server/doc.html",
      "supportEmail": "support@company.com",
      "synopsis": "RESTful service providing multimedia files transcoding.",
      "tags": "multimedia,file,transcoding",
      "tryme": "ssm/tryme.html",
      "version": "0.2.7"
    }
  },
  "version": "1.7.0"
}

Useful elements are:

version:

The version of the REST API.

services:

Every element in the list is an exposed service through a dynamic route. Most elements reflect the requirements of the CANARIE API specification.

version: The expected version of a service connected through Celery/AMQP.
route_keyword: The actual route you use to interact with the API for a given service. Note that there might be an additional ‘/info’ route under the route of a given service, e.g. “/transcoder/info”.
doc: A valid URL to the service’s documentation.

Examples:

URL form:

<Base URI>/info
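
As a rough illustration of consuming this response, the Python sketch below maps each route keyword to the expected version of its service. The helper name `service_routes` is hypothetical, and the sample document is an abridged copy of the response shown earlier:

```python
import json


def service_routes(info: dict) -> dict:
    """Map each route keyword to the expected version of its service."""
    return {
        service["route_keyword"]: service["version"]
        for service in info.get("services", {}).values()
    }


# Abridged copy of the example /info response.
info = json.loads("""
{
  "services": {
    "transcoder": {
      "route_keyword": "transcoder",
      "version": "0.2.7"
    }
  },
  "version": "1.7.0"
}
""")

print(info["version"])       # version of the REST API
print(service_routes(info))  # route keyword -> expected service version
```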

Status method

For methods requiring asynchronous tasks, there is also a corresponding method that lets the user monitor the status of submitted tasks. The response format of this method is uniform across all services and contains 3 keys:

  • uuid
  • status
  • result

For example:

{
    "result": {
        "worker_id_version": "0.1.0",
        "current": 73,
        "start_time": "2014-09-10T12:23:19",
        "total": 100
    },
    "status": "PROGRESS",
    "uuid": "f1b40709-ca76-4554-b19f-277b2f8d5d49"
}

UUID

The identifier of a task supplied by the user which was used to perform the initial status query.

Status key values (Service States)

Current status of a task that can be one of the following values:

  • PENDING
  • RECEIVED
  • STARTED
  • PROGRESS
  • STORING
  • FAILURE
  • SUCCESS
  • REVOKED
  • RETRY
  • EXPIRED

The states listed above are essentially the states reported by the underlying distributed processing queue system, which in this case is Celery. Each status is explained in more depth in the result section below; for generic documentation about the reported states, see the following document supplied by Celery:

http://celery.readthedocs.org/en/latest/reference/celery.states.html?highlight=states#misc

In addition to the Celery states, there are three custom states: PROGRESS, STORING and EXPIRED. PROGRESS means that the underlying service has updated a progress value that can be used to estimate the time of completion of the given task. See below for more information on the possible states.
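
A client typically polls the status method until the task stops changing state. The Python sketch below shows one way to do this. It assumes that SUCCESS, FAILURE, REVOKED and EXPIRED are final, which this document implies but does not state, and it takes the HTTP call as a caller-supplied function (`fetch_status`, a hypothetical name) so that the loop itself can be tried without a server:

```python
import time
from typing import Callable

# Assumption: states after which the status is not expected to change.
TERMINAL_STATES = frozenset({"SUCCESS", "FAILURE", "REVOKED", "EXPIRED"})


def wait_for_task(fetch_status: Callable[[str], dict], uuid: str,
                  poll_interval: float = 5.0, max_polls: int = 120) -> dict:
    """Poll until the task reaches a terminal state or max_polls is hit."""
    for attempt in range(max_polls):
        response = fetch_status(uuid)
        if response["status"] in TERMINAL_STATES:
            return response
        if attempt < max_polls - 1:
            time.sleep(poll_interval)
    raise TimeoutError(f"task {uuid} still running after {max_polls} polls")


# Usage with canned responses standing in for the real HTTP call.
canned = iter([
    {"uuid": "abc", "status": "PENDING", "result": None},
    {"uuid": "abc", "status": "PROGRESS",
     "result": {"current": 73, "total": 100}},
    {"uuid": "abc", "status": "SUCCESS", "result": []},
])
final = wait_for_task(lambda _uuid: next(canned), "abc", poll_interval=0)
print(final["status"])
```

RETRY is deliberately left out of the terminal set because a task in that state has been queued again.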

Result (variable)

A general variable that holds different information depending on the aforementioned status value. For instance, when a processing request has ended in an error state, information on the error is reported in the result variable. Hence one must check the value of the status variable to know how to consume the result variable. The following states yield useful information in the result variable:

pending

The task has been submitted to a queue and is waiting to be processed by a worker. The time before processing starts depends on how many tasks have previously been submitted to the processing queue and how many workers are available to process this type of task. In the worst case, no worker at all is available to consume the task and it may never be processed. The result for this status is always null:

{
    "result" : null
}

received

The task has been received by a worker. At this point we know that the task will be processed and a progress status should be available soon. There is still no result:

{
    "result" : null
}

started

The worker has started working on the task and a progress status should be available imminently. There is still no result:

{
    "result" : null
}

progress

The worker is making progress. In the progress state, the result variable holds information about the progress of the task. e.g.:

{
    "result" : {
       "worker_id_version": "0.1.43",
       "host": "david-transition.novalocal",
       "type": "transition",
       "start_time": "2014-09-10T12:23:19",
       "current": 12,
       "total": 100
    }
}

The key «current» documents the last reported progress state and «total» gives the upper bound of the progress scale. In this case, progress is at 12/100 (12%). The «start_time» can also be used to estimate the remaining time of the task: remaining_time = (now - start_time) * (total - current) / current. The result also carries information on the worker: its «type», which is the service name; the «host», which is where the worker is running; and the «worker_id_version», which is the version of the worker.
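
The remaining-time formula can be sketched in Python as follows. The helper name `estimate_remaining` is illustrative, and the sketch assumes that `now` is expressed in the same time zone as «start_time», which carries no offset in the example:

```python
from datetime import datetime, timedelta
from typing import Optional


def estimate_remaining(result: dict, now: datetime) -> Optional[timedelta]:
    """Apply remaining = (now - start_time) * (total - current) / current."""
    current, total = result["current"], result["total"]
    if current <= 0:
        return None  # no progress reported yet, so no estimate is possible
    start_time = datetime.fromisoformat(result["start_time"])
    return (now - start_time) * (total - current) / current


result = {"start_time": "2014-09-10T12:23:19", "current": 12, "total": 100}
now = datetime(2014, 9, 10, 12, 25, 19)  # two minutes after the start
print(estimate_remaining(result, now))   # 0:14:40
```

With 12% done after 120 seconds, the estimate is 120 * 88 / 12 = 880 seconds. The estimate assumes a constant processing rate.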

storing

The worker is storing annotations on the annotation server. This state arises when the annotation service was called with instructions to save the annotations on an Annotations Storage Service back-end by issuing an annotations process request along with the «ann_doc_id» variable. In this context, the STORING state is a transient state indicating that the call to the Annotations Storage Service is in progress and not yet complete. If the annotation process request was not issued with instructions to save to an Annotations Storage Service back-end, this state should not surface. The result structure is the same as the progress one, except that the keys «current» and «total» are omitted:

{
    "result" : {
        "worker_id_version": "0.1.43",
        "host": "david-transition.novalocal",
        "type": "transition",
        "start_time": "2014-09-10T12:23:19"
    }
}

failure

The worker failed while processing the task. The result will give more details about the cause of failure. e.g.:

{
    "result" : {
        "code": 301,
        "message": "HTTP Error 404: Not Found"
    }
}

The keys «code» and «message» are the same as those used in the general exception handling and are documented in depth in the “Service Exceptions and error codes” section at the bottom of this page.

success

The worker successfully completed the task. In the success state, the result variable holds the task output, which consists of an array of annotations. Each annotation carries a set of properties common to all services, followed by properties specific to the service, since services have different outputs. This is what could be obtained:

{
    "result": [
        {
            "@id": "diarisation_annotation",
            "@version": "0.5.1",
            "specific_property": "A",
            "meaning_of_life": "Not sure"
        },
        {
            "@id": "diarisation_annotation",
            "@version": "0.5.1",
            "specific_property": "B",
            "meaning_of_life": 42
        }
    ]
}

These fields are common to all annotators:

@id: Indicates the worker type.
@version: Indicates the worker version.

revoked

The task has been revoked, which implies that the user cancelled the task through the REST interface. The result field is the same as for the Failure status, so it is possible to get more details on the revocation. The error code should always be 109, associated with the TaskRevokedError exception raised by a worker when its task is revoked. The message contains the revocation status. Possible values include “revoked”, which implies that the task was revoked before any processing, and “terminated”, which means that the task had to be killed because it had already started. A revoked status with a “terminated” result should not be confused with a success status: “terminated” means that the task has been killed and has nothing to do with the French word “terminée” (finished). Example result:

{
    "result" : {
        "code": 109,
        "message": "terminated"
    }
}

retry

The worker failed while processing the task but has requested a new attempt to complete it. The task has been re-submitted to a queue and should be picked up again by another worker. By default, a delay of 180 seconds is observed before the process starts again. The result field is the same as for the Failure status, so it is possible to know the cause of the failure which triggered a new processing attempt.

expired

This state is returned when the queue has been idle for more than 2 hours and has been removed. The uuid is no longer useful once this state is declared, since the task no longer exists. The result associated with this state is null:

{
    "result" : null
}
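
Since the shape of the result variable depends on the status, client code usually branches on the status before reading the result. The following Python sketch summarizes the per-state shapes described in this section. The function name `summarize` is hypothetical, and the key names are taken from the examples in this section:

```python
def summarize(response: dict) -> str:
    """Return a one-line, human-readable summary of a status response."""
    status, result = response["status"], response["result"]
    if status in ("PENDING", "RECEIVED", "STARTED", "EXPIRED"):
        return f"{status}: no result"  # result is null for these states
    if status == "PROGRESS":
        return f"PROGRESS: {result['current']}/{result['total']}"
    if status == "STORING":
        return f"STORING: worker {result['type']} on {result['host']}"
    if status in ("FAILURE", "REVOKED", "RETRY"):
        # These three states share the code/message structure.
        return f"{status}: code {result['code']}, {result['message']}"
    if status == "SUCCESS":
        return f"SUCCESS: {len(result)} annotation(s)"
    raise ValueError(f"unknown status: {status}")


print(summarize({
    "uuid": "f1b40709-ca76-4554-b19f-277b2f8d5d49",
    "status": "REVOKED",
    "result": {"code": 109, "message": "terminated"},
}))
```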

CANARIE API methods set

The CANARIE API is defined in the following document

It covers:

  • <Base URI>/info
  • <Base URI>/stats
  • <Base URI>/doc
  • <Base URI>/releasenotes
  • <Base URI>/support
  • <Base URI>/source
  • <Base URI>/tryme
  • <Base URI>/licence
  • <Base URI>/provenance

The base URI is specific to each service; please consult the documentation of each service for more details.
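
For illustration, the full set of CANARIE URLs for a service can be derived from its base URI as in the Python sketch below. The base URI `http://localhost:5000/transcoder` is a placeholder and `canarie_urls` is a hypothetical helper:

```python
# Route names taken from the list of CANARIE methods in this section.
CANARIE_ROUTES = ("info", "stats", "doc", "releasenotes", "support",
                  "source", "tryme", "licence", "provenance")


def canarie_urls(base_uri: str) -> dict:
    """Map each CANARIE route name to its full URL under base_uri."""
    base = base_uri.rstrip("/")
    return {route: f"{base}/{route}" for route in CANARIE_ROUTES}


# Placeholder base URI; each service documents its own.
for route, url in canarie_urls("http://localhost:5000/transcoder").items():
    print(route, url)
```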

Service Exceptions and error codes

When something goes wrong, the system returns error responses, which are documented here. Because service exceptions are handled uniformly, independently of the service being used, users can expect the same response format across the system. Because services are intended for use by programs rather than humans, the server returns exceptions in JSON format by default, unless the ‘text/html’ format is explicitly requested via the ‘Accept’ header. Conversely, to comply with the CANARIE API, and because the CANARIE requests are meant to be used by humans, those requests return errors in HTML format by default, unless the ‘Accept’ header requests the ‘application/json’ format.
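
A short Python sketch of selecting the error format through the ‘Accept’ header, using the standard library. The URL is a placeholder and `request_with_format` is a hypothetical helper. The request object is only built here, not sent:

```python
from urllib.request import Request


def request_with_format(url: str, media_type: str) -> Request:
    """Build a GET request that asks for errors in the given media type."""
    return Request(url, headers={"Accept": media_type})


# Ask a CANARIE route, which defaults to HTML errors, for JSON instead.
request = request_with_format("http://localhost:5000/info",
                              "application/json")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would send it.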

The JSON response repeats the HTTP response status and reason for clarity, and appends an error code and message, specific to the underlying system, under the «vesta» key. e.g.:

{
    "status": 400,
    "description": "Bad request",
    "vesta": {
        "code": 206,
        "message": "A GET on the URI '/status' requires the following parameter : uuid"
    }
}

The keys «status» and «description» can take any value defined by the HTTP standard; the first table below gives an overview of the most frequent statuses. The key «vesta» contains a structure composed of the keys «code» and «message», which give specific information on the cause of the exception.

The next table shows the HTTP status codes that could be received, and the following ones list all the internal error codes and their explanations.
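
One possible way to surface such a response in client code is to decode it into an exception object, as in this Python sketch. The class `VestaError` and the function `parse_error` are hypothetical names, not part of the service:

```python
import json


class VestaError(Exception):
    """Error decoded from a JSON exception response (illustrative only)."""

    def __init__(self, status, description, code, message):
        super().__init__(f"HTTP {status} {description}: [{code}] {message}")
        self.status = status            # HTTP status code
        self.description = description  # HTTP reason
        self.code = code                # internal error code
        self.message = message          # internal error message


def parse_error(body: str) -> VestaError:
    """Decode the JSON error document into a VestaError."""
    payload = json.loads(body)
    vesta = payload["vesta"]
    return VestaError(payload["status"], payload["description"],
                      vesta["code"], vesta["message"])


error = parse_error("""
{
    "status": 400,
    "description": "Bad request",
    "vesta": {
        "code": 206,
        "message": "A GET on the URI '/status' requires the following parameter : uuid"
    }
}
""")
print(error.status, error.code)
print(error)
```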

Status Description
200 Successful request, results follow
204 Request was properly formatted, no content
400 Bad request due to improper specifications, unrecognized parameter, parameter value out of range, etc.
404 The requested resource was not found
500 Internal server error
503 Service temporarily unavailable

Core libraries error codes

Code Description
100 The error does not originate from the underlying system. The message will be empty.
101 Generic exception type. An unexpected exception type has been raised, the message could give more explanation on the cause.
102 Database manipulation exception.
103 Configuration files exception.
104 Operation or function is applied to an object of inappropriate type.
105 A built-in operation or function receives an argument that has the right type but an inappropriate value.
106 This exception is raised for address-related errors in the low-level networking interface.
107 An I/O operation fails for an I/O-related reason, e.g., “file not found” or “disk full”.
108 A mapping (dictionary) key is not found in the set of existing keys.
109 A task has been revoked.

REST services package error codes

Code Description
200 System settings loading exception.
201 One or more workers have a different REST API configured in their configuration files.
202 An unknown service is being used. Use the /info request to get available services on the current server.
203 An unknown task UUID is being used for a /status or a /cancel request.
204 The declared worker version in the server configuration file doesn’t match the one produced by the worker itself.
205 There is a problem in the communication with the AMQP server.
206 The request has been made without a required parameter.
207 A task request has been made without a valid document URL.

Worker Services library error codes

Code Description
300 Submitted annotations do not have a valid format.
301 An error occurred while trying to download a document.
302 An error occurred while trying to upload a document.
303 A required configuration file cannot be found.
304 Document cannot be found at given path.
305 Cannot use this document type.
306 Internal error of undetermined cause. See message.
4xx Exception codes are the ones coming from the Load Balancer package.
400 Resources are missing to complete a VM spawn.
401 The minimum number of VMs has been reached while tearing down a VM.

Transition and face detection services error codes

Code Description
600 An error occurred in the C library of the worker. The message will contain a worker specific error code.

Transcription, diarisation and text matching services error codes

Code Description
630 Diarization cannot resolve the path to the audio file.
631 Audio file format is not supported by diarization process. (WAV file parameters)
632 WAV file header has an unsupported structure for diarization process.
633 Internal error while forking diarization subprocesses.
64x are reserved for the Transcription service
640 Transcription worker cannot resolve path to the audio file.
641 Transcription worker encountered audio segments of greater length than its capacity.
642 Transcription worker encountered an internal error while forking internal subprocesses.
643 Transcription worker could not produce a transcription for a given document.
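
When logging or reporting errors, it can help to map an internal code back to the package it comes from. The Python sketch below does this with a range table. Only the “4xx” and “64x” ranges are stated explicitly in the tables above; the other bounds are assumptions extrapolated from the listed codes, and `error_family` is a hypothetical helper:

```python
# (first code, last code, origin). Only 4xx and 64x are documented as
# ranges; the remaining bounds are assumptions based on the listed codes.
ERROR_FAMILIES = [
    (100, 199, "Core libraries"),
    (200, 299, "REST services package"),
    (300, 399, "Worker Services library"),
    (400, 499, "Load Balancer package"),
    (600, 629, "Transition and face detection services"),
    (630, 639, "Diarisation service"),
    (640, 649, "Transcription service"),
]


def error_family(code: int) -> str:
    """Return the package or service an internal error code belongs to."""
    for first, last, origin in ERROR_FAMILIES:
        if first <= code <= last:
            return origin
    return "Unknown"


print(error_family(206))  # REST services package
print(error_family(641))  # Transcription service
```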