Common REST interface documentation¶
Purpose¶
This document describes the web service interfaces that are common for each VESTA service. It shows how a service can be used, the standard response types and how to handle exceptions.
Overview¶
Worker Services Features¶
- A worker receives its task directives from the REST Gateway through Celery/AMQP
- Can add arbitrary arguments
- Resulting messages are temporarily stored on AMQP
- Optional directive to send resulting annotations directly to the Annotations Storage System
- Process acquisition, start and stop times are encoded in resulting messages
- Worker process version string is communicated in resulting messages, permitting version and results coherence checking
- Ability to keep a trace of the workflow of every processed document
API specifications¶
The API of this service was designed to meet the requirements of the CANARIE API methods set specification for online services.
Calling Patterns¶
The services will be invoked using a subset of REST and HTTP methods. In particular, all services are called using the HTTP GET, POST and PUT methods.
Methods¶
There are three sets of methods available to access services. The first two are specific to their respective front ends, the Load Balancer and the Multimedia File Storage. The third is the set of methods required by CANARIE and is therefore available on both front ends.
To cancel a given processing request
This method uses HTTP GET.
Parameters:
uuid: | The identifier of a previous processing request. |
---|---|
Return value:
Returns the status of the request after the cancel request has been submitted. Once the request is cancelled, any subsequent status request reports the REVOKED state.
Examples:
URL form:
<Base URI>/cancel?uuid=6547137e-cc2f-4008-b1eb-4ae8e898ce83
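As a minimal sketch, the cancel URL can be assembled with the Python standard library. The base URI and the helper name below are hypothetical placeholders; substitute the base URI documented for the service being used.

```python
from urllib.parse import urlencode


def build_cancel_url(base_uri: str, uuid: str) -> str:
    """Return the URL that cancels the processing request identified by uuid."""
    # The cancel route takes a single query parameter named "uuid".
    return f"{base_uri.rstrip('/')}/cancel?{urlencode({'uuid': uuid})}"


# Hypothetical base URI, used only for illustration.
url = build_cancel_url(
    "http://localhost:5000/transcoder",
    "6547137e-cc2f-4008-b1eb-4ae8e898ce83",
)
print(url)
```

The resulting URL is meant to be requested with an HTTP GET, as stated above.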
Information route¶
You can obtain information on the current configuration of a deployed instance by issuing an HTTP GET on the /info route. The response describes the configured services, including their expected version, the list of route names and so on.
Parameters:
None.
Return value:
A JSON object with the list of configured routes and associated services such as:
{
"services": {
"transcoder": {
"category": "Data Storage",
"celery_queue_name": "transcoder_0.2.7",
"celery_task_name": "transcoder",
"doc": "http://some_server/doc.html",
"home": "http://some_server/doc.html",
"institution": "CRIM",
"licence": "http://some_server/doc.html",
"name": "Transcoder service",
"provenance": "http://some_server/doc.html",
"releaseTime": "2015-01-01T00:00:00Z",
"releasenotes": "http://some_server/doc.html",
"researchSubject": "Multimedia file transcoding",
"route_keyword": "transcoder",
"source": ",204",
"support": "http://some_server/doc.html",
"supportEmail": "support@company.com",
"synopsis": "RESTful service providing multimedia files transcoding.",
"tags": "multimedia,file,transcoding",
"tryme": "ssm/tryme.html",
"version": "0.2.7"
}
},
"version": "1.7.0"
}
Useful elements are:
version: | The version of the REST API. |
---|---|
services: | Every element in the list is a service exposed through a dynamic route. Most elements reflect the requirements of the CANARIE API specification. |
Examples:
URL form:
<Base URI>/info
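The following sketch shows one way a client might read the /info response. It works on a trimmed copy of the JSON shown above rather than on a live server, and the function name is an assumption made for illustration.

```python
import json

# Trimmed copy of the /info response shown above.
info_body = """
{
  "services": {
    "transcoder": {
      "celery_queue_name": "transcoder_0.2.7",
      "route_keyword": "transcoder",
      "version": "0.2.7"
    }
  },
  "version": "1.7.0"
}
"""


def summarize_info(body: str) -> dict:
    """Map each route keyword to the expected version of its service."""
    info = json.loads(body)
    return {
        service["route_keyword"]: service["version"]
        for service in info["services"].values()
    }


routes = summarize_info(info_body)
api_version = json.loads(info_body)["version"]
print(api_version, routes)
```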
Status method¶
For methods that launch asynchronous tasks, there is also a corresponding method that lets the user monitor the status of submitted tasks. The response format of this method is uniform across all services and contains three keys:
- uuid
- status
- result
For example:
{
"result": {
"worker_id_version": "0.1.0",
"current": 73,
"start_time": "2014-09-10T12:23:19",
"total": 100
},
"status": "PROGRESS",
"uuid": "f1b40709-ca76-4554-b19f-277b2f8d5d49"
}
UUID¶
The identifier of the task, as supplied by the user in the status query.
Status key values (Service States)¶
Current status of a task that can be one of the following values:
- PENDING
- RECEIVED
- STARTED
- PROGRESS
- STORING
- FAILURE
- SUCCESS
- REVOKED
- RETRY
- EXPIRED
The states listed above are essentially those reported by the underlying distributed processing queue system, which in this case is Celery. Each status is explained in more depth in the result section below; for generic documentation on the reported states, see the following document supplied by Celery:
http://celery.readthedocs.org/en/latest/reference/celery.states.html?highlight=states#misc
In addition to the Celery states, there are three custom states: PROGRESS, STORING and EXPIRED. PROGRESS means that the underlying service has updated a progress value that can be used to estimate the time of completion of the given task. See below for more information on STORING, EXPIRED and the other possible states.
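A client that polls the status method needs to know when to stop. The sketch below assumes that SUCCESS, FAILURE, REVOKED and EXPIRED are final, an assumption drawn from the state descriptions in this document rather than from the service code; the names `FINAL_STATES`, `KNOWN_STATES` and `is_final` are hypothetical.

```python
# States after which no further change is expected (assumption drawn from
# the state descriptions in this document, not from the service code).
FINAL_STATES = frozenset({"SUCCESS", "FAILURE", "REVOKED", "EXPIRED"})

# Every state listed in this document.
KNOWN_STATES = FINAL_STATES | {
    "PENDING", "RECEIVED", "STARTED", "PROGRESS", "STORING", "RETRY",
}


def is_final(status: str) -> bool:
    """Return True when polling can stop for a task in this state."""
    if status not in KNOWN_STATES:
        raise ValueError(f"unknown status: {status!r}")
    return status in FINAL_STATES


print(is_final("PROGRESS"), is_final("REVOKED"))
```

RETRY is deliberately treated as non-final here, since the task is re-submitted to a queue in that state.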
Result (variable)¶
A general variable that holds different information depending on the aforementioned status value. For instance, when a processing request has ended in an error state, information on the error is reported in the result variable. Hence one must check the value of the status variable to know how to consume the result variable. The following states yield useful information in the result variable:
pending¶
The task has been submitted to a queue and is waiting to be processed by a worker. The time it may take before processing starts depends on how many tasks have previously been submitted to the processing queue and how many workers are available to process this type of task. In the worst case, no worker at all is available to consume the task at this time, and the task may never be processed. The result for this status is always null:
{
"result" : null
}
received¶
The task has been received by a worker. At this point we know that the task will be processed and a progress status should be available soon. There is still no result:
{
"result" : null
}
started¶
The worker has started working on the task and a progress status should be available imminently. There is still no result:
{
"result" : null
}
progress¶
The worker is making progress. In the progress state, the result variable holds information about the progress of the task, e.g.:
{
"result" : {
"worker_id_version": "0.1.43",
"host": "david-transition.novalocal",
"type": "transition",
"start_time": "2014-09-10T12:23:19",
"current": 12,
"total": 100
}
}
The key «current» holds the last reported progress value and «total» gives the upper bound of the progress scale. In this case progress is therefore at 12/100 (12%). The «start_time» can also be used to estimate the remaining time of the task: remaining_time = (now - start_time) * (total - current) / current. There is also some information on the worker: its «type», which is the service name; the «host», which is where the worker is running; and the «worker_id_version», which is the version of the worker.
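The remaining-time formula can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that `now` is expressed in the same time zone as the «start_time» reported by the worker, which the response does not state.

```python
from datetime import datetime, timedelta


def estimate_remaining(result: dict, now: datetime) -> timedelta:
    """Apply remaining = (now - start_time) * (total - current) / current."""
    current, total = result["current"], result["total"]
    if current <= 0:
        raise ValueError("no progress reported yet, cannot extrapolate")
    start_time = datetime.fromisoformat(result["start_time"])
    return (now - start_time) * (total - current) / current


result = {"start_time": "2014-09-10T12:23:19", "current": 12, "total": 100}
# Assume the status was polled one minute after the task started.
now = datetime(2014, 9, 10, 12, 24, 19)
remaining = estimate_remaining(result, now)
print(remaining)
```

With 12% done after 60 seconds, the remaining 88% extrapolates to 60 * 88 / 12 = 440 seconds. The estimate is linear and only as good as the progress values reported by the worker.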
storing¶
The worker is storing annotations on the annotation server. This state arises when the annotation service was called with instructions to save the annotations on an Annotations Storage Service back-end, that is, when the annotation process request was issued along with the «ann_doc_id» variable. In this context, STORING is a transient state indicating that the call to the Annotations Storage Service is in progress and not yet complete. If the annotation process request was not issued with instructions to save to an Annotations Storage Service back-end, this state should not surface. The result structure is the same as the progress one, except that the keys «current» and «total» are omitted:
{
"result" : {
"worker_id_version": "0.1.43",
"host": "david-transition.novalocal",
"type": "transition",
"start_time": "2014-09-10T12:23:19"
}
}
failure¶
The worker failed while processing the task. The result will give more details about the cause of failure. e.g.:
{
"result" : {
"code": 301,
"message": "HTTP Error 404: Not Found"
}
}
The keys «code» and «message» are the same as those used in the general exceptions handling and are documented in depth in the “Service exceptions” section at the bottom of this page.
success¶
The worker successfully completed the task. In the success state, the result variable holds the task output, which consists of an array of annotations. Every annotation carries a set of properties common to all services, followed by properties specific to the service, since services have different outputs. This is what could be obtained:
{
"result": [
{
"@id": "diarisation_annotation",
"@version": "0.5.1",
"specific_property": "A",
"meaning_of_life": "Not sure"
},
{
"@id": "diarisation_annotation",
"@version": "0.5.1",
"specific_property": "B",
"meaning_of_life": 42
}
]
}
These fields are common to all annotators:
@id: | Indicates the worker type |
---|---|
@version: | Indicates the worker version. |
revoked¶
The task has been revoked, which implies that the user cancelled the task through the REST interface. The result field has the same structure as for the failure status, so it is possible to get more details on the revocation. The error code should always be 109, associated with the TaskRevokedError exception raised by a worker when its task is revoked. The message contains the revocation status. Possible values include “revoked”, which implies that the task was revoked before any processing, and “terminated”, which means that the task had to be killed because it had already started. A revoked status with a “terminated” result should not be confused with a success status with a result structure: “terminated” means that the task has been killed and has nothing to do with the French word “terminée” (“finished”). Example result:
{
"result" : {
"code": 109,
"message": "terminated"
}
}
retry¶
The worker failed while processing the task but has requested a new attempt to complete it. The task has been re-submitted to a queue and should be picked up again by another worker. By default, a delay of 180 seconds is observed before the process starts again. The result field has the same structure as for the failure status, so it is possible to know the cause of the failure that triggered the new processing attempt.
expired¶
This state is returned when the queue has been idle for more than 2 hours and has been removed. The uuid is no longer useful once this state is declared, since the task no longer exists. The result associated with this state is null:
{
"result" : null
}
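Since the shape of the result depends on the status, a client typically branches on the status before reading the result. The following is a minimal sketch of such a dispatch based on the state descriptions above; the function name `describe` and the summary strings are illustrative assumptions.

```python
def describe(response: dict) -> str:
    """Turn a status response into a one-line, human-readable summary."""
    status, result = response["status"], response["result"]
    if status in ("PENDING", "RECEIVED", "STARTED", "EXPIRED"):
        # The result is null for these states.
        return status.lower()
    if status == "PROGRESS":
        return f"progress {result['current']}/{result['total']}"
    if status == "STORING":
        return f"storing annotations (worker {result['worker_id_version']})"
    if status in ("FAILURE", "REVOKED", "RETRY"):
        # These three states share the code/message structure.
        return f"{status.lower()}: [{result['code']}] {result['message']}"
    if status == "SUCCESS":
        # The result is an array of annotations.
        return f"success with {len(result)} annotation(s)"
    raise ValueError(f"unknown status: {status!r}")


example = {
    "result": {"code": 109, "message": "terminated"},
    "status": "REVOKED",
    "uuid": "f1b40709-ca76-4554-b19f-277b2f8d5d49",
}
summary = describe(example)
print(summary)
```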
CANARIE API methods set¶
The CANARIE API is defined in the following document
It covers:
- <Base URI>/info
- <Base URI>/stats
- <Base URI>/doc
- <Base URI>/releasenotes
- <Base URI>/support
- <Base URI>/source
- <Base URI>/tryme
- <Base URI>/licence
- <Base URI>/provenance
The base URI is specific to each service; please consult the documentation of each service for more details.
Service Exceptions and error codes¶
When something goes wrong, the system returns error responses, which are documented here. Because service exceptions are handled uniformly, independently of the service being used, users can expect the same response format across the system. Because services are meant to be called by programs rather than by humans, the server returns exceptions in JSON format by default, unless the ‘text/html’ format is explicitly requested via the ‘Accept’ header. Conversely, to comply with the CANARIE API, and because the CANARIE requests are meant to be used by humans, those requests return errors in HTML format by default, unless the ‘Accept’ header requests the ‘application/json’ format.
The JSON response repeats the HTTP response status and reason for clarity and adds an error code and message, specific to the underlying system, under the «vesta» key, e.g.:
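The sketch below shows how a client might select the error format through the ‘Accept’ header using Python's standard library. The requests are only prepared, not sent; the base URI, the uuid value and the helper name are hypothetical.

```python
from urllib.request import Request

# Hypothetical base URI, used only for illustration.
BASE_URI = "http://localhost:5000/transcoder"


def build_request(route: str, accept: str) -> Request:
    """Prepare (without sending) a GET request that asks for a given format."""
    return Request(f"{BASE_URI}/{route}", headers={"Accept": accept}, method="GET")


# A person debugging a service route may prefer errors in HTML.
html_request = build_request("status?uuid=unknown", "text/html")
# A program calling a CANARIE route may prefer errors in JSON.
json_request = build_request("stats", "application/json")
print(html_request.get_header("Accept"), json_request.get_header("Accept"))
```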
{
"status": 400,
"description": "Bad request",
"vesta": {
"code": 206,
"message": "A GET on the URI '/status' requires the following parameter : uuid"
}
}
The keys «status» and «description» can take any value defined by the HTTP standard; the first table below gives an overview of the most frequent statuses. The key «vesta» contains a structure composed of the keys «code» and «message», which give specific information on the cause of the exception.
The next table shows the various HTTP status codes that could be received, and the following ones list all the internal error codes and their explanations.
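As an illustration, a client could turn such an error body into an exception that keeps both the HTTP status and the internal code. The `VestaError` class and the `to_exception` function are hypothetical client-side names, not part of the service.

```python
import json

# Error body copied from the example above.
body = """
{
  "status": 400,
  "description": "Bad request",
  "vesta": {
    "code": 206,
    "message": "A GET on the URI '/status' requires the following parameter : uuid"
  }
}
"""


class VestaError(Exception):
    """Hypothetical client-side exception carrying both levels of detail."""

    def __init__(self, http_status: int, description: str, code: int, message: str):
        super().__init__(f"HTTP {http_status} {description}: [{code}] {message}")
        self.http_status = http_status
        self.code = code


def to_exception(raw: str) -> VestaError:
    """Build a VestaError from a JSON error body."""
    error = json.loads(raw)
    vesta = error["vesta"]
    return VestaError(
        error["status"], error["description"], vesta["code"], vesta["message"]
    )


exc = to_exception(body)
print(exc)
```

Keeping the internal code as an attribute lets the caller look it up in the error code tables of this section.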
Status | Description |
---|---|
200 | Successful request, results follow |
204 | Request was properly formatted, no content |
400 | Bad request due to improper specifications, unrecognized parameter, parameter value out of range, etc. |
404 | The requested resource was not found |
500 | Internal server error |
503 | Service temporarily unavailable |
Core libraries error codes¶
Code | Description |
---|---|
100 | The error does not originate from the underlying system. The message will be empty. |
101 | Generic exception type. An unexpected exception type has been raised; the message may give more explanation of the cause. |
102 | Database manipulation exception. |
103 | Configuration files exception. |
104 | Operation or function is applied to an object of inappropriate type. |
105 | A built-in operation or function receives an argument that has the right type but an inappropriate value. |
106 | This exception is raised for address-related errors in the low-level networking interface. |
107 | An I/O operation fails for an I/O-related reason, e.g., “file not found” or “disk full”. |
108 | A mapping (dictionary) key is not found in the set of existing keys. |
109 | A task has been revoked. |
REST services package error codes¶
Code | Description |
---|---|
200 | System settings loading exception. |
201 | One or more workers have a different REST API configured in their configuration files. |
202 | An unknown service is being used. Use the /info request to get available services on the current server. |
203 | An unknown task UUID is being used for a /status or a /cancel request. |
204 | The declared worker version in the server configuration file doesn’t match the one produced by the worker itself. |
205 | There is a problem in the communication with the AMQP server. |
206 | The request has been made without a required parameter. |
207 | A task request has been made without a valid document URL. |
Worker Services library error codes¶
Code | Description |
---|---|
300 | Submitted annotations do not have a valid format. |
301 | An error occurred while trying to download a document. |
302 | An error occurred while trying to upload a document. |
303 | A required configuration file cannot be found. |
304 | Document cannot be found at given path. |
305 | Cannot use this document type. |
306 | Internal error of undetermined cause. See message. |
4xx | Codes in this range come from the Load Balancer package. |
400 | Resources are missing to complete a VM spawn. |
401 | The minimum number of VMs has been reached while tearing down a VM. |
Transition and face detection services error codes¶
Code | Description |
---|---|
600 | An error occurred in the C library of the worker. The message will contain a worker specific error code. |
Transcription, diarisation and text matching services error codes¶
Code | Description |
---|---|
630 | Diarization cannot resolve the path to the audio file. |
631 | Audio file format is not supported by diarization process. (WAV file parameters) |
632 | WAV file header has an unsupported structure for diarization process. |
633 | Internal error while forking diarization subprocesses. |
64x | Codes in this range are reserved for the Transcription service. |
640 | Transcription worker cannot resolve path to the audio file. |
641 | Transcription worker encountered audio segments of greater length than its capacity. |
642 | Transcription worker encountered an internal error while forking internal subprocesses. |
643 | Transcription worker could not produce a transcription for a given document. |