Face detection Service User’s guide


This document describes the web service interfaces for the face analysis service.

Service Description

Face analysis is the process of extracting face tracks from a video, where a face track is a collection of face positions of a single person in the video over a certain time range. Each track carries properties of the face, such as its orientation or pose (e.g. frontal, profile). Every time the service picks up a new face, or when a face being tracked through the video changes its orientation, a new track is created with a unique id. Depending on the ability of the face tracker and on the variation of face poses inside a video, a single person appearing in the video may therefore be associated with multiple tracks.

Common REST interface documentation

As the processing can be CPU-intensive, the service can be horizontally scaled in order to process a large number of incoming requests. The horizontal scaling is done behind a unified REST interface, and this is the recommended way to use this Service. Because the Service then resides behind another Service named “Service Gateway”, the interface that users interact with is defined by the latter.

The interface of the Service Gateway is documented separately; refer to http://services.vesta.crim.ca/docs/sg/latest/user_guide.html#common-rest-interface.

Supported Service Gateway versions

  • 1.5
  • 1.6
  • 1.7
  • 1.8

Document format restrictions

There are no restrictions on the document format per se. Faces inside a video document should be large enough to facilitate detection (at least 80x80 pixels).

Successful results query response

When a query has reached the success state, the result variable of a status request will hold the annotation result. Furthermore, the annotations can be stored on an Annotations Storage Service if the Annotations Storage Service document UUID was supplied and the Annotations Storage Service URL was configured in the LoadBalancer front end’s configuration. (See Services common REST interface).

The annotations which are rendered in the status request result have the following form:

    [
        {
            "@context": "http://www.crim.ca/schema/face/",
            "@type": "face_track",
            "begin": 78,
            "confidence": 0.5,
            "degreeOfActivity": -1,
            "end": 101,
            "faceId": 0,
            "frameRate": 29.97002997002997,
            "keyBoundingBoxes": [
                {
                    "frameNo": 78,
                    "timePoint": "npt:2.6026",
                    "xywh": "percent:0.45652173913,0.279166666667,0.282608695652,0.216666666667"
                },
                {
                    "frameNo": 85,
                    "timePoint": "npt:2.83616666667",
                    "xywh": "percent:0.45652173913,0.302083333333,0.282608695652,0.216666666667"
                },
                {
                    "frameNo": 91,
                    "timePoint": "npt:3.03636666667",
                    "xywh": "percent:0.45652173913,0.254166666667,0.282608695652,0.216666666667"
                },
                {
                    "frameNo": 92,
                    "timePoint": "npt:3.06973333333",
                    "xywh": "percent:0.45652173913,0.279166666667,0.282608695652,0.216666666667"
                },
                {
                    "frameNo": 93,
                    "timePoint": "npt:3.1031",
                    "xywh": "percent:0.45652173913,0.254166666667,0.282608695652,0.216666666667"
                },
                {
                    "frameNo": 101,
                    "timePoint": "npt:3.37003333333",
                    "xywh": "percent:0.45652173913,0.254166666667,0.282608695652,0.216666666667"
                }
            ],
            "nbKeyBoundingBoxes": 6,
            "poseType": 0,
            "poseX": 0,
            "poseY": 0,
            "t": "npt:[2.6026,3.37003333333)",
            "version": "1.0.0"
        }
    ]

The result is a list in which every element is a face track produced by the face analysis process. Useful keys are the following:


@context
A URL where the JSON-LD schema can be found.

@type
The type of representation.

version
Version of the service.

t
The track start and end points, expressed in seconds as real numbers.

frameRate
A real value denoting the number of frames per second of the video.

begin
The track starting video frame (inclusive).

end
The track ending video frame (inclusive).

faceId
Track identifier, where a track represents the appearance of a face (expected to be from the same person) with a given pose (e.g. frontal) inside a block of contiguous video frames.

degreeOfActivity
Reserved for future use (always -1).

poseType
Property of a track indicating the orientation of the subject's face. Possible values are in the range [-2, 2]: 0 = frontal, -1 = left profile, 1 = right profile, -2 = down, 2 = up.

poseX
Not used (always 0).

poseY
Not used (always 0).

confidence
Average confidence of all the face detections composing the track.

nbKeyBoundingBoxes
The number of key bounding boxes contained in the keyBoundingBoxes array.

keyBoundingBoxes
An array of key bounding boxes, each with an associated time code. Key bounding boxes represent face positions inside video frames. They are selected so that the face position at any given time code can be linearly interpolated from the closest surrounding key bounding boxes. A track always contains at least a starting and an ending bounding box, or a single key bounding box for a track of length 1.

frameNo
The frame number of the key bounding box.

timePoint
Same as frameNo, but expressed in seconds as a real number.

xywh
A comma-separated string where x, y, width and height are given in percent relative to the width and height of the video.

x: Horizontal position of the bounding box center (0% being the far left)
y: Vertical position of the bounding box center (0% being the very top)
width: Width of the bounding box in percent
height: Height of the bounding box in percent
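
The following Python sketch illustrates how a client might read the key bounding boxes of a track and linearly interpolate the face position at an arbitrary frame, as described above. It is not part of the service. The function names (parse_npt, parse_xywh, box_at_frame, to_pixels) are hypothetical, and the sketch assumes that the values after "percent:" are fractions between 0 and 1, as they appear in the sample result, and that x and y denote the box center.

```python
from bisect import bisect_right


def parse_npt(value):
    """Turn a time point such as 'npt:2.6026' into seconds (float)."""
    prefix, _, seconds = value.partition(":")
    if prefix != "npt":
        raise ValueError(f"unexpected time format: {value!r}")
    return float(seconds)


def parse_xywh(value):
    """Turn 'percent:x,y,w,h' into a tuple of four floats."""
    prefix, _, numbers = value.partition(":")
    if prefix != "percent":
        raise ValueError(f"unexpected box format: {value!r}")
    x, y, w, h = (float(n) for n in numbers.split(","))
    return x, y, w, h


def box_at_frame(track, frame_no):
    """Linearly interpolate the face box of a track at a given frame."""
    keys = sorted(track["keyBoundingBoxes"], key=lambda k: k["frameNo"])
    frames = [k["frameNo"] for k in keys]
    if not frames[0] <= frame_no <= frames[-1]:
        raise ValueError("frame lies outside the track")
    # Index of the last key box located at or before frame_no.
    i = bisect_right(frames, frame_no) - 1
    before = parse_xywh(keys[i]["xywh"])
    if frames[i] == frame_no:
        return before
    after = parse_xywh(keys[i + 1]["xywh"])
    ratio = (frame_no - frames[i]) / (frames[i + 1] - frames[i])
    return tuple(b + ratio * (a - b) for b, a in zip(before, after))


def to_pixels(box, video_width, video_height):
    """Convert a center-based fractional box to (left, top, width, height) in pixels.

    Assumption: the fractions are relative to the video width and height.
    """
    x, y, w, h = box
    left = (x - w / 2) * video_width
    top = (y - h / 2) * video_height
    return left, top, w * video_width, h * video_height


# Two key bounding boxes taken from the sample result above.
track = {
    "keyBoundingBoxes": [
        {"frameNo": 78, "timePoint": "npt:2.6026",
         "xywh": "percent:0.45652173913,0.279166666667,0.282608695652,0.216666666667"},
        {"frameNo": 85, "timePoint": "npt:2.83616666667",
         "xywh": "percent:0.45652173913,0.302083333333,0.282608695652,0.216666666667"},
    ]
}

print(parse_npt(track["keyBoundingBoxes"][0]["timePoint"]))
print(box_at_frame(track, 80))
print(to_pixels(box_at_frame(track, 78), 1280, 720))
```

The interpolation is done on frame numbers; the same approach works on time points by replacing frameNo with the parsed timePoint value. The 1280x720 resolution is only an example and must be replaced with the actual dimensions of the video.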