Transcoding Service User’s Guide


This document describes the web service interfaces for the transcoding service.

Service Description

This service allows for the reencoding of multimedia documents according to some specification. Such a specification is supplied as a JSON structure and typically contains a task to be carried out (preserve audio-video, strip audio, strip video, take a thumbnail image), some new audio/video parameters, editing commands, predefined specifications, etc. The output is a new multimedia document or a thumbnail image.

JSON specification structure

The top level of the JSON structure is as follows:

{
   "task": ...,
   "format": ...,
   "preset": ...,
   "thumbnail_timecode": ...,
   "thumbnail_size": ...,
   "start": ...,
   "end": ...,
   "duration": ...,
   "startpts": ...,
   "endpts": ...,
   "videoparams": ...,
   "audioparams": ...
}
“task”: specifies the transcoding task to be carried out. The following tasks can be requested:
  • “VideoOnly” (transcode video data and ignore/strip audio)
  • “AudioOnly” (transcode audio data, ignore/strip video)
  • “AudioVideo” (transcode both audio and video)
  • “Thumbnail” (generate a thumbnail image)
  • “Edit” (edit the document)
  • “Probe” (request information about the document)

A “probe” task does not perform any transcoding: it returns a structure with a single field ‘probe’ that looks like:

{ "probe": "Format: mov,mp4,m4a,3gp,3g2,mj2. Duration: 5.568.Video: 560x320, codec=h264, bitrate=465641, 30.0 fps ; audio codec=aac, 1 channel(s), sampling rate=48000.0"   }
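Since the probe result is a single text string, a client may want to pull individual values out of it. The following is a minimal Python sketch, assuming the string always follows the layout of the example above; the function name `parse_probe` and the chosen keys are hypothetical, and the patterns are inferred from that one example rather than from a documented grammar.

```python
import re


def parse_probe(probe: str) -> dict:
    """Extract a few fields from a probe string (layout assumed from one example)."""
    patterns = {
        "duration": (r"Duration:\s*(\d+(?:\.\d+)?)", float),
        "width": (r"Video:\s*(\d+)x\d+", int),
        "height": (r"Video:\s*\d+x(\d+)", int),
        "fps": (r"(\d+(?:\.\d+)?)\s*fps", float),
        "audio_codec": (r"audio codec=(\w+)", str),
    }
    info = {}
    for name, (pattern, convert) in patterns.items():
        match = re.search(pattern, probe)
        if match:  # Fields that cannot be found are simply left out.
            info[name] = convert(match.group(1))
    return info


example = (
    "Format: mov,mp4,m4a,3gp,3g2,mj2. Duration: 5.568.Video: 560x320, "
    "codec=h264, bitrate=465641, 30.0 fps ; audio codec=aac, 1 channel(s), "
    "sampling rate=48000.0"
)
print(parse_probe(example))
```

Missing fields are skipped rather than reported as errors, because an audio-only document would presumably have no video section.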

If the selected task is “Thumbnail”, a video frame is extracted at the temporal location specified by the element “thumbnail_timecode” and saved in JPEG format. Image size can be modified with the property “thumbnail_size”. Timecode format is either HH:MM:SS, MM:SS, SS, HHhMMmSS or MMmSS.


{"task":"thumbnail", "thumbnail_timecode":"1m20", "dest": {"localpath":"/data"}}

{"task":"thumbnail", "thumbnail_timecode":"1:20", "thumbnail_size":"50x50", "dest": {"localpath":"/data"}}
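The accepted thumbnail timecode formats can be checked on the client side before a request is sent. The sketch below converts any of the five listed forms to whole seconds; it is an illustration only, the helper name `thumbnail_timecode_to_seconds` is hypothetical, and the service's own parser may be more or less permissive than these patterns.

```python
import re

# One pattern for the colon forms (HH:MM:SS, MM:SS, SS) and one for the
# letter forms (HHhMMmSS, MMmSS). Both are assumptions based on the list above.
_PATTERNS = [
    re.compile(r"^(?:(?:(?P<h>\d+):)?(?P<m>\d+):)?(?P<s>\d+)$"),
    re.compile(r"^(?:(?P<h>\d+)h)?(?P<m>\d+)m(?P<s>\d+)$"),
]


def thumbnail_timecode_to_seconds(timecode: str) -> int:
    """Convert a thumbnail timecode to a number of seconds."""
    for pattern in _PATTERNS:
        match = pattern.match(timecode.strip())
        if match:
            hours, minutes, seconds = (
                int(match.group(name) or 0) for name in ("h", "m", "s")
            )
            return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"unsupported timecode: {timecode!r}")


print(thumbnail_timecode_to_seconds("1m20"))
print(thumbnail_timecode_to_seconds("1:20"))
```

Both calls refer to the same position, 1 minute 20 seconds into the document, as in the two request examples above.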

If the selected task is “Edit”, the service chops the multimedia document into slices that will be stitched together to form the output document. Two lists are expected: “startpts” and “endpts” hold the timecodes at the beginning and end of each selected slice. Timecode format is hh:mm:ss.msec.
“format”: allows the user to select a “high-level” format without having to pick specific audio and video codecs. For example, choosing “mp4” encodes the file using h264 as the video codec and aac as the audio codec. The most common formats are: mp4 (h264/aac), webm (video codec vp8 or vp9, audio codec vorbis or opus), mpeg (mpeg2video/mp2), flv (flv1/mp3), avi (mpeg4/mp3), mov (h264/aac), matroska (h264/vorbis), ogg (theora/vorbis).
“preset”: presets are groups of parameter settings that are hard-coded inside the service for convenience. Two presets are currently available:
  • “vesta-sd” corresponds to the settings (vcodec=flv1, acodec=aac, vbitrate=1200kbps, abitrate=192kbps, asamplerate=44100)
  • “vesta-hd” corresponds to the settings (vcodec=h264, acodec=aac, vbitrate=5000kbps, abitrate=192kbps, asamplerate=44100)
“thumbnail_timecode”: required for the task “Thumbnail”.
“thumbnail_size”: optional for the task “Thumbnail”. If no size is given, the generated thumbnail has the size of the corresponding video frame.
“start”: used for trimming the input file. Along with the elements “end” or “duration”, it limits transcoding to a portion of the input file. For example, if “start”: “1:12” and “end”: “3:20” are given, the final document will contain the part of the input file between timecodes 1min 12sec and 3min 20sec.
“end”: see “start” above.
“duration”: see “start” above.
“startpts”: used in conjunction with the task “Edit”. For example, if “startpts”: [“00:00:15”, “00:00:50”] and “endpts”: [“00:00:25”, “00:01:00”], the resulting file is the concatenation of the video segments between timecodes (15sec, 25sec) and timecodes (50sec, 1min). Supported formats: HH:MM:SS, MM:SS, SS, HH:MM:SS.u, MM:SS.u, SS.u where u is a fraction of a second.
“endpts”: see “startpts” above.
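The pairing of the two lists in an “Edit” task can be illustrated with a short Python sketch that computes the length of the resulting document. The helper names are hypothetical, and the two checks (lists of equal length, each slice ending after it starts) are assumptions about what a sensible request looks like, not documented service behaviour.

```python
def timecode_to_seconds(timecode: str) -> float:
    """Convert HH:MM:SS, MM:SS or SS (optionally with a .u fraction) to seconds."""
    parts = timecode.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timecode: {timecode!r}")
    seconds = 0.0
    for part in parts:
        # Each step shifts the running total up by one unit (x60).
        seconds = seconds * 60 + float(part)
    return seconds


def edited_duration(startpts: list, endpts: list) -> float:
    """Total length in seconds of the slices selected by an "Edit" task."""
    if len(startpts) != len(endpts):
        raise ValueError("startpts and endpts must have the same length")
    total = 0.0
    for start, end in zip(startpts, endpts):
        begin, finish = timecode_to_seconds(start), timecode_to_seconds(end)
        if finish <= begin:
            raise ValueError(f"slice {start!r}..{end!r} is empty or reversed")
        total += finish - begin
    return total


# The example above: two slices of 10 seconds each.
print(edited_duration(["00:00:15", "00:00:50"], ["00:00:25", "00:01:00"]))
```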
“videoparams”: a JSON object that allows the user to tweak video reencoding. Its optional fields are:
  • “codec”: video codec to be used for reencoding, provided that it is known by the service. It can override the default codec if a format was also specified. For example, using format “avi” (default codec mpeg4) in combination with “codec”: “h264” is accepted. Note that some formats have strict specifications: “ogg” only supports the video codec “theora” and the audio codec “vorbis”, so no override is possible in this case.
  • “bitrate”: level of compression. The higher the bitrate, the better the video quality. Units are kilobits per second (e.g. 200k or 200, since k is the default unit) or megabits per second (e.g. 2M, which is the same as 2000k or 2000).
  • “framerate”: new frame rate, between 1 and 100 frames per second. Video frame rate is normally 24 (movies) or 30 (TV).
  • “deinterlace”: convert an interlaced video file into a non-interlaced form (progressive scan). Set “deinterlace” to true to activate.
  • “size”: change the video frame size during transcoding. The syntax is as follows: “size”: [new_width, new_height].
  • “aspect”: change the video aspect ratio. Can be a numerical value (“aspect”: 1.5) or one of the two allowed strings “4:3” and “16:9”.
  • “crop”: capture a region in the video frames of the input file. The syntax is as follows: “crop”: [x, y, width, height], where all parameters are expressed as fractions of the image size. For example, “crop”: [0.4, 0.35, 0.3, 0.4] tells the transcoder to save the central portion of the input video, between columns corresponding to 40% and 70% (i.e. 40% + 30%) of the image width and between rows corresponding to 35% and 75% (i.e. 35% + 40%) of the image height.
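The arithmetic behind the “crop” fractions can be sketched in Python as follows. The function `crop_to_pixels` is a hypothetical client-side helper; in particular, rounding to the nearest pixel is an assumption, since the way the service rounds is not documented here.

```python
def crop_to_pixels(crop, frame_width, frame_height):
    """Convert fractional [x, y, width, height] to pixel edges.

    Returns (left, top, right, bottom) in pixels. Rounding to the nearest
    pixel is an assumption made for illustration.
    """
    x, y, width, height = crop
    for value in (x, y, width, height):
        if not 0 <= value <= 1:
            raise ValueError("crop values are fractions between 0 and 1")
    left = round(x * frame_width)
    top = round(y * frame_height)
    right = round((x + width) * frame_width)
    bottom = round((y + height) * frame_height)
    return left, top, right, bottom


# The example above applied to a 560x320 frame:
# columns from 40% to 70% of the width, rows from 35% to 75% of the height.
print(crop_to_pixels([0.4, 0.35, 0.3, 0.4], 560, 320))
```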
“audioparams”: a JSON object that allows the user to tweak audio reencoding. Its optional fields are:
  • “codec”: audio codec to be used for reencoding, provided that it is known by the service. See “videoparams” for more details about codec override.
  • “bitrate”: level of compression. The higher the bitrate, the better the audio quality. Units are kilobits per second (e.g. 128k).
  • “samplerate”: number of samples per second (between 1 and 100000).
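The bitrate notations described for “videoparams” and “audioparams” can be normalized on the client side, for instance to compare two settings. This is a minimal sketch with a hypothetical helper name; it assumes that 1M equals 1000k, which is what the equivalence between 2M and 2000k suggests.

```python
import re


def bitrate_to_kbps(bitrate) -> float:
    """Normalize '200k', '200', 200 or '2M' to kilobits per second."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kM]?)", str(bitrate).strip())
    if not match:
        raise ValueError(f"unsupported bitrate: {bitrate!r}")
    value, unit = float(match.group(1)), match.group(2)
    # No suffix means kilobits per second, the default unit.
    return value * 1000 if unit == "M" else value


print(bitrate_to_kbps("2M") == bitrate_to_kbps("2000k") == bitrate_to_kbps(2000))
```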
“dest”: a JSON object that specifies where to copy the output document. It should contain one of these fields:
  • “username”, “password”: in case authentication is required for ftp transfer.

  • “mss_url”: a URL pointing to Vesta’s multimedia server

  • “url”: a URL pointing to an FTP or HTTP server.

    • In the case of ftp: the url should point to a directory, not a filename.
    • In the case of http: the transcoding service will try to POST the resulting file as a multipart-encoded file.

Example 1 - FTP:

{
   "task": "VideoOnly",
   "videoparams": {
      "codec": "h264",
      "bitrate": "1000k"
   },
   "dest": {
      "url": "",
      "username": "ftpvisi",
      "password": "password"
   }
}

Example 2 - HTTP:

{
   "task": "VideoOnly",
   "videoparams": {
       "codec": "h264",
       "bitrate": "100k"
   },
   "dest": {
       "url": ""
   }
}
  • “localpath”: a path on the filesystem where the service is deployed. For a CRIM deployment, “localpath” could be “/misc/tmp” for the T: drive. The allowed paths are determined during service deployment (docker volume). Example:

{
   "task": "VideoOnly",
   "videoparams": {
       "codec": "h264",
       "bitrate": "1000k"
   },
   "dest": {
       "localpath": "/data"
   }
}
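A specification like the ones above can be assembled programmatically. The sketch below builds the JSON text from Python values and checks that the destination names exactly one target. The function `build_request` is hypothetical, and the "exactly one of url, mss_url or localpath" rule is an interpretation of the field list above, not a documented validation rule.

```python
import json

# Destination fields that name a target; "username" and "password" only
# accompany "url" for FTP transfers.
DEST_TARGETS = ("url", "mss_url", "localpath")


def build_request(task: str, dest: dict, **options) -> str:
    """Return the JSON specification for a transcoding request."""
    targets = [key for key in DEST_TARGETS if key in dest]
    if len(targets) != 1:
        raise ValueError(f"dest must contain exactly one of {DEST_TARGETS}")
    spec = {"task": task, **options, "dest": dest}
    return json.dumps(spec, indent=3)


print(
    build_request(
        "VideoOnly",
        {"localpath": "/data"},
        videoparams={"codec": "h264", "bitrate": "1000k"},
    )
)
```

How the resulting text is submitted depends on the Service Gateway interface, which is described in its own documentation.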

Common REST interface documentation

As the processing can be CPU-intensive, the service can be scaled horizontally in order to process a large number of incoming requests. The horizontal scaling is done behind a unified REST interface, which is the recommended way to use this service. Because the service then resides behind another service named “Service Gateway”, the interface that users interact with is defined by the latter.

The interface of the Service Gateway is documented separately; see common_rest_interface.

Supported Service Gateway versions

  • vestaservices/servicegateway:1.8.2

Document format restrictions

Many multimedia file formats and codecs are supported by the service. The complete list can be obtained by sending a request to the route “/xcodingcap/process”. The output structure contains a uuid that can be passed as a parameter to the status request (i.e. “/xcodingcap/status?uuid=xxx”), which returns a JSON structure holding the supported audio/video codecs as well as formats and presets, e.g.:

{
   "result": {
     "audio_codecs": " copy vorbis aac mp3 mp2 libfdk_aac ac3 dts flac pcm16 ",
     "formats": " ogg avi mkv webm flv mov mp4 mpg mp3 wav ",
     "presets": " vesta-hd vesta-sd ",
     "video_codecs": " copy theora h264 divx vp8 h263 flv mpeg1 mpeg2 mpeg4 "
   },
   "status": "SUCCESS",
   "uuid": "71cedf8f-2bba-4380-8200-96bc572e00f6"
}
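The capability lists come back as space-separated strings, so a client will typically split them before checking whether a codec or format is available. A minimal sketch, assuming the response has already been decoded into a Python dictionary; the helper name `parse_capabilities` is hypothetical.

```python
def parse_capabilities(result: dict) -> dict:
    """Split each space-separated capability string into a list of names."""
    return {key: value.split() for key, value in result.items()}


# The "result" object from the example response above.
result = {
    "audio_codecs": " copy vorbis aac mp3 mp2 libfdk_aac ac3 dts flac pcm16 ",
    "formats": " ogg avi mkv webm flv mov mp4 mpg mp3 wav ",
    "presets": " vesta-hd vesta-sd ",
    "video_codecs": " copy theora h264 divx vp8 h263 flv mpeg1 mpeg2 mpeg4 ",
}

capabilities = parse_capabilities(result)
print("h264" in capabilities["video_codecs"])
print(capabilities["presets"])
```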

Successful results query response

When a transcoding query has reached the success state, the result variable of a status request holds information related to the requested task.

  • if the task is “probe”, the result variable will contain a “probe” field, i.e. a text string that is a concatenation of many pieces of information about the file, namely format, duration, video frame size, video codec, video bitrate, frame rate, audio codec, number of audio channels and audio sampling rate. For example:
"result": {
     "probe": "Format: mov,mp4,m4a,3gp,3g2,mj2. Duration: 5.568.Video: 560x320, codec=h264, bitrate=465641, 30.0 fps ; audio codec=aac, 1 channel(s), sampling rate=48000.0  "
}
  • otherwise, the result variable will hold the destination of the reencoded multimedia document, the document length in seconds and the task that was requested. For example:
"result": {
    "dest": "",
    "length": "5.568",
    "task": "videoonly"
}
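A client can tell the two kinds of result apart by checking for the “probe” field. The sketch below is an illustration under that assumption; the function name `summarize_result`, the returned keys and the destination value used in the usage lines are hypothetical.

```python
def summarize_result(result: dict) -> dict:
    """Classify the "result" object of a successful status response."""
    if "probe" in result:
        # Probe tasks return a single descriptive text string.
        return {"kind": "probe", "info": result["probe"].strip()}
    return {
        "kind": "transcode",
        "dest": result["dest"],
        # The length is delivered as a string; convert it to seconds.
        "length_seconds": float(result["length"]),
        "task": result["task"],
    }


# "/data/output.mp4" is a made-up destination used only for this example.
summary = summarize_result(
    {"dest": "/data/output.mp4", "length": "5.568", "task": "videoonly"}
)
print(summary["kind"], summary["length_seconds"])
```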