Suruna and Microsoft work together to get intelligent metadata from any video using Azure & Cognitive Services



Suruna and Microsoft together built an intelligent media processor that uses technologies from Microsoft's cloud to make media files and their content searchable by automatically extracting and exposing meaningful metadata.


"Already comprising most of the global Internet traffic, video content is definitely taking over the World Wide Web, and out of this growth arises the problem of discovering content" ~Arturo Calle, CEO at Suruna

The Internet was designed around text-based documents and, as such, has mature infrastructure to encourage and enable the search and discovery of text across the entire web. Video files, on the other hand, are not natively "searchable", and usually require complex classification systems powered primarily by massive amounts of manually tagged metadata.

Suruna develops a personalized video programming system that allows publishers and content owners to generate more video views and increase user engagement. Suruna's solution uses Artificial Intelligence and Machine Learning to understand video content and to model patterns of user behavior. Suruna also uses its own AI algorithm, based on BDI models, to create video clusters and give video recommendations. Suruna creates customized ontologies for each video dataset, which enables powerful advanced search within the video content. Because of the complexity and volume of information in the video assets, and the numerous tasks the Artificial Intelligence process requires, the development and implementation must be highly modular and use enterprise-grade cloud computing services. Several solutions on the market are not standardized or natively designed for the cloud, and can generate unpredictable results.

Suruna decided to use Microsoft Azure to automate speech-to-text extraction and then process the results with Azure Cognitive Services. Azure Functions was chosen to give the architecture an elastic, modular design. The output can then be processed by Suruna's classification algorithm, which makes the whole process predictable and elastic. The design is being used for the first time, as an example, to process videos about the Odebrecht case (Car Wash).

Generic Image

Key Technologies

Microsoft Cloud

Tools & Languages

External App/Service

  • Alterlatina Online Video Platform

Partner Profile

Suruna is an Independent Software Vendor (ISV) specialized in Artificial Intelligence (AI) for the media industry. They offer an intelligent Application Programming Interface (API) that turns any online video platform into a site with Netflix-like features to increase engagement (video suggestions, advanced search, analytics, etc.). Suruna is a spin-off from Alterlatina, a pioneering streaming media company in Peru, founded in 1999.

Suruna is the product of a reflection on how online video platforms can deliver the right videos at the right time. The answer was the use of AI algorithms. They started to develop their own AI algorithms for video recommendation in the early 2010s. In 2014, with their first product version, they were accepted into the Start-Up Chile incubation program (11th Generation); through this program, they visited Stanford University and UC Berkeley to learn about the state of the art in AI. In 2015, they were selected by Wayra (Telefonica's corporate accelerator) and also won a contest in Peru to represent the country in the APEC Global Challenge in Taipei, expanding their connections with the Asian ecosystem. In 2016, Suruna established a partnership with the Computer Science School of the National University of Engineering (UNI), and they keep growing their business.

Solution overview

As a frame of reference, we used the Odebrecht case (Car Wash), with the goal of creating a useful tool for investigative journalists. However, the solution is designed to process any set of videos at scale.

Working side-by-side with Microsoft, we were able to get a closer look at the capabilities of its Machine Learning services for audio indexing, the most recent Cognitive Services, Queue Services, and the design of elastic serverless computing based on Functions. In the end, we found more applications for Suruna than we initially expected.

Finally, we designed an architecture capable of supporting the future needs of Suruna, where an average customer has 3k videos that can be accessed 1.5M times a month. A design based on Virtual Machines would not be efficient for that load. In addition, the design allows cognitive services to be integrated as needed. The result is as follows:

Generic Image

Generic Image

Generic Image
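As a rough back-of-the-envelope check (our own arithmetic, not a figure from the project), the load quoted above can be converted into average rates. The 30-day month is an assumption, and real traffic is unlikely to be evenly spread, which is part of why an elastic design may fit better than always-on machines.

```python
# Figures quoted in the text: 3k videos per customer, 1.5M accesses per month.
VIDEOS_PER_CUSTOMER = 3_000
ACCESSES_PER_MONTH = 1_500_000
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def average_load(accesses_per_month: int, videos: int) -> dict:
    """Convert a monthly access count into simple average rates."""
    return {
        "accesses_per_video_per_month": accesses_per_month / videos,
        "accesses_per_second": accesses_per_month / SECONDS_PER_MONTH,
    }


load = average_load(ACCESSES_PER_MONTH, VIDEOS_PER_CUSTOMER)
print(load)
```

Under these assumptions the average is about 500 accesses per video per month and well under one access per second per customer, so capacity sized for peaks would sit idle most of the time.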


Technical delivery

Suruna used Azure Media Services to extract basic characteristics and then processed them with Azure Cognitive Services to obtain useful information for classifying and recommending videos. We used Azure Functions to automate this large, repetitive process.


Importing the video & getting text from audio

For this example, the videos were originally located in an Azure Blob Storage account. To process them, we built an HTTP-triggered function that imports them into another Blob Storage account controlled by us (the Temporary Repository). We used a JSON definition to call this function:

Generic Image
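The actual JSON definition is only available as an image, so the following is a minimal sketch of how an HTTP-triggered function might validate such a request before importing anything. The field names `source_container_url` and `customer_id` are hypothetical and are not taken from the Suruna API.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical field names: the real JSON definition is not reproduced here.
REQUIRED_FIELDS = ("source_container_url", "customer_id")


@dataclass(frozen=True)
class ImportRequest:
    source_container_url: str
    customer_id: str


def parse_import_request(body: str) -> ImportRequest:
    """Validate the POST body and return a typed request, or raise ValueError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"body is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")
    # Every required field must be present and be a non-empty string.
    bad = [
        name for name in REQUIRED_FIELDS
        if not isinstance(payload.get(name), str) or not payload[name]
    ]
    if bad:
        raise ValueError(f"missing or invalid fields: {', '.join(bad)}")
    url = urlparse(payload["source_container_url"])
    if url.scheme != "https" or not url.netloc:
        raise ValueError("source_container_url must be an https URL")
    return ImportRequest(payload["source_container_url"], payload["customer_id"])
```

Rejecting a malformed request up front keeps the import step from copying anything until all parameters are known to be usable.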

If all the parameters sent in the HTTP POST message are correct, we temporarily save all videos from the source to the Suruna Azure Blob Storage account to begin the core flow. Once Suruna reports that the import succeeded, a confirmation message is needed to start processing, and it is sent with another call:

Generic Image

Here we specify the language of the indexing process. This message triggers an indexing task for every video in the temporary storage container.
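The fan-out from one confirmation message to one indexing task per video could look roughly like the sketch below. The message shape, the list of video extensions, and the `"es"` language code are assumptions made for illustration. In the real system each message would presumably be placed on a queue that triggers a function.

```python
import json
from typing import Iterable, List

# Assumption: only these container entries are treated as videos.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".wmv")


def build_indexing_messages(blob_names: Iterable[str], language: str) -> List[str]:
    """Return one JSON message per video blob, each carrying the indexing language."""
    messages = []
    for name in blob_names:
        if not name.lower().endswith(VIDEO_EXTENSIONS):
            continue  # skip non-video files that may sit in the same container
        messages.append(json.dumps({"blob": name, "language": language}))
    return messages


# Usage: three blobs in the temporary container, two of them videos.
for message in build_indexing_messages(["a.mp4", "notes.txt", "B.MOV"], "es"):
    print(message)
```

Keeping one message per video means each indexing task can succeed, fail, or be retried independently of the others.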

Working with Cognitive Services

After all videos have been processed by Suruna, the results (video captions) are published in an Azure Search index ("-captions"). We then process these captions, extracted in WebVTT format, with the Azure Text Analytics service to extract their key phrases, and publish those key phrases in a second Azure Search index ("-key-phrases"). The API returns a list of strings denoting the key talking points in the input text (the closed captions of the video).

Note that language is an optional parameter of the Text Analytics API that should be specified when analyzing non-English text (in this case, Spanish). Here you can find the full API Reference for more details:
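To make the caption step concrete, here is a minimal sketch that flattens a WebVTT file into plain text and wraps it in the `documents` body that a Text Analytics key phrases request expects. It is not Suruna's code: the function names are ours, the parser handles only simple cues, and long transcripts would probably need to be split to respect the service's document size limits.

```python
import json
import re
from typing import List

TAG_RE = re.compile(r"<[^>]+>")  # WebVTT voice/style tags such as <v Name>


def webvtt_to_text(vtt: str) -> str:
    """Return the spoken text of a WebVTT file, dropping header, timings and tags."""
    lines: List[str] = []
    blocks = re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip())
    for block in blocks:
        block_lines = block.split("\n")
        if block_lines[0].startswith(("WEBVTT", "NOTE", "STYLE")):
            continue  # file header, comments and style blocks carry no speech
        # A cue is an optional identifier line, a timing line, then text lines.
        for i, line in enumerate(block_lines):
            if "-->" in line:
                lines.extend(TAG_RE.sub("", t).strip() for t in block_lines[i + 1:])
                break
    return " ".join(t for t in lines if t)


def key_phrases_body(video_id: str, text: str, language: str = "es") -> str:
    """Build the JSON body of a key phrases request for one video."""
    document = {"id": video_id, "language": language, "text": text}
    return json.dumps({"documents": [document]}, ensure_ascii=False)


sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:03.000\n<v Ana>Hola mundo\n\n"
    "00:00:03.500 --> 00:00:05.000\nsegunda línea\n"
)
print(key_phrases_body("video-1", webvtt_to_text(sample)))
```

The `language` field defaults to `"es"` here because the captions in this project are in Spanish.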

After this step, the user can request the captions (searching for words in them) and the key phrases through an HTTP request to a customized URL. For instance, for captions:


Generic Image

We obtain:

Generic Image
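The customized URL itself is not reproduced here. As a point of comparison only, a direct query against an Azure Search index could be composed as in the sketch below. The service name `demo` and index name `acme-captions` are hypothetical, the `api-version` value is an assumption to adjust to your service, and a real request would also need to send an `api-key` header.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2016-09-01"  # assumption: use a version your service supports


def caption_search_url(service: str, index: str, words: str, top: int = 10) -> str:
    """Compose a GET URL that searches the documents of one Azure Search index."""
    # safe="$" keeps the OData-style $top parameter name readable in the URL.
    query = urlencode(
        {"api-version": API_VERSION, "search": words, "$top": top}, safe="$"
    )
    return f"https://{service}.search.windows.net/indexes/{quote(index)}/docs?{query}"


print(caption_search_url("demo", "acme-captions", "Lava Jato"))
```

Wrapping this kind of call behind a customized URL, as described above, keeps the search key and index names out of the client.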


As an example of how to use the Suruna API, we built a video platform to access and show those results.

Searching words in captions of a video:

Generic Image


Obtaining the key phrases for a given video:

Generic Image

Searching words across all the videos of the platform:

Generic Image


"Definitely, working with the Microsoft team was an excellent experience for Suruna" ~ Arturo Calle, CEO at Suruna

This experience enabled us to discover new tools for implementing solutions using Machine Learning in the Cloud, not only for data processing but also for the design of an efficient architecture. We have been able to standardize the feature extraction process, and we have implemented cognitive services to identify keywords and concepts within the video, in an architecture that allows us to add more cognitive services in a simple way. All of this adds considerable value to Suruna.

Next Steps

This solution has been applied, in partnership with the Peruvian Society of Professional Journalists, to the "Operation Car Wash" case (Odebrecht corruption scandal) to give journalists across Latin America a tool for investigation. They are able to search for relevant data inside a set of thousands of videos from the main broadcast networks and digital newspapers in LATAM.

As part of Suruna's expansion strategy, its target market for the next year is the top media sites in 10 Latin American countries. That represents potential sales of more than 5 million USD and a reach of 250 million unique visitors using Suruna each month. In the following years, Suruna also plans to expand to other industries, such as education and government.

Additional resources

Project Team


  • Carlos J. Rojas Reyes (@karlitoz007) – Cloud Solution Architect
  • JL Revilla (@jlrevilla) – Data & AI SPP
  • Eduardo Mangarelli (@emangare) – TE Lead


  • Arturo Calle Flores (@arturocalle) – CEO 
  • Jose Valenzuela – Lead AI Developer
  • Mario Zarate – Developer
