Event-based data integration with Azure Data Factory

Description

Gaurav Malhotra joins Scott Hanselman to discuss event-driven architecture (EDA), a common data integration pattern built around the production, detection, consumption of, and reaction to events. Learn how you can do event-based data integration using Azure Data Factory.
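
Not from the video itself, but as a rough illustration of what such an event-based trigger looks like when created programmatically, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. All resource names are placeholders, and the model and argument names reflect one reading of the SDK, so they may differ slightly between versions.

```python
# Minimal sketch: create a Blob-created event trigger that starts a pipeline.
# Assumes the azure-identity and azure-mgmt-datafactory packages; all names
# (subscription, resource group, factory, pipeline, storage account) are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger,
    TriggerResource,
    TriggerPipelineReference,
    PipelineReference,
)

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

trigger = BlobEventsTrigger(
    # Fire when a blob is created under this path in the storage account.
    events=["Microsoft.Storage.BlobCreated"],
    blob_path_begins_with="/containername/blobs/folderpath/",
    scope=(
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        "/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"
    ),
    pipelines=[
        TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="CopyOnBlobArrival")
        )
    ],
)

client.triggers.create_or_update(
    resource_group, factory_name, "BlobCreatedTrigger", TriggerResource(properties=trigger)
)
# The trigger must also be started before it fires (triggers.begin_start or
# triggers.start, depending on the SDK version).
```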

The Discussion

  • JohnN

    Thank you for this demo, and to the ADF team for adding this much-needed functionality.

    But even your demo shows how slow Data Factory can be: at 7:32 in the video, the stats show that 355 bytes took 44 s to transfer!?

    Is there any way (as Azure admins) we can see a more granular breakdown of what those 44 s were spent doing? How can we speed this up for real-world data loads?

  • gauravmalhot

    Yes, you can see the granular breakdown using our visual tools by doing the following steps (a programmatic sketch follows at the end of this reply):

    a. Click the 'Monitor' tab on the left bar

    b. Identify the pipeline run for the copy operation

    c. Click the 'View Activity Runs' icon under the pipeline run

    d. Click the 'Details' icon under the 'Actions' column for your activity run

    e. You can then see how much time was spent in the queue versus in the actual transfer, along with throughput, duration, etc.

    The time taken is not linear in the amount of data being transferred. In the blob-to-SQL DW case I was showcasing, you can transfer data at a throughput of 1 GB/s, meaning you can move 1 TB of data in close to 20-25 minutes. Please give it a try and let us know if you see any perf issues.
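
    For anyone who wants the same breakdown programmatically rather than through the monitoring UI, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. The resource group, factory name, and pipeline run ID are placeholders, and the exact keys in the copy activity output (copy duration, throughput, and so on) are assumptions based on what the monitoring view surfaces, so treat this as a starting point rather than a reference.

```python
# Minimal sketch: query the activity runs of a pipeline run and print the
# copy activity's reported timings. Names and output keys are placeholders /
# assumptions; verify against your SDK version and actual activity output.
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
pipeline_run_id = "<pipeline-run-id>"  # visible on the Monitor tab

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Look at activity runs updated in the last day for this pipeline run.
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow(),
)
response = client.activity_runs.query_by_pipeline_run(
    resource_group, factory_name, pipeline_run_id, filters
)

for run in response.value:
    print(run.activity_name, run.status, run.duration_in_ms, "ms")
    # For a Copy activity, run.output is a dict with duration and throughput
    # details (key names assumed here).
    if run.output:
        print("  copyDuration:", run.output.get("copyDuration"))
        print("  throughput:", run.output.get("throughput"))
```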

  • aljj

    Another cool feature added.

    Questions:

    1) Is there a way to trigger an activity based on more than one event? Say I want to start a Databricks notebook only after two files have arrived.

    2) I'd like to trigger when a Parquet table/folder is complete, that is, when the _committed_<guid> file is created in the folder. Is there a way to specify wildcards? Or how else can I do that?

    Thanks and keep adding features,

    a.

  • Cesar Hernandez

    1. Today we do not have a batching mechanism for events that would enable that scenario.

    2. It is not possible to specify wildcards directly in the filters.

    However, for 2, if the container and folder names are well known and only this type of file will be created there, then just omit the file name in the 'starts with' field, like so: '/containername/blobs/folderpath/'.

    If various types of files will be uploaded to the same folder but you only want the trigger to fire for a particular kind of file, you have two options.
    The first is to simply include the part of the file name that is known, like so: '/containername/blobs/folderpath/_committed_'.
    The second option is to use a combination of 'starts with' and 'ends with'; for example, 'starts with' could be '/containername/blobs/folderpath/' and 'ends with' could be '.csv'. This last approach would only fire the trigger for .csv files in the specified location.
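
    To make that second option concrete, here is a small, self-contained Python sketch of the begins-with/ends-with idea applied to a few hypothetical blob paths. It only illustrates the prefix/suffix filtering described above; the actual matching is done by the Data Factory trigger service, and the paths and file names here are made up.

```python
# Illustration only: which hypothetical blob paths satisfy a 'starts with' +
# 'ends with' combination like the one described above. The real filtering is
# performed by the Data Factory trigger, not by this code.
begins_with = "/containername/blobs/folderpath/"
ends_with = ".csv"

candidate_blobs = [
    "/containername/blobs/folderpath/sales_2018-10-01.csv",  # fires
    "/containername/blobs/folderpath/_committed_1234",       # no: wrong suffix
    "/containername/blobs/otherfolder/sales.csv",            # no: wrong prefix
]

for path in candidate_blobs:
    fires = path.startswith(begins_with) and path.endswith(ends_with)
    print(f"{path} -> {'trigger fires' if fires else 'no trigger'}")
```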

  • MADHU

    Does the Azure Data Factory event trigger support Blob storage only, or does it also support Data Lake Store Gen1?

    If it also supports Data Lake Store files, please provide the steps.

    Can anyone provide this information?

    Thanks in advance.

  • robcaron
    ADLS Gen1 does not emit events today. Once it begins to emit events, Data Factory plans to support it as a source.
