Event-based data integration with Azure Data Factory

The Discussion

  • JohnN

    Thank you for this demo, and to the ADF team for adding this much-needed functionality.

    But even your demo shows how slow Data Factory can be: at 7:32 in the video, the stats show that 355 bytes took 44 seconds to transfer!

    Is there any way (as Azure admins) we can see a more granular breakdown of what those 44 seconds were spent doing? How can we speed this up for real-world data loads?

  • gauravmalhot

    Yes, you can see a granular breakdown using our visual tools by following these steps:

    a. Click the 'Monitor' tab on the left bar.

    b. Identify the pipeline run for the copy operation.

    c. Click the 'View Activity Runs' icon under the pipeline run.

    d. Click the 'Details' icon under the 'Actions' column for your activity run.

    e. There you can see how much time was spent in the queue, the actual transfer time, the throughput, the overall duration, and so on.

    The time taken is not linear with the amount of data being transferred. In the case I was showcasing (blob to SQL DW), you can transfer data at a throughput of about 1 GB/s, which means you can move 1 TB of data in close to 20-25 minutes (roughly 1,000 seconds, or about 17 minutes, of pure transfer time, with the remainder being queueing and startup overhead). Please give it a try and let us know if you see any perf issues.
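
    If you want to pull the same breakdown programmatically instead of through the portal UI, the monitoring data is also available from the Data Factory management API. The snippet below is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and run ID values are placeholders, and the exact keys inside the copy activity's output (queue time, transfer time, throughput) can vary by service/SDK version.

      # Minimal sketch: list the activity runs for a pipeline run and print the
      # Copy activity output, which carries the duration/throughput breakdown.
      from datetime import datetime, timedelta, timezone

      from azure.identity import DefaultAzureCredential
      from azure.mgmt.datafactory import DataFactoryManagementClient
      from azure.mgmt.datafactory.models import RunFilterParameters

      subscription_id = "<subscription-id>"   # placeholder
      resource_group = "<resource-group>"     # placeholder
      factory_name = "<data-factory-name>"    # placeholder
      pipeline_run_id = "<pipeline-run-id>"   # shown in the 'Monitor' tab

      client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

      # Restrict the query to runs updated in the last day; narrow as needed.
      filters = RunFilterParameters(
          last_updated_after=datetime.now(timezone.utc) - timedelta(days=1),
          last_updated_before=datetime.now(timezone.utc),
      )

      activity_runs = client.activity_runs.query_by_pipeline_run(
          resource_group, factory_name, pipeline_run_id, filters
      )

      for run in activity_runs.value:
          print(run.activity_name, run.activity_type, run.status, run.duration_in_ms, "ms")
          # For a Copy activity, run.output typically includes the detailed
          # breakdown: queuing vs. transfer time, throughput, bytes read/written.
          print(run.output)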

  • aljj

    Another cool feature added.

    Questions:

    1) Is there a way to trigger an activity based on more than one event? Say I want to start a Databricks notebook only after 2 files have arrived.

    2) I'd like to trigger when a Parquet table/folder is completed, that is, when a _committed_<guid> file is created in the folder. Is there a way to specify wildcards? Or how else can I do that?

    Thanks and keep adding features,

    a.

  • Cesar Hernandez

    1. Today we do not have a batching mechanism for events that would enable that scenario.

    2. It is not possible to specify wildcards directly in the filters.

    However, for 2, if the container and folder names are well known and only this type of file will be created there, then just omit the file name in the 'starts with' field, like so: '/containername/blobs/folderpath/'.

    If various types of files will be uploaded to the same folder but you only want the trigger to fire for a particular kind of file, you have two options.
    The first is to include the part of the file name that is known, like so: '/containername/blobs/folderpath/_committed_'.
    The second is to use a combination of 'starts with' and 'ends with'; for example, 'starts with' could be '/containername/blobs/folderpath/' and 'ends with' could be '.csv'. This last approach would fire the trigger only for .csv files in the specified location.
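
    To make the filter options concrete, here is a minimal sketch of creating such an event trigger programmatically with the azure-mgmt-datafactory Python SDK (the portal's 'Blob path begins with' / 'Blob path ends with' boxes set the same properties). The subscription, resource group, factory, storage account, trigger, and pipeline names below are placeholders, and property names may differ slightly between SDK versions.

      from azure.identity import DefaultAzureCredential
      from azure.mgmt.datafactory import DataFactoryManagementClient
      from azure.mgmt.datafactory.models import (
          BlobEventsTrigger,
          PipelineReference,
          TriggerPipelineReference,
          TriggerResource,
      )

      subscription_id = "<subscription-id>"   # placeholder
      resource_group = "<resource-group>"     # placeholder
      factory_name = "<data-factory-name>"    # placeholder
      # Resource ID of the storage account the trigger listens to (placeholder).
      storage_account_id = (
          "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
          "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
      )

      client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

      # Fire only when a created blob's path starts with the folder prefix and
      # ends with '.csv'. For the '_committed_<guid>' case, drop the 'ends with'
      # filter and put the known '_committed_' prefix in blob_path_begins_with.
      trigger = BlobEventsTrigger(
          events=["Microsoft.Storage.BlobCreated"],
          scope=storage_account_id,
          blob_path_begins_with="/containername/blobs/folderpath/",
          blob_path_ends_with=".csv",
          pipelines=[
              TriggerPipelineReference(
                  pipeline_reference=PipelineReference(reference_name="<pipeline-name>")
              )
          ],
      )

      client.triggers.create_or_update(
          resource_group, factory_name, "<trigger-name>", TriggerResource(properties=trigger)
      )
      # Note: the trigger still has to be started (from the portal or via the
      # SDK's trigger start operation) before it begins firing.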
