
Azure Data Factory Data Consistency Verification

Have you ever created a data movement design that had to verify the data integrity of a file or table using checksums? Did you need to make sure that all the rows were loaded when working with tables? This is a common request when designing data movement frameworks. Azure Data Factory makes it easier because it has a built-in Data Consistency Verification feature that you can use.

In this post, you will learn about Data Consistency Verification and how to enable and monitor it. 

What is Data Consistency Verification in Azure Data Factory? 

To begin, data consistency verification performs the following validations for you: 

  • When copying tables, it validates that the number of rows in the source system is the same as in the destination system
  • When copying files, it validates file size, last modified date, and MD5 checksum

However, please take the current limitations into account; you can check them in the official documentation. Also, be aware that the feature affects the performance of the copy activity because it needs to perform a few extra checks.

The verification reports one of the following states:

  • Verified: the verification completed successfully between the source and destination systems. If the verification detects differences and you have not enabled fault tolerance, the copy activity will fail.
  • NotVerified: the consistency of the copy has not been verified because you have not enabled validateDataConsistency in the copy activity.
  • Unsupported: the verification cannot be performed because the current configuration (source or destination system) does not support it.

Enable Azure Data Factory Data Consistency Verification

First, enable the feature within the copy activity, in the Settings section.

Data Consistency Verification

Is that all? Yes, you don’t need to configure anything else. You can also enable it through the JSON definition of the copy activity.
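
If you go the JSON route, here is a minimal sketch of what the copy activity definition might look like (the activity name and the source/sink types are placeholders for your own datasets; validateDataConsistency is the property the checkbox toggles):

```json
{
    "name": "<copy activity name>",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "BinarySource" },
        "sink": { "type": "BinarySink" },
        "validateDataConsistency": true
    }
}
```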

Monitor Azure Data Factory Data Consistency Verification 

Finally, after triggering the pipeline, you can see additional information in your pipeline output logs. You’ll also get error information if you enable Copy Activity Logs. To learn more about them, check this blog post on Azure Data Factory Copy Activity Logs.
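
If you prefer JSON, here is a rough sketch of the log settings block that sits alongside validateDataConsistency inside the copy activity’s typeProperties (the linked service reference and path are placeholders; see the linked post for the details):

```json
"logSettings": {
    "enableCopyActivityLog": true,
    "copyActivityLogSettings": {
        "logLevel": "Warning",
        "enableReliableLogging": false
    },
    "logLocationSettings": {
        "linkedServiceName": {
            "referenceName": "<storage linked service>",
            "type": "LinkedServiceReference"
        },
        "path": "<folder path for logs>"
    }
}
```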

The output in the summary. 

The output in JSON format. 
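
As an illustration (the row counts below are made up), the relevant fragment of the copy activity output contains a dataConsistencyVerification node whose VerificationResult reports one of the three states described above:

```json
{
    "rowsRead": 50000,
    "rowsCopied": 50000,
    "dataConsistencyVerification": {
        "VerificationResult": "Verified"
    }
}
```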

Summary 

In summary, today you’ve learned how to enable data consistency verification within the copy activity with only one click and without writing a single line of code. If you have any questions, please leave a comment below. 

What’s Next?     

In upcoming blog posts, we’ll continue to explore Azure Data Service features.         

Please follow Tech Talk Corner on Twitter for blog updates, virtual presentations, and more!

1 Response
  • Mike
05.08.2022

    “Copy Data” activity, Consistency Verification
    I want to see the results of a failed verification
    I couldn’t get it to fail
    I selected a source SQL table that would take about a minute to copy
    I truncated the sink table
    I started the pipeline
    While the copy was in progress, I used a query window to delete some rows
    The query window’s results showed that it succeeded in deleting rows
    When the pipeline finished, the copy activity showed success & that verification was good
    I suspect that the copying was done in a transaction & the final row count test was still inside that transaction

    Questions
    * How do I force a failure for testing?
    * If the copying completes, but verification fails, does execution continue on the SUCCESS path or switch to the FAIL path?
    I’d expect it to switch to the FAIL path, but I can’t find that explicitly in the documentation and I can’t find a way to test it

    Thank you
