Data Orchestration for a Hearing Aid Company Looking to Transfer Large Files

The client faced the challenge of manually transferring large files from distributed compute nodes after data processing operations. In addition, the integration point returns multiple files in response.

Category: Data Science

Overview

The client is a hearing aid company consisting of a network of franchised and corporately owned retail outlets. The company has provided hearing solutions for over 70 years, with 1,500 franchised locations nationwide and headquarters in Minneapolis, Minnesota, US.

Challenges

The client faced the challenge of manually transferring large files from distributed compute nodes after data processing operations. In addition, the integration point returns multiple files in response, and each file must be downloaded and processed to make sense of the response. The existing Azure service component has limitations: it cannot write files to the SFTP location, and the external SFTP server provides no trigger or callback to notify the service when response files are ready for transfer. Continuous polling of the SFTP server is cost-prohibitive and error-prone.

Solution: Data Orchestration to Transfer Large Files

We leveraged the diverse components of the Azure ecosystem to build a scalable, fault-tolerant solution. Azure Data Factory orchestration pipelines transfer the CSV file from Azure Blob Storage to the IMDATA FTP location via an Azure Logic App and, after the file is processed, transfer the output file back to Azure Blob Storage. The task was split into distinct phases:

  • A dynamic, parameterized Logic App transfers the source files to the integration point's SFTP location. It is invoked by the current orchestration pipeline as the next step after the distributed node computation completes.
  • A separate Azure Data Factory orchestration pipeline calls a Logic App to check for the response files sent by the integration server to the SFTP output location. This step uses a smart retry polling policy: it waits an initial interval determined by a machine learning model trained on previous file sizes and response times, then polls with exponential backoff until it either detects the files or raises an alert that the response files have not been received within a reasonable time limit.
  • Finally, the files are copied from the blob location using the Azure Data Factory copy service.
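The smart retry polling policy in the second phase can be sketched as below. This is a minimal illustration, not the production pipeline: the `check_sftp` callable stands in for the Logic App's SFTP listing step, and `predicted_wait_s` stands in for the initial interval produced by the trained model; both names are hypothetical.

```python
import time

def poll_for_response_files(check_sftp, predicted_wait_s,
                            backoff_factor=2.0, max_total_wait_s=3600.0):
    """Wait an ML-predicted initial interval, then poll with exponential
    backoff until response files appear or the time budget runs out."""
    waited = 0.0
    interval = predicted_wait_s
    while waited < max_total_wait_s:
        # Sleep for the current interval, clamped to the remaining budget.
        sleep_for = min(interval, max_total_wait_s - waited)
        time.sleep(sleep_for)
        waited += sleep_for
        files = check_sftp()  # e.g. list files at the SFTP output location
        if files:
            return files
        interval *= backoff_factor  # exponential retry
    # Mirrors the alert raised when files are not received in time.
    raise TimeoutError("Response files not received within the time limit")
```

In practice the check and the alert would be Logic App actions rather than Python calls, but the control flow — one predicted wait, then geometrically growing retry intervals — is the same.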

Benefits

  • The Data Orchestration to Transfer Large Files component is fully dynamic and parameterized, so it can be reused in any pipeline to transfer data from any blob location to an FTP location.
  • Files can be sent and received with smart polling, without requiring any callback or hook from the external server.
  • Human effort has been minimized as much as possible, reducing the overall cost of operations.
  • Continuous polling is not required, saving its associated cost.

About Contata

Contata Solutions is a trusted leader in technology and digital innovation. Through our work in data engineering, data analytics, machine learning, marketing automation and app development, we deliver solutions that address complex problems in ways that are simple, insightful and impactful.

Our promise and value proposition to our customers is simple: we leverage our deep technical expertise and global presence to bring software products and data-driven decision capabilities to life.

Founded in 2000, Contata is a privately-held company headquartered in Minneapolis that serves clients globally from offices in the United States and India.


Interested in knowing more? Get in touch!