How Real-Time Data Streaming can elevate ETL to new heights


These days, data is king. Data now underpins decision-making across every department of the business and, as organisations ‘become software’, it is used to automate every process that stands to benefit. However, as the use of data proliferates, the complexity of managing it only increases, with an ever-growing network of fragmented data sources to cope with.

A recent IDC survey concluded that the most pressing barrier to digital transformation in 2022 was data fragmentation, finding that 79% of organisations use 100+ data sources, with 30% using 1000+ sources. The survey also found that Chief Data Officers (CDOs) spent 35% of their time on day-to-day data management tasks rather than driving innovation for their business. In a business world where enterprises with strong data maturity generate 250% more business value, it is evident that efficient data cleansing, enrichment and processing are critical.

For an organisation to become data-mature, strong data leadership is required to address the difficult issue of fragmented data. Today, real-time data is essential for any business looking to utilise AI and machine learning (ML), both key drivers that will accelerate their digital transformation journey and improve their offering. The key to business growth and modernisation therefore is the ability to collate this fragmented data into an accessible form, one which can then be analysed and harnessed. Traditional ETL (Extract, Transform and Load) has played a critical role here.

Data in motion – replacing traditional ETL

ETL is a three-phase process used to gather data from multiple sources into a centralised database. To break the term down, data is taken from its original sources (extract), cleaned and prepared into a usable form (transform), and stored in the desired location (load).
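As a minimal sketch of these three phases, the Python below runs a tiny batch ETL job. The source lists (`crm_rows`, `web_rows`), the field names and the dictionary standing in for the centralised database are all hypothetical, chosen only for illustration; a real pipeline would read from files, APIs or databases.

```python
# Hypothetical source data standing in for two separate systems.
crm_rows = [{"email": " Ann@Example.com ", "spend": "120.50"}]
web_rows = [
    {"email": "bob@example.com", "spend": "80"},
    {"email": "", "spend": "5"},  # unusable record: no email
]


def extract(*sources):
    """Extract: pull the raw records from every source into one list."""
    return [row for source in sources for row in source]


def transform(rows):
    """Transform: normalise emails, convert spend to float, drop unusable rows."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # skip records that cannot be identified
        cleaned.append({"email": email, "spend": float(row["spend"])})
    return cleaned


def load(rows, warehouse):
    """Load: store the cleaned records in the target, keyed by email."""
    for row in rows:
        warehouse[row["email"]] = row


warehouse = {}  # stands in for the centralised database
load(transform(extract(crm_rows, web_rows)), warehouse)
print(sorted(warehouse))
```

Each phase runs to completion before the next begins, which is the sequential, batch-oriented behaviour that traditional ETL is known for.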

ETL is not new. Its roots go back to the 1970s, when the process was sequential, data systems were far simpler and use cases were mostly analytical. Traditionally, ETL ran in batches and didn’t always provide optimal results, leading some solutions to change to ELT (Extract, Load and Transform) and even reverse ETL, which gets analytical data back into the operational world. Now, with real-time data in motion driving both customer expectations and backend operations, the industry is adopting data streaming as the new solution.

Unlike batch-processed ETL, data streaming allows data to be extracted and processed (transformed) automatically, ready to trigger other software applications or to be loaded for analytical use cases. This works by having applications write to and read from a log, in which data (or events) can persist for as long or as short a time as required. Software can ‘talk’ to software directly via the streaming platform. This enables new levels of automation with enhanced scalability and security, and it is also supported on cloud infrastructure.
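The write-to and read-from pattern can be illustrated with a toy in-memory log. This is only a sketch under stated assumptions: the `EventLog` class, its `retention` parameter and the event fields are hypothetical and do not correspond to the API of any real streaming platform.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class EventLog:
    """A toy append-only log (hypothetical, for illustration only)."""

    retention: int = 1000  # maximum number of events kept in the log
    _events: List[Any] = field(default_factory=list, init=False)
    _base_offset: int = field(default=0, init=False)  # offset of oldest kept event

    def write(self, event: Any) -> int:
        """Append an event and return the offset it was written at."""
        self._events.append(event)
        offset = self._base_offset + len(self._events) - 1
        # Enforce retention by discarding the oldest events.
        overflow = len(self._events) - self.retention
        if overflow > 0:
            del self._events[:overflow]
            self._base_offset += overflow
        return offset

    def read_from(self, offset: int) -> List[Any]:
        """Return every retained event at or after the given offset."""
        start = max(offset, self._base_offset) - self._base_offset
        return self._events[start:]


log = EventLog(retention=2)
for amount in (10, 20, 30):
    log.write({"type": "payment", "amount": amount})

# Two independent readers consume the same log from their own positions.
analytics_view = log.read_from(0)  # oldest event has already expired
alert_view = [e for e in log.read_from(2) if e["amount"] > 25]
```

Because writers and readers only share the log, a new consuming application can be added without changing the application that produces the events.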

Data streaming already supports major industries

For businesses to quickly respond to ever-shifting market conditions, they must utilise the valuable flow of real-time data. Instead of having static databases, data streaming allows actions and analysis to be triggered in real time by the data itself, opening new doors for value creation that had been impossible with old-fashioned request-response architecture. Uber, Netflix, eBay and Yelp are just a few of the major technology brands that have already adopted this real-time approach, building their systems around data streaming.

More traditional industries also now rely on real-time stream processing. Major financial services businesses find that consumers now expect features popularised by smaller challenger banks, such as in-app budgeting or push notifications. To stay relevant, banks must offer data-supported intelligence features, such as insights into buying patterns and finance tracking.

Retailers are no different. It is essential that data is collated from mobile, website and in-store interactions, such as customer feedback and returns data, to provide consumers with targeted and contextualised offers. This then enables further upselling and cross-selling opportunities down the line, directly driving revenue.

Data in motion and digital transformation

Businesses are now recognising the importance of developing a digital transformation strategy that fully utilises the data available to them. By successfully implementing such a strategy, a network effect can be capitalised upon; as various parts of the business consume more data, they also produce more, perpetually increasing the pool of available data.

Previously, data only served a product or business solution. With access to data in motion, however, these products and solutions now create their own data, which becomes a ‘product’ in its own right. Instead of data only serving the product or application, the data is itself a product that can drive further innovation in other products and applications.

In this way, data streaming, rather than traditional ETL, is revolutionising how we can work with data. It allows real-time accessibility to information, with contextual intelligence that developers of the past could only dream of. It is now possible for data streaming platforms to analyse events that are happening and execute tasks without the need for any human input, dramatically streamlining processes with pinpoint precision.

Now that data is central to everything they do, businesses are simplifying operations by augmenting legacy systems with real-time processes. To really capitalise on the wealth of data available to them, new data synergies must be formed. Migrating to data streaming will enable businesses to advance their data maturity, and will be the driving force for progress in the years to come.