The Data Stack

At the heart of any aspiring data-centric organisation is the Data Stack.

There are six components of a data stack. At the centre of the stack is the Data Warehouse, where data is brought together from across the business to be stored, manipulated and used. This includes data from different applications and sources across the business, data from external sources, and Master Data and Reference Data.

Data is captured using business systems. Traditionally, these include a variety of technologies and databases to capture, store and use data. They range from web-based tools and online SaaS to on-prem legacy tools and, far too often, a plethora of spreadsheet-based “systems”. These tools may be integrated or run independently. A modern business can have, on average, 80 different applications or systems that capture and manipulate data.

These range from web-based presales and website tools, through a sales or CRM tool, to fulfilment, operational tools and supply management software. There may also be financial tools managing the key financial operational stages: Contracts and Sales Invoicing, Purchase Order and Purchase Invoice Management, financial reconciliation systems and, finally, the accounts software itself. They may write to individual storage devices or databases. If the business systems are not integrated across business departments, each of these systems has probably evolved from a single departmental need with little reference to the rest of the business: the business silo effect.

Data quality, in particular poor data quality, is often described as a cultural issue.  It is most often manifested through inappropriate technology and business silos.

Data Architecture

The keys to data quality lie in basic Data Architecture which will exhibit the following features:

  • Consistent, stable data capture
  • Taxonomy: a business-wide standard naming convention
  • Core Data Quality: dates, numbers and text are defined and stored correctly
  • Primary Keys and referential integrity
  • Stable, accessible and secure data storage
  • A Data Warehouse
  • Node Extraction
  • Data Analysis
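
Several of these features can be made concrete at the database level. The following is a minimal sketch, using Python's built-in sqlite3 module, of correctly typed columns, primary keys and referential integrity. The table and column names (customer, sales_invoice, amount_pence) are assumptions made for illustration and do not come from any particular system.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    """Create two illustrative tables with keys and basic quality rules."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id   INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL
        );
        CREATE TABLE sales_invoice (
            invoice_id   INTEGER PRIMARY KEY,
            -- Referential integrity: every invoice must point at a real customer.
            customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
            -- Core data quality: dates must be valid ISO dates (YYYY-MM-DD).
            invoice_date TEXT NOT NULL CHECK (date(invoice_date) IS invoice_date),
            -- Amounts are stored as whole pence, never as free text.
            amount_pence INTEGER NOT NULL CHECK (amount_pence >= 0)
        );
        """
    )
```

With this schema in place, an invoice that refers to a customer who does not exist, or that carries a date such as 15/01/2024, is rejected with sqlite3.IntegrityError rather than being stored silently.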

The Data Stack

The six components of the traditional data stack are:

  • Source
  • Ingestion
  • Storage
  • Modelling
  • Visualisation
  • Activation

ETL – the ingestion stages – Extract, Transform, Load

Data Stack Components

I. Data Source

The Data Stack components start with the data captured in the front-end applications. This is known as the Data Source and is made up of individual application databases and, in the case of spreadsheet-based systems, each individual spreadsheet, which creates a whole new set of issues to consider.

II. Data Ingestion

These are the tools that move data from individual application databases to the data storage facility (most often referred to as the Data Warehouse). There are three stages to a data ingestion process: Extract, Transform and Load (ETL), though the order of the last two stages can be swapped to give ELT (Extract, Load, then Transform). Data Ingestion most often refers to the movement of data from application to Data Warehouse. In an integrated business environment, where data is captured once and used many times, ETL processes will also feed data into other applications.
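
As an illustration of the three stages, here is a minimal ETL sketch using Python's built-in sqlite3 module. The source table crm_contact, the warehouse table dim_customer and the dd/mm/yyyy source date format are all assumptions made for the example; real ingestion tools add scheduling, error handling and incremental loads on top of this basic shape.

```python
import sqlite3
from datetime import datetime


def extract(source: sqlite3.Connection) -> list:
    """Extract: read raw rows from the (hypothetical) application database."""
    return source.execute("SELECT id, name, signup FROM crm_contact").fetchall()


def transform(rows: list) -> list:
    """Transform: tidy names and convert dd/mm/yyyy dates to ISO format."""
    cleaned = []
    for contact_id, name, signup in rows:
        iso_date = datetime.strptime(signup, "%d/%m/%Y").date().isoformat()
        cleaned.append((contact_id, name.strip().title(), iso_date))
    return cleaned


def load(warehouse: sqlite3.Connection, rows: list) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, "
        "customer_name TEXT NOT NULL, "
        "signup_date TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps the load repeatable: re-running does not duplicate rows.
    warehouse.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Run the three stages in order and return the number of rows loaded."""
    rows = transform(extract(source))
    load(warehouse, rows)
    return len(rows)
```

In an ELT variant of the same sketch, the raw rows would be loaded first and the clean-up would run afterwards as SQL inside the warehouse.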

III. Data Storage

Here you may find the terms Data Lake and Data Warehouse being used interchangeably. Very often, data is extracted in raw form from applications and stored in a Data Lake. This data then serves as the source for the Transformation and Load phases that prepare data for the Data Warehouse. A Data Warehouse is an enterprise-wide consolidated data storage system. All data captured in all systems which is required for reporting or modelling should be ingested into the Data Warehouse. This raises understandable issues around security and confidentiality and, very often, the Data Warehouse is not exposed to the business. Instead, a final layer of Data Marts, or Strategic Marts, is prepared. These present a specific subset of the data in the Data Warehouse relating to a specific department or project. The data can be drawn from across the business, but access is limited to the applications or people who should have access to that data.

IV. Modelling

Data in the central data storage system is then transformed into specific data structures that will meet analytics, forecasting or machine learning requirements. The effort involved in building data models is handsomely repaid in the quality of the information and knowledge that is derived. Data Models can extract master data from across the business and combine it with reference data to build a subset of data that can be used as a data mart and exposed as required. This may be used by specific people or departments, used to ingest data back into applications or even shared with third parties.
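
Combining master data with reference data is, at its simplest, a join. The sketch below shows one possible shape for such a model using Python's built-in sqlite3 module. The tables master_customer and ref_country and the output table model_customer_region are hypothetical names chosen for the example.

```python
import sqlite3


def build_customer_model(warehouse: sqlite3.Connection) -> None:
    """Rebuild a small model that enriches customers with country reference data."""
    warehouse.executescript(
        """
        -- Rebuild from scratch so the model always reflects the current warehouse data.
        DROP TABLE IF EXISTS model_customer_region;
        CREATE TABLE model_customer_region AS
        SELECT c.customer_id,
               c.customer_name,
               r.country_name,
               r.sales_region
        FROM master_customer AS c
        JOIN ref_country AS r ON r.country_code = c.country_code;
        """
    )
```

The resulting table contains only the columns a sales team would need, which is what makes it suitable for exposing as a data mart.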

V. Visualisation

This is often referred to as the Business Intelligence (BI) Function. It refers to the ability to make data readable or useable. The interpretation of this term determines what is readable and what is useable.

You may be talking of dashboards, reports or even basic CSV outputs to be used elsewhere. The likelihood is that the process to date will have generated too much data for humans to take in, so filters, charts, graphs and similar tools will be required to make the data useful.
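
Even a basic CSV output usually involves summarising first. As a small sketch, assuming invoice rows arrive as (ISO date, amount) pairs, the following reduces individual invoices to monthly totals and writes them out as CSV using only the Python standard library. The function names and column headings are illustrative.

```python
import csv
import io
from collections import defaultdict


def monthly_totals(invoices) -> dict:
    """Summarise (iso_date, amount) pairs into totals keyed by YYYY-MM."""
    totals = defaultdict(float)
    for invoice_date, amount in invoices:
        # The first seven characters of an ISO date are the year and month.
        totals[invoice_date[:7]] += amount
    return dict(sorted(totals.items()))


def to_csv(totals: dict) -> str:
    """Render the monthly totals as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "total"])
    for month, total in totals.items():
        writer.writerow([month, f"{total:.2f}"])
    return buffer.getvalue()
```

The same summarised figures could just as easily feed a chart or dashboard; the point is that the reader sees a handful of rows rather than every invoice.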

VI. Activation

This is the final piece in the data stack and is applied to integrated business systems. Data has been extracted from application data sources into a centralised data storage platform. The activation phase is where data is fed back into the application databases in a reverse ingestion process. This will most usually involve an ETL process: data is extracted from the Data Warehouse via a Data Model, transformed as required, and then loaded into the application database to be reused (this could still be an ELT process, depending on the application in question).

This process will enhance the data that you are using in your applications by adding layers extracted from additional sources. This will help you turn Insights into Action.
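
A reverse ingestion step can be sketched in the same way as a forward one. In the example below, which uses Python's built-in sqlite3 module, a score held in the warehouse is converted to a simple label and written back to the application. The tables model_customer_score and crm_contact, the churn_risk column and the 0.7 threshold are all assumptions made for illustration.

```python
import sqlite3


def activate_customer_scores(
    warehouse: sqlite3.Connection, application: sqlite3.Connection
) -> int:
    """Push a warehouse-derived risk label back into the application database."""
    # Extract: read the modelled scores from the warehouse.
    rows = warehouse.execute(
        "SELECT customer_id, churn_score FROM model_customer_score"
    ).fetchall()
    # Transform: turn the numeric score into a label the application can display.
    # The 0.7 cut-off is an arbitrary value chosen for this example.
    updates = [
        ("high" if score >= 0.7 else "low", customer_id)
        for customer_id, score in rows
    ]
    # Load: update the matching application records.
    cursor = application.executemany(
        "UPDATE crm_contact SET churn_risk = ? WHERE id = ?", updates
    )
    application.commit()
    # For executemany, rowcount is the total number of rows modified.
    return cursor.rowcount
```

Here the application gains a field it could not have produced on its own, because the score depends on data drawn from other systems.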

The Modern Data Stack

There are several definitions of Modern v Legacy Data Stacks.

One of the most common in use is to reinterpret the basic model above and describe a Modern Data Stack as one that uses cloud technology and modern tools. Conversely, a Legacy Data Stack uses a mix of cloud and on-prem technology and hardware.

The weaknesses in this process flow model include the following:

  • Inherent latency between application data and the Data Warehouse, and the weakness of the data pipelines. Real-time analytics can become a problem, and issues with ingestion can cause failures in the model, leading to data quality issues.
  • Many organisations struggle with the concept of co-locating all data. Anyone who has led a Master Data Management Strategy review will be aware of the issue that departmental heads have with appearing to lose control of their data.

Key elements of the Modern Data Stack

  • Speed
    • Modern business needs answers quickly so data models need to be working in real time.
  • Scalability
    • Data volumes have been growing exponentially.
  • Simplicity
    • Complexity creates latency and makes a system more fragile.

It is very easy in any data modelling exercise for the process to get too complex. It is very important to keep the whole process, and the various building blocks, as simple as possible. Simplicity is often the basis of speed. Modern business needs answers quickly, so data models need to be working in real time. And finally, scalability: as data volumes grow, the Modern Data Stack needs to be able to grow with them. Data volumes are currently growing exponentially and there is nothing to suggest that this will change.

Keeping things simple and reducing points of weakness leads to a final point to bear in mind. Choosing a base data storage system is probably more important than the technology that is used for data capture. The technology that drives your Data Lake and Data Warehouse should become your data storage of choice, and choosing application technologies that can store data natively in that environment will be the best solution.

In conclusion

The Data Stack is the set of components that can add value to data. The value in data lies in what you can learn from it. It is the facts that lead to information and knowledge.

Data-centric organisations add value through data. They use data to inform, model and forecast. The basis of data value lies in quality data: data that you can trust, and data that gives you information in which you have confidence, leading to knowledge that makes a difference.

Quality data is easy to find, accessible when you need it, and includes all the granular data you will require. Quality data is secure, reliable and consistent across the organisation. Quality data is correct, accurate and up-to-date.

A modern data stack is simple, scalable, fast and reliable.
