In order to adapt to the new normal, organisations need to adopt a new data management architecture – allowing them to thrive in the digital sphere. Simon Spring, Account Director EMEA at WhereScape, discusses the key components of a Data Fabric and explains how it helps organisations to manage and maximise the value of their data.
If 2020 taught us anything, it was that change happens suddenly, unexpectedly and can have a significant impact on every aspect of our world. To counter such seismic changes, enterprises need to embrace agility at scale, adjusting objectives and goals – together with supporting operating processes and decision-making capabilities – almost overnight.
The pandemic forced organisations to manage change and make decisions faster than ever before. Indications are that this won’t be a temporary state of affairs, as digitalisation, automation and other forces of accelerated change continue to shape the new normal.
Surviving and thriving in today’s increasingly complex and volatile world calls for a new data management architecture – one that enables users to easily find and use analytical data and assets to support the strategic and tactical decisions that must be made each day. This unfettered access – with seamless shareability for everyone who needs it – comes in the form of a Data Fabric.
Data Fabric: What is it and what can it deliver?
Data Fabric is an all-encompassing analytics architecture that ensures all forms of data are captured and integrated for any type of analysis, and that data is easily accessible and searchable by users across the entire enterprise.
Born out of the pressing need to find a better way to handle enterprise data, a Data Fabric utilises analytics over existing and discoverable metadata assets to support the design, deployment and utilisation of integrated and reusable data across all environments.
According to Gartner, there are four key attributes of a Data Fabric architecture: it must collect and analyse all forms of metadata including technical, business, operational and social; it must convert passive metadata to active metadata for frictionless sharing of data; it must create and curate knowledge graphs that enable users to derive business value; and it must have a robust data integration backbone that supports all types of data users.
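The knowledge-graph attribute above can be pictured with a minimal sketch. The asset names and predicates below are hypothetical, and real implementations use dedicated graph stores rather than in-memory lists, but the underlying idea – metadata held as queryable subject-predicate-object triples spanning technical, business and operational facts – is the same.

```python
# Hypothetical metadata triples: (subject, predicate, object).
# All names here are illustrative, not from any real catalogue.
triples = [
    ("orders_table", "sourced_from", "erp_system"),       # technical metadata
    ("orders_table", "owned_by", "sales_ops"),            # business metadata
    ("orders_table", "queried_by", "revenue_dashboard"),  # operational metadata
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the fields that are not None."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Which assets consume orders_table?
consumers = query(subject="orders_table", predicate="queried_by")
```

Converting passive metadata to active metadata amounts to running analytics over triples like these – usage patterns, ownership and lineage – rather than merely storing them.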
Put simply, a Data Fabric is a single environment – a unified architecture and set of services – that helps organisations manage and maximise the value of their data. By eliminating data silos and simplifying on-demand access to data assets across the entire business, it makes it faster and easier to gain new insights and undertake real-time analytics.
The good news is that as an overlay to existing legacy applications and data, the Data Fabric enables organisations to maximise value from their existing data lakes and data warehouses. There is no need to rip-and-replace any existing technology investment.
So, what are the key components of a Data Fabric?
Key architectural components: An overview
Consisting of multiple components, data flows and processes that must all be coordinated and integrated, the Data Fabric analytics architecture features a complex array of technologies and functions. These include:
- A real-time analysis (RT) platform – The first analytical component, which analyses streams of data (transactions, IoT streams, etc.) coming into the enterprise in real time.
- The enterprise data warehouse (EDW) – The production analytics environment where routine analyses, reports and KPIs are produced on a regular basis using trusted reliable data.
- The investigative computing platform (ICP) – Used for data exploration, data mining, modelling and cause and effect analyses. Also known as the data lake, this is the playground for data scientists and others who have unknown or unexpected queries.
- A data integration platform – Extracts, formats and loads structured data into the EDW and invokes data quality processing where needed.
- A data refinery – For ingesting raw structured and multi-structured data, distilling it into useful formats in the ICP for advanced analyses.
- Analytics tools and applications – To create reports, perform analyses and display results.
- A data catalogue – Which acts as an entry point for users where they can view what data is available and discover what analytical assets already exist. This needs to be meticulously maintained and updated.
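As a rough illustration of the catalogue component above, the sketch below models a catalogue entry and a naive keyword search. The fields and entries are invented for illustration; a production catalogue tracks far richer metadata (lineage, quality metrics, owners, usage statistics) and would not rely on substring matching.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogueEntry:
    # Illustrative fields only; real catalogues record much more.
    name: str
    component: str          # which component holds it, e.g. "EDW", "ICP", "RT"
    description: str
    tags: list = field(default_factory=list)

# Hypothetical entries pointing at the EDW and the ICP (data lake).
catalogue = [
    CatalogueEntry("daily_sales_kpi", "EDW",
                   "Trusted daily sales KPIs", ["sales", "kpi"]),
    CatalogueEntry("clickstream_raw", "ICP",
                   "Raw web clickstream for exploration", ["web"]),
]

def search(term):
    """Naive keyword search across names, descriptions and tags."""
    term = term.lower()
    return [e for e in catalogue
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in t for t in e.tags)]
```

The entry point role is visible even in this toy form: a user searches the catalogue first, discovers what already exists and where it lives, and only then requests new data.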
Implementing a Data Fabric: The technical processes involved
Those responsible for building and maintaining a Data Fabric face a big task. The simpler they make the business community’s access and utilisation of the analytics environment, the more complex the infrastructure becomes.
Technologies that work seamlessly to support a variety of processes will be key. These include:
- Discovery – Detecting what data and analytical assets already exist in the environment, together with full metadata on data lineage (sources, integration techniques and quality metrics). Technical teams can also use usage statistics (who is using what, and how often) and impact analysis to understand which data and analytical assets are affected if an integration programme changes.
- Data availability – If a user requests data that is not available, potential sources will need to be researched and assessed in terms of quality, accessibility and suitability for the requested purpose. All this information needs to be documented into the data catalogue for future usage.
- Design and deploy – Populating the right analysis component (EDW, ICP and RT) with the right data and technologies from the appropriate source of data, utilising data integration and quality processes to ensure the data can be trusted. Sensitive data must be identified and protected by encryption or other masking mechanisms.
- Monitoring – The data catalogue must be updated with the latest additions, edits and changes made to the Data Fabric, its data, or its analytical assets. Similarly, any changes in data lineage or usage should be monitored.
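The impact analysis mentioned in the discovery step can be sketched as a traversal of the lineage graph held in the catalogue. The asset names below are hypothetical, and real lineage tooling works over much larger graphs, but the core operation – walking downstream from a changed asset to find everything it feeds – looks like this:

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it feeds.
lineage = {
    "crm_extract": ["customer_dim"],
    "customer_dim": ["churn_model", "sales_report"],
    "churn_model": ["retention_dashboard"],
}

def impacted_assets(changed):
    """Breadth-first walk downstream from a changed asset,
    returning every asset that would be affected."""
    seen, queue = set(), deque([changed])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

For example, a change to `crm_extract` would flag the customer dimension, both models and reports built on it, and the dashboard downstream of the model – exactly the set a team needs to review before deploying the change.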
Top tips for success
For the Data Fabric to succeed, organisations must commit to maintaining the integrity of the architectural standards and components it is built on. So, if silos are created as temporary workarounds, they will need to be decommissioned when no longer needed. And since the value of the Data Fabric depends on the strength of the information gathered in the data catalogue, out-of-date, stale or inaccurate metadata must not be allowed to leak into the catalogue.
Finally, simply fork-lifting legacy analytic components, like an ageing data warehouse, into the fabric could result in integration problems. Ideally, these legacy components should be reviewed and redesigned.
While it is a big undertaking, successful Data Fabric environments are already proving their worth when it comes to enabling companies to leverage data more effectively and unlocking the full potential of their data assets to gain competitive advantage.