Gartner’s logical data warehouse: my interpretation of a concept
I’ve always regarded Gartner’s logical data warehouse as a conceptual framework: food for thought. It is both a collection of observations on how organizations have shifted their data management practices and a discussion paper on what modern data management for analytics requires.
The logical data warehouse is a thought process, not a recipe
No recipe comes with it for turning a logical data warehouse into a working system, and there are many ways to interpret the concept. You should first translate the concept into a fitting architecture and only then start thinking about the transition to that architecture. Your use cases, your requirements and your capabilities for governing your information determine what a fitting architecture is.
As a conceptual framework, it helps you address all the aspects involved and keep your conversations focused. The framework guides you to think about the interactions between the different logical components and what they mean for you when you look at the complete picture.
What does the logical data warehouse concept consist of?
The image below is the visualization of the concept.
Without reading too much into the artwork, I notice a few interesting choices in the image:
- The grey box starts halfway up the slanted boxes at the bottom, implying that the logical DWH is much more about the upper part of the image than about the technical solutions to store and transport data.
- A big part of the logical data warehouse consists of components that emphasize the use of, access to and semantic consistency of the information. The mapping to the data models matters more than the software and processes needed for data storage and data logistics.
In my view, that differs from the emphasis on technology you find when you search for ‘logical data warehouse’. Since most of the white papers out there are written by technology vendors and consultancies that sell an implementation, that is to be expected. Most of the time, concepts don’t earn cash. Disclaimer: that goes for us too.
A logical data warehouse is the same as a virtual data warehouse, right?
The funny thing is that ‘logical’ has mostly been translated into ‘virtual’. In my view, the image clearly states that the logical data warehouse uses a mix of technologies.
Here, ‘repositories’ means data warehouses, data marts or other data stores; ‘virtualization’ means the various technologies used to merge and transform data across multiple repositories; and ‘distributed processing’ means anything from middleware to ETL that does the same. One, two or all three of them form the infrastructure foundation on which the decomposed data warehouse functions rest. Those three slanted boxes abstract the whole ecosystem of data stores and data transport in use.
Managing and regulating access to data when it is all over the place is a challenge. The upper part of the image addresses this challenge.
Different use cases require different solutions
Different use cases, from financial reporting to ad hoc exploration, translate into different requirements for modelling data spread out across different data stores, and into different SLAs within the same ecosystem. Architects can debate at length what ‘taxonomy/ontology resolution’ means, but my best guess is that Gartner didn’t want to restrict the process of giving meaning and context to data to (relational) data modelling alone, and wanted to keep all options open.
The same information can have different access, quality and auditing requirements depending on the use case. When your data model is a logical composite, you need software that can manage access to all the different physical data stores and data transport components that serve parts of the model. Depending on the use case, this can align or conflict with the availability and maintenance requirements in the SLA. In my view, that is what the inequality, equality and approximation signs in the middle box mean.
To me, this is the core of the logical data warehouse as a concept: asking yourself how to manage this distributed architecture, what the requirements are for access management, availability and maintenance, and how the data quality requirements differ for each use case.
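To make the idea of a logical composite a bit more tangible, here is a minimal Python sketch. It is not a recipe and not Gartner’s definition; the names (`PhysicalSource`, `UseCaseRequirements`, `LogicalEntity`, `resolve`), the quality score and the staleness metric are all hypothetical assumptions. One logical entity is served by three physical sources, one per slanted box, and each use case filters them by its own access, quality and freshness requirements.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalSource:
    """A physical store or transport component serving part of a logical model (hypothetical)."""
    name: str
    layer: str            # "repository", "virtualization" or "distributed processing"
    quality_score: float  # assumed data quality metric between 0.0 and 1.0
    staleness_hours: int  # assumed age of the data when it is served


@dataclass(frozen=True)
class UseCaseRequirements:
    """Per-use-case requirements; the fields are illustrative assumptions."""
    allowed_roles: frozenset
    min_quality: float
    max_staleness_hours: int


@dataclass
class LogicalEntity:
    """A logical composite: one business concept served by several physical sources."""
    name: str
    sources: list = field(default_factory=list)


def resolve(entity, requirements, role):
    """Return the names of the sources that may serve `entity` for this use case and role."""
    # Access management first: the same information is not open to every role.
    if role not in requirements.allowed_roles:
        raise PermissionError(f"role {role!r} may not access {entity.name}")
    # Then keep only the sources that meet this use case's quality and freshness needs.
    return [
        s.name
        for s in entity.sources
        if s.quality_score >= requirements.min_quality
        and s.staleness_hours <= requirements.max_staleness_hours
    ]


customer = LogicalEntity("customer", [
    PhysicalSource("dwh.customer_dim", "repository", 0.99, 24),
    PhysicalSource("crm_live_view", "virtualization", 0.85, 0),
    PhysicalSource("clickstream_job", "distributed processing", 0.70, 1),
])

financial_reporting = UseCaseRequirements(frozenset({"controller"}), 0.95, 48)
ad_hoc_exploration = UseCaseRequirements(frozenset({"analyst", "controller"}), 0.6, 2)

print(resolve(customer, financial_reporting, "controller"))  # ['dwh.customer_dim']
print(resolve(customer, ad_hoc_exploration, "analyst"))      # ['crm_live_view', 'clickstream_job']
```

The same `customer` entity resolves to different physical sources for financial reporting and for ad hoc exploration, and an analyst asking for the financial reporting view is refused. The availability and maintenance side of the SLA is deliberately left out of this sketch.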
The logical data warehouse concept forces you to deal with complexity upfront
Complexity is what we have struggled with for a long time in the ‘one big data warehouse to rule them all’ approach. That is where I see the added value of the logical data warehouse concept: you need to read the image top-down, not bottom-up.
If anything, Gartner’s logical data warehouse concept tells us that no single solution can handle all requirements. A distributed architecture is one answer to that problem, but its consequences for data governance and data access should not be taken lightly.
Reactions through Twitter @MartijntenNapel or e-mail