This article is the second in a series that aims to demonstrate that, with the right approach and the right modeling method, your Data Warehouse initiative can succeed at a very reasonable price. In particular, we take a deep dive into the main component of the Data Vault method: the Hub.
A Hub contains a natural key that uniquely identifies (at least, one hopes!) an instance of a business concept. A natural key is a key that is visible to the organization and used by it to identify an instance of a concept. For example, the employee number uniquely identifies an employee across different contexts and processes: human resources, parking lot management, incidents, payroll, the financial system, and so on. The employee number is therefore the point of connection (communication) for the "employee" concept between different business units, hence the name Hub.
In relational model database design, a natural key is a key that is formed of attributes that already exist in the real world. For example, a USA citizen’s social security number could be used as a natural key. In other words, a natural key is a candidate key that has a logical relationship to the attributes within that row. A natural key is sometimes called a domain key. (From Wikipedia, the free encyclopedia)
A natural key differs from the internal identifiers of source systems, which are in principle not visible to users and are specific to the system that generates them.
In an ideal world, every concept would be associated with the same key regardless of the business unit, and that key would be unique. In practice, this is often not the case. For example, incident management could use its own natural keys for the various resources involved.
The job of a Hub is to track the first time the Data Vault sees a Business Key arrive in the Warehousing load, and where it came from.
The Purpose of a Hub is to provide a soft-integration point of raw data that is not altered from the source system, but is supposed to have the same semantic meaning. (Linstedt, Dan, 2010)
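The "first time seen, and where it came from" behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Hub is modelled as an in-memory dictionary, and the names `load_hub`, `load_date` and `record_source` are hypothetical, not taken from any Data Vault tool.

```python
from datetime import datetime


def load_hub(hub, incoming_keys, record_source, load_date):
    """Add business keys the hub has never seen; leave known keys untouched.

    `hub` is a plain dict standing in for the Hub table (an assumption
    made for this sketch): business key -> first-seen metadata.
    """
    for business_key in incoming_keys:
        if business_key not in hub:
            # Record only the FIRST arrival of the key and its origin.
            hub[business_key] = {
                "load_date": load_date,
                "record_source": record_source,
            }
    return hub


employee_hub = {}
load_hub(employee_hub, ["E100", "E200"], "HR", datetime(2013, 1, 1))
load_hub(employee_hub, ["E200", "E300"], "PAYROLL", datetime(2013, 1, 2))
```

After both loads, employee `E200` still carries the `HR` source and the first load date, because the second load only adds the key it had not seen before (`E300`). The key values themselves are stored exactly as they arrived, which is the "soft integration" of unaltered raw data that the quotation refers to.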
Here are the criteria that a good Hub must meet:
- Represents one and only one concept;
- Contains no descriptive data elements (example: the name of an employee). A Hub's satellites contain the descriptive part;
- Contains no relationships. Links hold the relationships between Hubs, and satellites hold the relationship between a concept (Hub) and its description;
- Ideally contains a unique natural key consisting of at least one data element that identifies the concept (example: the number of the parking lot). If the same natural key value in two sources does not refer to the same instance of the concept, you must distinguish the instances by combining the natural key with the name of the source (or division, department, country, etc.) the data comes from;
- Always contains at least two attributes for traceability: the source the data came from and when the data was loaded into the Hub (timestamp);
- Is associated with at least one satellite that describes it;
- Contains a Hub Sequence Id.
The Hub Sequence Id is a Data Warehouse created and managed primary key sequence ID. This is the key that will be used to form key constraints. The Hub Sequence ID has a one-to-one (1:1) relationship with the Natural Business Key of the Hub. (Hultgren, Hans, 2012)
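To make the Hub's structure concrete, here is a minimal Python sketch of a Hub row and of the one-to-one link between the natural key and the Hub Sequence Id. It is not a reference implementation: the class names `HubRow` and `ParkingLotHub`, the field names, and the choice to represent the natural key as a tuple are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count


@dataclass(frozen=True)
class HubRow:
    hub_seq_id: int      # warehouse-generated key, used for key constraints
    natural_key: tuple   # one or more data elements, e.g. ("P-12", "MONTREAL")
    record_source: str   # traceability: where the key came from
    load_date: datetime  # traceability: when the key was first loaded


class ParkingLotHub:
    """Hypothetical in-memory Hub: one row, and one sequence id, per natural key."""

    def __init__(self):
        self._next_id = count(1)
        self._rows = {}

    def get_or_add(self, natural_key, record_source, load_date):
        row = self._rows.get(natural_key)
        if row is None:
            # New natural key: assign the next sequence id exactly once.
            row = HubRow(next(self._next_id), natural_key, record_source, load_date)
            self._rows[natural_key] = row
        return row


hub = ParkingLotHub()
# Lot number "P-12" exists in two cities, so the city is added to the natural key.
montreal = hub.get_or_add(("P-12", "MONTREAL"), "FACILITIES_MTL", datetime(2013, 10, 1))
quebec = hub.get_or_add(("P-12", "QUEBEC"), "FACILITIES_QC", datetime(2013, 10, 1))
again = hub.get_or_add(("P-12", "MONTREAL"), "INCIDENTS", datetime(2013, 10, 5))
```

In this sketch the two parking lots receive different sequence ids even though they share the lot number, while the second arrival of the Montreal lot returns the existing row with its original source and load date. The row holds no descriptive attributes and no references to other Hubs, in keeping with the criteria listed above.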
Dario Mangano, excerpt from the chapter "The Core DataWarehouse Layer of the Integrated Data Hub" of the book "The Integrated Data Hub", published October 22, 2013.