In the latest Ometis blog, Chris Lofthouse offers his insight on Concatenate vs Link Tables in Qlik and when either option should be used.
Finding the answer to handling multiple fact tables in Qlik Sense can appear to be a puzzling dilemma when first presented with the issue. A Qlik Sense developer can either concatenate the multiple fact tables into a single master fact or use link tables instead.
When both options are possible, which should you use?
The generic, boring, answer is, ‘it depends’. However, my preference is to use link tables, when practicable, as it can provide a more intuitive user experience for end-users. Below, I have detailed my reasoning:
The link table approach is the process of building a bridge table that acts as the associative hub, which in turn combats a synthetic key in the data model – removing or aliasing the fields is not an option on its own. A link table resolves the many-to-many associative relationships by implementing a compound field (multiple field values concatenated into a single value) in the link and fact tables. Therefore, a link table typically contains compound fields and all the common fields between the multiple fact tables. The common fields can then be used for associative relationships between the dimensional tables.
As a result, the link table takes the position of the fact table, by becoming the centre piece of the data model and subsequently the original fact tables assume an outer position, alongside the dimensional tables.
Example:
In the example above, each of the fact tables originally contained two or more of the common fields; the common fields being Date ID, Salesperson ID, Customer ID and Product ID. Each of the common fields in this example are also used to associate to a respective dimensional table, but this is not always the case.
Each fact table in this example also has a newly created compound field, again not always necessary as in some circumstances the compound field may be the same for multiple facts. I’ve provided the detail of the compound fields below:
Compound field | Fields |
---|---|
%InventoryLinkID | Product ID, Date ID |
%SalesLinkID | Product ID, Date ID, Customer ID, Salesperson ID |
%SalesTargetLinkID | Date ID, Salesperson ID |
The concatenate approach is the process of combining multiple fact tables, into a single (master fact) table. The resulting table will contain a column for each unique field across both tables, and the row count totalling the sum of rows from the combining tables.
When a field exists in both tables (identically spelt field names), the values from both tables will combine into a single column in the concatenated table. When a field is unique – does not exist in both tables – it will be merged into the concatenated field and null values will be added to the records from the other table.
Example:
The date model example above contains the same data/source tables as the link table example. It shows a single concatenated fact table, the combination of three fact tables in total, associated to the dimensional tables.
Link table approach | Concatenate approach |
---|---|
[+] Reduces the need for users to write set expressions. | [-] Often requires Qlik users to write set expressions, limiting a measure to a specific source (fact table). Effectively, separating the fact data back out. |
[+] Keeping the fact tables separate makes the data model easier to understand, showing fields in their related tables. | [-] Can confuse users when some fields may only be related to records from certain fact tables. User may find the data is different to what they expect from the data model. |
[+] Reduced amount of superfluous null values. Remember, a null value still has a memory footprint. | [-] If a concatenated fact table contains a unique field (a field that doesn’t exist in one or more of the other fact tables), a null value will be used to populate the records of the other fact tables. |
[+] Prevents unnecessarily wide tables. | [-] Risks implementing very wide tables. This can be negative from a performance point of view, but also makes it harder to find fields in the front-end. |
[-] Typically increases the number of tables in the data model, with data duplicated in the link tables and containing additional compound fields. | [+] Reduces the number of tables in a data model, and potentially data. |
[-] Can potentially reduce performance due to the QIX engine query navigating through intermediary tables. | [+] Can have performance benefits as there are no additional tables to jump through. |
[-] Can be harder to design the data model and implement – requiring a Qlik expert. | [+] Easier to design and implement the data model. |
As I mentioned at the start of this post, my preference is link tables due to the better user experience they can provide end-users. Link tables keep the front-end simple, simplifying expressions and intuitive data model. So yes, link tables may result in a short-term headache for the developer but Qlik Sense is a self-service platform, designed for the end-user. Therefore, we must cater for the end-users in all aspects, especially the data model.
It is also common in more complex solutions, e.g. with the need to report on multiple dates using a canonical date field, to require a link table anyway. Therefore, you may be forced resort to a hybrid solution at the very least.
Ultimately, however, it may not be the choice of the developer. Depending on data volumes, infrastructure, etc. the decision may be revoked from the developer and which ever has the optimum performance will come out on top – a bit of trial and error never hurt anyone!
By Chris Lofthouse
Follow @clofthouse89
Don’t forget, you can stay up-to-date with all the latest Qlik news, tips and tricks by following us on LinkedIn or Twitter and subscribing to our YouTube channel.