Tips on taxonomies
Overview
Flow is a data integration and querying platform, built on top of the Taxi schema language.
The idea behind Taxi (and semantic types in general) is to allow producers of data to define both the structural contract (the labels assigned to fields and shapes of objects), and the semantic contract (what each field actually means).
The key to making this work well is in defining a shared taxonomy - a set of common terms with small scope and well-defined meaning. In this guide, we’ll take a look at tips and pitfalls when crafting a taxonomy.
Structure your taxonomy
Avoid common domain models
Trying to enforce standardization of contracts between multiple systems is really hard, and lots of time is spent forming a consensus of design between teams that are working on different goals. This creates complex processes for agreeing to making changes, which makes innovation hard.
Many of the reasons for choosing a shared model across teams (such as lower cost of integration) are addressed out-of-the box with Flow, which gives you all the benefits of shared models, without the overhead associated with enterprise domain models.
Types are intended for sharing
Types don’t have structure. In software terms, types are referred to as scalar, i.e., they don’t contain any
attributes or fields. Things like String
or Number
are typical examples of scalars.
In Flow and Taxi, types take on a semantic meaning, i.e., CustomerFirstName
instead of String
.
Types shouldn’t have structure or fields, as this makes it harder to share them between systems and teams. Instead, favor lots of small types with clearly defined meanings. Getting teams to agree on the meaning of a field is easier than getting teams to agree on how to structure or name it. It is best practice to build a set of well defined types that form the basis of your glossary. This set of types is commonly called a Taxonomy. These types should change infrequently, so spend the time to ensure they’re well documented, with clear definitions of meaning. |
Models are intended for systems
Models represent a strict contract that a system exposes. As such, the team that designs the system is best placed to design the contract as it makes sense to them.
Avoid trying to design models via consensus. As discussed above, getting consensus on shared models is tough, and most of the reasons for adopting shared models are solved using semantic types.