Workspace overview
A Flow Workspace is a collection of schemas, API specs and Taxi projects that describe data sources (APIs, Databases, Message queues and Serverless functions) and provide a description of the data and capabilities they provide.
These descriptions are used to generate integration on the fly, and power your data catalog.
Data sources are described using some form of API spec language (Protobuf, OpenAPI, Avro, Taxi), along with additional metadata that defines how concepts are related semantically.
There are lots of different ways to tell Flow about your data sources. We believe that Flow should fit with the way you work today.
Components
Generally, there are at least the following components:
-
A taxonomy project (built using Taxi) that defines the terms embedded in your API contracts. This is most commonly stored in Git, or locally on your machine (when you’re first getting started).
-
A series of API specs, enriched with metadata using terms from your taxonomy project.
Your core taxonomy
The core taxonomy is a collection of terms that describe your data attributes. Think of them
as simple tags that describe your data - such as FirstName
, LastName
, etc.
In practice, these tags are actually semantic types, defined using Taxi.
type FirstName inherits String
type LastName inherits String
type CustomerId inherits Int
By using Taxi, you get the benefit of a full type system, rich build tooling to verify correctness, and an open source ecosystem to build your own plugins.
A common pattern for building a core taxonomy is:
-
It is created as a standalone project, independent of any specific API / Service
-
It lives in a Git repository
-
Flow monitors the Git repository and automatically deploys changes
Services and data sources
Services and data sources are described using existing API specs (OpenAPI / Protobuf / JSON Schema, etc), enriched with semantic tags from your core taxonomy.
# An extract of the ShoppingCartApi OpenAPI spec:
components:
schemas:
ShoppingCart:
properties:
custId: # Field name
x-taxi-type: # <-- Taxi extension
name: CustomerId # <-- Semantic type
type: string
import "org/taxilang/dataType.proto";
message Customer {
optional string customer_name = 1 [(taxi.dataType)="CustomerName"];
> optional int32 customer_id = 2 [(taxi.dataType)="CustomerId"];
}
Use Taxi
Another option is authoring your API specs directly in Taxi:
model Customer {
id : CustomerId inherits Int
firstName : CustomerFirstName inherits String
}
service CustomerService {
@HttpOperation(method = "GET", url = "/shoppingCart/{CustomerId}")
operation getShoppingCartByCustomerId(CustomerId):ShoppingCart
}
Alternatively, you can just generate schema definitions directly from code, or author your definitions in Taxi.
Once you’ve chosen how to describe your services, next publish your specs to Flow.
Publish and update your specs
How and when you choose to publish / update these sources is up to you - teams tend to have different preferences around this.
You’re also free to mix and match. Just pick the publication method that best works for your tech and your team’s working style.
The following publication methods are available:
-
Pushing API specs to Flow (e.g., when your microservices start)
-
Pushing updates to Flow manually (e.g., within a CI/CD pipeline)
-
Having Flow poll sources for changes
Push API specs directly from your apps
Applications can generate Taxi, direct from code, and publish to Flow.
This suits microservices and teams who prefer generating their API specs from code.
Pull API specs from Flow
Flow can be configured to fetch API specs (projects) from either Git repositories or the file system where Flow is running (intended for local development).
In addition to storing your core taxonomy in Git, you can fetch other API specs (such as Protobuf or OpenApi) from a Git repository.
Flow will poll your Git repo, and automatically update as changes are merged into your target branch.
For more information, read about pulling API specs from Git or reading API specs from local disk.
Workspace.conf file
The workspace.conf
file is a HOCON file that describes all the locations to pull code from. (Data sources that
are pushing their API specs are not included.)
Schemas can be pulled from multiple different formats and approaches. The configuration for these repositories is defined in a HOCON format file.
Pass a workspace.conf file
By default, the configuration file is called workspace.conf
. However, the location of the file can be changed by setting --flow.workspace.config-file=/path/to/workspace.conf
on the command line, or through any of the supported configuration overriding mechanisms.
Read workspace.conf from Git
For production deployments, it’s often preferable to read config directly from Git. This is useful both for Infrastructure-as-code, as well as for deploying to services where there’s ephemeral storage (like AWS ECS).
You can configure Flow to read a workspace.conf
file from a git repository, by passing the following command line settings:
Setting | Description |
---|---|
|
The url of the Git repo to clone. If pulling from Github, Gitlab or Azure DevOps, use a personal access token in the url (eg: |
|
The name of the branch to check out |
|
Optional The path within the repo to read the config file. Defaults to |
Use a single-project workspace
For demos / test config, it’s sometimes useful to start Flow with a single project configured.
You can bypass the workspace config, and point Flow directly to a single local file-system project.
To do this, start Flow with --flow.workspace.project-file=/path/to/taxi.conf
Configure conventions
Durations are defined using ISO 8601 formats. For example:
-
1 Day =
P1D
-
3 Seconds =
PT3S
Kitchen sink configuration example
file {
changeDetectionMethod=WATCH
incrementVersionOnChange=false
projects=[
{
isEditable=true
path="/opt/var/flow/schemas/taxi"
}
]
pollFrequency=PT5S
recompilationFrequencyMillis=PT3S
}
git {
checkoutRoot="/my/git/root"
pollFrequency=PT30S
repositories=[
{
branch=master
name=my-git-project
uri="https://github.com/something.git"
}
]
}
Continue reading
Continue learning about Flow by connect your data sources.