Basic Concepts
The Airbyte Specification
As a quick recap, the Airbyte Specification requires an Airbyte Source to support 4 distinct operations:
- Spec: The configuration required to interact with the underlying technical system, e.g. database information, authentication information, etc.
- Check: Validate that the provided configuration is valid and grants sufficient permissions to perform all required operations on the Source.
- Discover: Discover the Source's schema. This lets users select which subset of the data to sync, which is useful if they require only part of the data.
- Read: Perform the actual syncing process. Data is read from the Source, parsed into AirbyteRecordMessages, and sent to the Airbyte Destination. Depending on how the Source is implemented, this sync can be incremental or a full refresh.
A core concept discussed here is the Source.
The Source contains one or more Streams (or Airbyte Streams). A Stream is the other concept key to understanding how Airbyte models the data syncing process. A Stream models one of the logical data groups that make up the larger Source. If the Source is an RDBMS, each Stream is a table. In a REST API setting, each Stream corresponds to one resource within the API, e.g. a Stripe Source would have one Stream for Transactions, one for Charges, and so on.
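The four operations and the Source/Stream relationship can be sketched in plain Python. This is a minimal illustration, not the CDK's API: the ToySource class, its in-memory data, and the api_key field are all hypothetical, and the emitted dictionaries are only shaped like an AirbyteRecordMessage (a stream name, the row data, and an emission timestamp in milliseconds).

```python
import json
import time
from typing import Any, Dict, Iterator, List

Config = Dict[str, Any]


class ToySource:
    """Hypothetical stand-in for a Source; this is not the CDK's Source class."""

    # Each stream is one logical group of data, e.g. one table or one API resource.
    _DATA: Dict[str, List[Dict[str, Any]]] = {
        "transactions": [{"id": 1, "amount": 500}],
        "charges": [{"id": 7, "amount": 120}, {"id": 8, "amount": 75}],
    }

    def spec(self) -> Dict[str, Any]:
        # Spec: describe the configuration the connector needs.
        return {
            "type": "object",
            "required": ["api_key"],
            "properties": {"api_key": {"type": "string"}},
        }

    def check(self, config: Config) -> bool:
        # Check: confirm the provided configuration is usable.
        return bool(config.get("api_key"))

    def discover(self) -> List[str]:
        # Discover: list the streams a user can choose to sync.
        return sorted(self._DATA)

    def read(self, config: Config, selected: List[str]) -> Iterator[Dict[str, Any]]:
        # Read: emit one record message per row of each selected stream.
        for stream in selected:
            for row in self._DATA[stream]:
                yield {
                    "type": "RECORD",
                    "record": {
                        "stream": stream,
                        "data": row,
                        "emitted_at": int(time.time() * 1000),
                    },
                }


source = ToySource()
config = {"api_key": "hypothetical-key"}
assert source.check(config)

# Sync only the "charges" stream, as a user might after Discover.
messages = list(source.read(config, selected=["charges"]))
for message in messages:
    print(json.dumps(message))
```

Selecting streams by name in read mirrors the point of Discover: the user picks a subset of the schema, and only that subset is synced.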
The Source class
Airbyte provides abstract base classes which make it much easier to perform certain categories of tasks, e.g. HttpStream makes it easy to create HTTP API-based streams. However, if those do not satisfy your use case (for example, if you're pulling data from a relational database), you can always directly implement the Airbyte Protocol by subclassing the CDK's Source class.
The Source class implements the Spec operation by looking for a file named spec.yaml (or spec.json) in the module's root by default. This is expected to be a JSON Schema file that specifies the required configuration. The Exchange Rates source's spec file is one example.
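To make the shape of such a spec concrete, here is a minimal sketch written as a Python dictionary rather than as a spec.yaml file. It is not the actual Exchange Rates spec: the field names (access_key, start_date, base), the title, and the URL are placeholders, and missing_required is a hypothetical helper that only checks for required keys instead of performing full JSON Schema validation.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical spec; a real connector would keep this in spec.yaml or spec.json.
SPEC: Dict[str, Any] = {
    "documentationUrl": "https://example.com/docs",  # placeholder URL
    "connectionSpecification": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "Example Source Spec",
        "type": "object",
        "required": ["access_key", "start_date"],
        "properties": {
            # airbyte_secret marks a field whose value should be hidden.
            "access_key": {"type": "string", "airbyte_secret": True},
            "start_date": {"type": "string", "format": "date"},
            "base": {"type": "string"},
        },
    },
}


def missing_required(config: Mapping[str, Any], spec: Mapping[str, Any] = SPEC) -> List[str]:
    """Return the required keys that the user-provided config does not contain."""
    required = spec["connectionSpecification"].get("required", [])
    return [key for key in required if key not in config]


print(missing_required({"access_key": "hypothetical-key"}))
```

A configuration that satisfies the spec is what later reaches the Check and Read operations.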
Note that while this is the most flexible way to implement a source connector, it is also the most toilsome, as you will be required to manually manage state, validate input, conform to the Airbyte Protocol message formats, and more. We recommend using a subclass of Source unless you cannot fulfill your use case otherwise.
The AbstractSource Object
AbstractSource is a more opinionated implementation of Source. It implements Source's 4 methods as follows:
Check delegates to the AbstractSource's check_connection function. The function's config parameter contains the user-provided configuration, which follows the spec.yaml returned by Spec. check_connection uses this configuration to validate access and permissions. The same Exchange Rates source provides an example.
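The delegation described above can be sketched as follows. This is an illustration under stated assumptions, not the Exchange Rates implementation: ExampleSource is a plain class (a real connector would subclass the CDK's AbstractSource), the access_key field is hypothetical, and probe is a hypothetical hook standing in for one cheap authenticated request. The sketch assumes check_connection receives a logger and the config and returns a (success, error) pair.

```python
import logging
from typing import Any, Callable, Mapping, Optional, Tuple


class ExampleSource:
    """Hypothetical class; a real connector would subclass the CDK's AbstractSource."""

    def __init__(self, probe: Callable[[Mapping[str, Any]], None]) -> None:
        # `probe` is a hypothetical hook that makes one cheap authenticated
        # request and raises an exception if that request fails.
        self._probe = probe

    def check_connection(
        self, logger: logging.Logger, config: Mapping[str, Any]
    ) -> Tuple[bool, Optional[Any]]:
        # Return (True, None) on success, or (False, reason) on failure.
        if not config.get("access_key"):
            return False, "access_key is missing from the configuration"
        try:
            self._probe(config)
        except Exception as error:
            logger.warning("Connection check failed: %s", error)
            return False, str(error)
        return True, None


def failing_probe(config: Mapping[str, Any]) -> None:
    # Simulates an API that rejects the credentials.
    raise PermissionError("403 Forbidden")


logger = logging.getLogger("example")
ok = ExampleSource(probe=lambda config: None)
denied = ExampleSource(probe=failing_probe)

print(ok.check_connection(logger, {"access_key": "hypothetical-key"}))
print(denied.check_connection(logger, {"access_key": "hypothetical-key"}))
```

Injecting the probe keeps the sketch runnable without network access; in a real connector the request would typically be made directly inside check_connection.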