ETL Overview
ETL (extract, transform, load) integrations in Matia move data from a source (database, SaaS app, or file store) to a destination (typically a data warehouse). Each integration is a pipeline with its own source, destination, schema, schedule, and run history.
What an ETL Integration Is
An integration is one ETL pipeline: a configured connection from a source to a destination that runs on a schedule or on demand. The source and destination are assets — connections you create in Matia (e.g. a Postgres connection, a Snowflake warehouse). You choose which tables (streams) to sync and how (full refresh, incremental, or append-only). Matia extracts data, applies normalization and transformations as needed, and loads it into the destination.
How Data Flows
- Extract: Matia reads data from the source according to the schema (enabled tables and sync mode). For incremental syncs, Matia uses a cursor to read only new or changed rows.
- Transform: Data is normalized and transformed as required by the connector (e.g. flattening nested structures, type coercion). Because of this step, the number of emitted records (read from the source) can differ from the number of committed records (written to the destination).
- Load: Data is written to the destination in the configured schema/database. Sync history is recorded per run.
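The three stages above can be sketched in a few lines of Python. This is only an illustration of the idea, not Matia's implementation: the function names (`extract`, `flatten`, `run_sync`), the `updated_at` cursor column, the `id` primary key, and the rule that duplicate keys collapse into one committed row are all assumptions made for the example.

```python
def extract(rows, cursor_field, last_cursor):
    """Incremental read: keep only rows whose cursor value is past the saved one."""
    if last_cursor is None:
        return list(rows)  # first run: read everything
    return [r for r in rows if r[cursor_field] > last_cursor]


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def run_sync(source_rows, destination, cursor_field, last_cursor):
    """One sync: extract, transform, load. Returns counts and the new cursor."""
    emitted = extract(source_rows, cursor_field, last_cursor)
    written_keys = set()
    for record in emitted:
        row = flatten(record)
        destination[row["id"]] = row  # hypothetical upsert keyed on "id"
        written_keys.add(row["id"])
    new_cursor = max((r[cursor_field] for r in emitted), default=last_cursor)
    return {
        "emitted": len(emitted),         # records read from the source
        "committed": len(written_keys),  # distinct rows written in this run
        "cursor": new_cursor,
    }


source = [
    {"id": 1, "updated_at": 10, "profile": {"name": "Ada", "geo": {"city": "London"}}},
    {"id": 2, "updated_at": 20, "profile": {"name": "Lin", "geo": {"city": "Oslo"}}},
    {"id": 2, "updated_at": 25, "profile": {"name": "Lin", "geo": {"city": "Bergen"}}},
]
warehouse = {}
first = run_sync(source, warehouse, "updated_at", None)
second = run_sync(source, warehouse, "updated_at", first["cursor"])
```

In this sketch the first run emits three records but commits two rows, because both versions of `id` 2 land on the same destination row. The second run starts from the saved cursor and reads nothing new.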
Where to Manage ETL Integrations
- Integrations (sidebar): View all available integrations in a single list. Select an integration row to open its details, or use Add Integration to start the creation flow.
- Integration details: A tabbed view for managing a single integration.
  - Status: View sync history, emitted/committed data, and per-table insights
  - Schema: Enable or disable tables, set the sync mode, and configure cursor settings
  - Settings: Manage triggers, schema changes, post-run actions, notifications, data resyncs, and integration deletion
  - Changelog: View the audit log of changes
  - Schema changes: If supported by the connector, this tab shows detected schema updates.
Key Concepts
- Sync: A single run of the integration. Each sync has a status (successful, failed, or completed with errors), volume, emitted/committed counts, and duration.
- Sync mode: Full refresh (all data each time), incremental (only new/changed rows), or append-only. Configured per table in the Schema tab where supported.
- Schema changes: New schemas, tables, or columns in the source. You control whether the integration adopts them automatically; see How ingestion works and Sync modes.
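To make the difference between the sync modes concrete, here is a minimal Python sketch of how each mode might treat a destination table. It is an assumption-laden illustration, not Matia's behavior: the `apply_sync` function, the mode strings, the `id` key, and in particular the reading of append-only as "add rows without updating existing ones" are hypothetical. See Sync modes for the actual semantics.

```python
def apply_sync(mode, destination, batch, key="id"):
    """Return the destination table contents after applying one batch."""
    if mode == "full_refresh":
        # Replace everything with the latest full read of the source.
        return [dict(r) for r in batch]
    if mode == "incremental":
        # Upsert: update rows that already exist, add rows that are new.
        merged = {r[key]: r for r in destination}
        for r in batch:
            merged[r[key]] = dict(r)
        return list(merged.values())
    if mode == "append_only":
        # Add every incoming row; existing rows are never modified.
        return destination + [dict(r) for r in batch]
    raise ValueError(f"unknown sync mode: {mode}")


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
batch = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

refreshed = apply_sync("full_refresh", existing, batch)
upserted = apply_sync("incremental", existing, batch)
appended = apply_sync("append_only", existing, batch)
```

With these inputs, the full refresh leaves only the two batch rows, the incremental sync yields three rows with `id` 2 updated, and the append-only sync yields four rows that include both versions of `id` 2.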
For a first pipeline, see Create Your First Integration. For reference, see Sync modes, Sync logs and run history, and Assets and connections.