ETL Overview

ETL (extract, transform, load) integrations in Matia move data from a source (database, SaaS app, or file store) to a destination (typically a data warehouse). Each integration is a pipeline with its own source, destination, schema, schedule, and run history.

What an ETL Integration Is

An integration is one ETL pipeline: a configured connection from a source to a destination that runs on a schedule or on demand. The source and destination are assets — connections you create in Matia (e.g. a Postgres connection, a Snowflake warehouse). You choose which tables (streams) to sync and how (full refresh, incremental, or append-only). Matia extracts data, applies normalization and transformations as needed, and loads it into the destination.

How Data Flows

  1. Extract: Matia reads data from the source according to the schema (enabled tables and sync mode). For incremental syncs, only new or changed rows are read using a cursor.
  2. Transform: Data is normalized and transformed as required by the connector (e.g. flattening nested structures, type coercion). As a result, the number of emitted records (read from the source) can differ from the number of committed records (written to the destination).
  3. Load: Data is written to the destination in the configured schema/database. Sync history is recorded per run.
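The three steps above can be sketched in a few lines of Python. This is an illustration only, not Matia's implementation or API: the names `run_incremental_sync`, `SyncResult`, and `flatten`, the `id` key, and the `updated_at` cursor field are all hypothetical. It also assumes one possible reason for emitted and committed counts to differ, namely that several versions of the same row are read but only the latest is written.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SyncResult:
    emitted: int    # records read from the source
    committed: int  # records written to the destination
    cursor: Any     # highest cursor value seen, saved for the next run


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts, e.g. {"a": {"b": 1}} becomes {"a_b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def run_incremental_sync(source_rows, destination, cursor_field, last_cursor=None):
    # Extract: read only rows whose cursor value is past the saved cursor.
    extracted = sorted(
        (r for r in source_rows
         if last_cursor is None or r[cursor_field] > last_cursor),
        key=lambda r: r[cursor_field],
    )

    # Transform: flatten each record and keep only the latest version per key
    # (assumption: deduplication by a hypothetical "id" primary key).
    latest = {}
    for row in extracted:
        flat = flatten(row)
        latest[flat["id"]] = flat

    # Load: upsert the surviving records into the destination.
    destination.update(latest)

    new_cursor = max((r[cursor_field] for r in extracted), default=last_cursor)
    return SyncResult(emitted=len(extracted), committed=len(latest), cursor=new_cursor)


rows = [
    {"id": 1, "updated_at": 1, "profile": {"name": "a"}},
    {"id": 2, "updated_at": 2, "profile": {"name": "b"}},
    {"id": 2, "updated_at": 3, "profile": {"name": "c"}},
]
dest = {}
first = run_incremental_sync(rows, dest, "updated_at")
print(first.emitted, first.committed, first.cursor)  # 3 2 3
```

Running the same sync again with `last_cursor=first.cursor` reads nothing, because no row has a cursor value past the saved one.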

Where to Manage ETL Integrations

  • Integrations (sidebar): View all available integrations in a single list. Select an integration row to open its details, or use Add Integration to start the creation flow.
  • Integration details: A tabbed view for managing a single integration.
    • Status: View sync history, emitted/committed data, and per-table insights
    • Schema: Enable or disable tables, set the sync mode, and configure cursor settings
    • Settings: Manage triggers, schema changes, post-run actions, notifications, data resyncs, and integration deletion
    • Changelog: View audit log of changes
    • Schema changes (only if supported by the connector): View detected schema updates

Key Concepts

  • Sync: A single run of the integration. Each sync has a status (successful, failed, or completed with errors), volume, emitted/committed counts, and duration.
  • Sync mode: Full refresh (all data each time), incremental (only new/changed rows), or append-only. Configured per table in the Schema tab where supported.
  • Schema changes: New schemas, tables, or columns in the source. You control whether the integration adopts them automatically; see How ingestion works and Sync modes.
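To make the difference between sync modes concrete, here is a minimal Python sketch of how each mode could change a destination table. It is not Matia's implementation: `SyncMode` and `apply_sync` are hypothetical names, the `id` key is an assumption, and append-only is assumed here to mean that incoming rows are added without updating or removing rows already in the destination.

```python
from enum import Enum


class SyncMode(Enum):
    FULL_REFRESH = "full_refresh"
    INCREMENTAL = "incremental"
    APPEND_ONLY = "append_only"


def apply_sync(mode, destination, batch, key="id"):
    """Return the destination rows after applying one batch under the given mode."""
    if mode is SyncMode.FULL_REFRESH:
        # The batch is the complete source table and replaces everything.
        return list(batch)
    if mode is SyncMode.INCREMENTAL:
        # The batch holds only new or changed rows; upsert them by key.
        merged = {row[key]: row for row in destination}
        for row in batch:
            merged[row[key]] = row
        return list(merged.values())
    if mode is SyncMode.APPEND_ONLY:
        # Assumption: rows are added as-is; existing rows are never touched.
        return list(destination) + list(batch)
    raise ValueError(f"unknown sync mode: {mode!r}")


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

for mode in SyncMode:
    print(mode.value, len(apply_sync(mode, existing, incoming)))
# full_refresh 2
# incremental 3
# append_only 4
```

With the same incoming batch, full refresh leaves only the two incoming rows, incremental updates row 2 in place and adds row 3, and append-only keeps both versions of row 2.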

For a first pipeline, see Create Your First Integration. For reference, see Sync modes, Sync logs and run history, and Assets and connections.