Hi, I'm Vytautas.
Data engineer with hands-on experience building ELT pipelines and lakehouse-style analytics stacks in regulated environments. I like owning pipelines end to end, improving reliability, and delivering measurable outcomes.
Highlights:
- S3‑based lakehouse (Apache Iceberg + Trino) with a dbt layer for analytics and regulatory reporting.
- Airflow DAGs for multi‑source ingestion (retries/backoff, on‑failure alerts) and coding/security standards.
- Data quality checks and documentation aligned with EIOPA/ESMA/DORA requirements.
Experience
Lakehouse architectures, Airflow orchestration, and practical data quality.
- Automated an incremental pipeline from MySQL to Apache Iceberg via Trino, orchestrated with Airflow (incremental-load pattern sketched after this list).
- Reduced query latency and streamlined analytics workflows.
- Led projects with strict compliance requirements; mentored peers and managed timelines and documentation.
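A minimal sketch of that incremental-load pattern, assuming an `updated_at` watermark column and the `trino` Python client; the catalog, table, and column names here are hypothetical, not the production ones.

```python
import os
from datetime import datetime, timezone

import trino  # Trino Python client (pip install trino)

# Hypothetical names for illustration; the real pipeline's tables differ.
SOURCE_TABLE = "mysql.app.orders"          # MySQL catalog exposed through Trino
TARGET_TABLE = "iceberg.analytics.orders"  # Iceberg table in the lakehouse

def incremental_load(conn, watermark: datetime) -> None:
    """Copy rows changed since the last run from MySQL into Iceberg."""
    cur = conn.cursor()
    # Trino federates both catalogs, so the incremental copy is a single
    # INSERT ... SELECT filtered on the stored watermark.
    cur.execute(
        f"""
        INSERT INTO {TARGET_TABLE}
        SELECT * FROM {SOURCE_TABLE}
        WHERE updated_at > TIMESTAMP '{watermark:%Y-%m-%d %H:%M:%S}'
        """
    )
    cur.fetchall()  # drive the statement to completion

if __name__ == "__main__":
    conn = trino.dbapi.connect(
        host=os.environ["TRINO_HOST"],
        port=int(os.environ.get("TRINO_PORT", "8080")),
        user=os.environ.get("TRINO_USER", "etl"),
    )
    # The watermark would normally be read from state; fixed here for brevity.
    incremental_load(conn, datetime(2024, 1, 1, tzinfo=timezone.utc))
```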
Skills
Languages
Data Engineering
Infrastructure / Cloud
Data Quality / Governance
Nice to have
Projects
ELT architecture
Focus: Airflow, dbt, S3 data lake
This ELT project loads source files from object storage, stages them in an analytical database, and then transforms them into standardized, history-aware dimensional tables. Raw data is first copied into a dated backup area, then ingested into staging schemas with minimal structural changes. Next, transformation models build SCD2-style dimensions, applying data quality checks to ensure consistency and traceability. The entire pipeline (backup → load → transform → test → document) runs on a Kubernetes cluster and is orchestrated by Airflow.
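A condensed sketch of how those five stages could be wired as an Airflow DAG, assuming dbt is invoked through its CLI; the dag_id, script names, and selectors are illustrative rather than the project's actual code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Illustrative DAG; the bash commands stand in for the real backup/load scripts.
with DAG(
    dag_id="elt_scd2_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    backup = BashOperator(
        task_id="backup_raw",
        bash_command="python backup.py --date {{ ds }}",  # copy raw files to a dated folder
    )
    load = BashOperator(
        task_id="load_staging",
        bash_command="python load.py --date {{ ds }}",  # ingest into staging schemas
    )
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --select dimensions",  # build SCD2-style dimensions
    )
    test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test",  # data quality checks
    )
    docs = BashOperator(
        task_id="dbt_docs",
        bash_command="dbt docs generate",  # refresh documentation
    )

    backup >> load >> transform >> test >> docs
```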
Backup/load CLI
Focus: explaining the code logic
Python CLI that orchestrates three data maintenance modes over S3 and Trino/Iceberg:
backup (copy CSVs between S3 accounts into date-based folders),
load (normalize columns and write into Trino *_staging tables),
and cleanup (delete old backups by retention period). All behavior is
configured via environment variables and CLI arguments.
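A minimal sketch of the CLI's shape, assuming argparse subcommands; the flag names, defaults, and env-var keys are illustrative.

```python
import argparse
import os

def backup(args: argparse.Namespace) -> None:
    """Copy CSVs from the source S3 account into a date-based folder on the target."""
    ...

def load(args: argparse.Namespace) -> None:
    """Normalize column names and write files into Trino *_staging tables."""
    ...

def cleanup(args: argparse.Namespace) -> None:
    """Delete backup folders older than the retention period."""
    ...

def main() -> None:
    parser = argparse.ArgumentParser(description="S3/Trino data maintenance")
    sub = parser.add_subparsers(dest="mode", required=True)

    p_backup = sub.add_parser("backup")
    p_backup.add_argument("--date", help="backup folder date, defaults to today")
    p_backup.set_defaults(func=backup)

    p_load = sub.add_parser("load")
    p_load.set_defaults(func=load)

    p_cleanup = sub.add_parser("cleanup")
    # Retention defaults come from the environment, overridable on the CLI.
    p_cleanup.add_argument(
        "--retention-days",
        type=int,
        default=int(os.environ.get("RETENTION_DAYS", "30")),
    )
    p_cleanup.set_defaults(func=cleanup)

    args = parser.parse_args()
    args.func(args)

if __name__ == "__main__":
    main()
```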
Static portfolio website: S3 hosting
Focus: HTTPS, cost-efficiency
Simple website deployment automation with GitHub Actions and Terraform.
Weather Data System: Serverless ingestion
Focus: serverless simplicity and cost control (2024)
Automated weather data ingestion into Postgres with subsequent analysis.
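A hedged sketch of the ingestion step, assuming the public Open-Meteo API and the psycopg2 driver; the coordinates, table name, and DSN variable are placeholders for the project's real configuration.

```python
import os

import psycopg2
import requests

def ingest(event=None, context=None):
    """Serverless-style handler: fetch one observation and append it to Postgres."""
    # Placeholder endpoint and coordinates; the real project's API may differ.
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": 54.69, "longitude": 25.28, "current_weather": True},
        timeout=10,
    )
    resp.raise_for_status()
    current = resp.json()["current_weather"]

    conn = psycopg2.connect(os.environ["POSTGRES_DSN"])
    with conn, conn.cursor() as cur:  # context manager commits the transaction
        cur.execute(
            "INSERT INTO weather_observations (observed_at, temperature, windspeed) "
            "VALUES (%s, %s, %s)",
            (current["time"], current["temperature"], current["windspeed"]),
        )
    conn.close()
```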
Machine Learning Models: ingestion + price prediction
Focus: experimentation pipeline (2024)
Pipelines for ML experiments and price prediction.
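A toy sketch of the experiment-pipeline pattern, assuming scikit-learn; the synthetic features and target stand in for real price data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: two numeric features and a price-like target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 100 + 30 * X[:, 0] - 10 * X[:, 1] + rng.normal(scale=5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scaling + model in one Pipeline keeps experiments reproducible and swappable.
model = Pipeline([
    ("scale", StandardScaler()),
    ("gbr", GradientBoostingRegressor(random_state=42)),
])
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```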
Recommendation Engine: front‑end & back‑end
Focus: end‑to‑end demo (2024)
Prototype with data preparation and a web UI.
Data Processing Pipeline: Docker + Airflow
Focus: reproducibility (2024)
Containerised example with Airflow orchestration.
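A short sketch of the reproducibility idea, assuming Airflow's Docker provider; the image tag and command are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

# Illustrative: each task runs inside a pinned image, so reruns are reproducible.
with DAG(
    dag_id="containerised_processing",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    process = DockerOperator(
        task_id="process_batch",
        image="my-pipeline:1.0",                   # hypothetical pinned image tag
        command="python process.py",               # entrypoint inside the container
        docker_url="unix://var/run/docker.sock",   # local Docker daemon
    )
```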
Certifications
Status: achieved
Status: in progress