- S3-based lakehouse (Apache Iceberg + Trino) with a dbt layer for analytics/regulatory reporting.
- Airflow DAGs for multi-source ingestion (retries/backoff, on-failure alerts) and coding/security standards.
- Data quality checks and documentation (EIOPA/ESMA/DORA).
Hi, I'm Vytautas.
Data engineer with hands-on experience building ELT pipelines and lakehouse-style analytics stacks in regulated environments. I like owning pipelines end-to-end, improving reliability, and delivering measurable outcomes.
Experience
Lakehouse architectures, Airflow orchestration and practical data quality.
- Automated incremental pipeline from MySQL to Apache Iceberg via Trino, orchestrated with Airflow.
- Reduced query latency and streamlined analytics workflows.
- Led projects with strict compliance requirements; mentored peers and managed timelines and documentation.
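As a hedged illustration of the incremental pattern above (function, table, and column names are hypothetical, not taken from the actual pipeline): each Airflow run reads the last successful high-water mark, pulls only newer rows from MySQL, and merges them into Iceberg via Trino.

```python
# Hypothetical sketch of a watermark-based incremental extract.
# The real pipeline merges the extracted rows into Iceberg via Trino.

def build_incremental_query(table: str, watermark_col: str, last_watermark: str) -> str:
    """Build an extract query selecting only rows changed since the last run."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} > TIMESTAMP '{last_watermark}'"
    )

query = build_incremental_query("shop.orders", "updated_at", "2024-01-01 00:00:00")
print(query)
```

Storing the watermark per table lets each run stay idempotent: re-running a failed interval re-extracts the same slice instead of the full table.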
Skills
Languages
Data Engineering
Infrastructure / Cloud
Data Quality / Governance
Nice to have
Projects
“Chat with Your Building” RAG Assistant
Focus: n8n orchestration, OpenAI embeddings/chat, ChromaDB retrieval
Proof of concept for a RAG assistant for property administration workflows. The core idea is simple: a resident asks a question, and the assistant answers based on the building profile (a structured record) covering administration, technical maintenance, key parameters, and tariff summaries.
It ingests structured “building passport” data (Vilnius City Municipality Open Data Portal), BIM data (normalized into tabular records), and technical building documentation (PDFs), then indexes the content for grounded Q&A. At runtime, the agent identifies the building (preferably by a unique building ID, but address-based lookup is also possible), retrieves the most relevant context from the appropriate source (passport / BIM / docs), and answers only from retrieved sources.
Example use cases
- “What are the specified wall materials?”
- “List all fire safety elements (fire doors, dampers, extinguishers) and their locations.”
- “Who maintains the engineering systems?”
The prototype could also connect to an incident/fault register and scheduled-maintenance data, so the assistant can answer:
- “What is planned for this month?”
- “When was the last heat substation inspection?”
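A minimal Python sketch of the building-identification and source-routing steps described above. This is illustrative only: the real agent runs in n8n with OpenAI embeddings and ChromaDB retrieval, and the keyword router, map, and names here are assumptions.

```python
# Illustrative address -> building-ID map (hypothetical data).
ADDRESS_INDEX = {"Gedimino pr. 1": "BLD-0001"}

def identify_building(query_meta: dict) -> str:
    """Prefer the unique building ID; fall back to an address lookup."""
    if query_meta.get("building_id"):
        return query_meta["building_id"]
    return ADDRESS_INDEX.get(query_meta.get("address", ""), "unknown")

def route_source(question: str) -> str:
    """Naive keyword router choosing which collection to retrieve from."""
    q = question.lower()
    if any(k in q for k in ("wall", "material", "fire", "damper", "extinguisher")):
        return "bim"
    if any(k in q for k in ("tariff", "administrator", "maintains")):
        return "passport"
    return "docs"  # fall back to the technical documentation index
```

In production the routing would be embedding-based rather than keyword-based, but the contract is the same: resolve the building first, then retrieve only from the matching source so answers stay grounded.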
RAG Runtime Architecture
ELT architecture
Focus: Airflow, dbt, S3 data lake
This ELT project loads source files from object storage, stages them in an analytical database, and transforms them into standardized, history-aware dimensional tables. Raw data is first copied into a dated backup area, then ingested into staging schemas with minimal structural changes. Next, transformation models build SCD2-style dimensions, applying data quality checks to ensure consistency and traceability. The entire pipeline (backup → load → transform → test → document) runs on a Kubernetes cluster and is orchestrated by Airflow.
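The stage ordering above can be sketched with the standard library's topological sorter; this is a simplified stand-in for the actual Airflow DAG, with stage names taken from the description:

```python
from graphlib import TopologicalSorter

# Pipeline stages with their upstream dependencies, mirroring the
# backup -> load -> transform -> test -> document chain.
STAGES = {
    "backup": set(),
    "load": {"backup"},
    "transform": {"load"},
    "test": {"transform"},
    "document": {"test"},
}

order = list(TopologicalSorter(STAGES).static_order())
print(order)  # a linear chain yields exactly one valid order
```

In Airflow the same dependencies would be expressed with `>>` between tasks; the dict form makes the ordering testable outside the scheduler.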
Data product
Focus: maintenance
Refactored a data product that consolidates raw regulatory data into a single governed fact layer, enriched with metadata, entity attributes, and submission status. Improved SQL readability and maintainability using well-structured CTEs.
Backup/load CLI
Focus: explain code logic
Python CLI that orchestrates three data maintenance modes over S3 and Trino/Iceberg:
- backup: copy CSVs between S3 accounts into date-based folders;
- load: normalize columns and write into Trino *_staging tables;
- cleanup: delete old backups by retention period.
All behavior is configured via environment variables and CLI arguments.
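A minimal argparse sketch of the three-mode interface, with environment variables as defaults. Flag and variable names are assumptions for illustration, not the CLI's actual interface:

```python
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    """Three subcommands: backup, load, cleanup (names from the project summary)."""
    parser = argparse.ArgumentParser(description="S3/Trino data maintenance CLI")
    sub = parser.add_subparsers(dest="mode", required=True)

    backup = sub.add_parser("backup", help="copy CSVs into date-based folders")
    backup.add_argument("--source-bucket", default=os.getenv("SOURCE_BUCKET"))
    backup.add_argument("--target-bucket", default=os.getenv("TARGET_BUCKET"))

    load = sub.add_parser("load", help="normalize columns, write *_staging tables")
    load.add_argument("--schema", default=os.getenv("TRINO_SCHEMA", "staging"))

    cleanup = sub.add_parser("cleanup", help="delete backups past retention")
    cleanup.add_argument("--retention-days", type=int,
                         default=int(os.getenv("RETENTION_DAYS", "30")))
    return parser

args = build_parser().parse_args(["cleanup", "--retention-days", "7"])
```

Defaulting flags to environment variables keeps the same binary usable both interactively and inside an Airflow task with injected config.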
Contact form: email delivery
Focus: serverless
A fully serverless contact form for my portfolio website. The frontend sends a POST request with user input (email and message). API Gateway acts as a secure HTTP interface, routing requests to the backend. AWS Lambda processes incoming requests, validates the payload, and triggers email delivery. Messages are sent via AWS SES — no servers, no long-running services.
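A hedged sketch of the Lambda handler's validate-then-send flow. The SES call is stubbed out with a comment, and the field names and validation rules are assumptions, not the deployed function:

```python
import json

def validate_payload(body: dict) -> list:
    """Return a list of validation errors for the contact-form payload."""
    errors = []
    if "@" not in body.get("email", ""):
        errors.append("invalid email")
    if not body.get("message", "").strip():
        errors.append("empty message")
    return errors

def handler(event, context):
    """Entry point invoked by API Gateway's proxy integration."""
    body = json.loads(event.get("body") or "{}")
    errors = validate_payload(body)
    if errors:
        return {"statusCode": 400, "body": json.dumps({"errors": errors})}
    # In the real function, boto3.client("ses").send_email(...) runs here.
    return {"statusCode": 200, "body": json.dumps({"status": "sent"})}
```

Keeping validation in a pure function makes it unit-testable without mocking SES or the API Gateway event shape.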
Static portfolio website: S3 hosting
Focus: HTTPS, cost-efficiency
Simple website deployment automation with GitHub Actions and Terraform.
Weather Data System: Serverless ingestion
Focus: serverless simplicity and cost control (2024)
Automated weather data ingestion to Postgres with subsequent analysis.
Machine Learning Models: ingestion + price prediction
Focus: experimentation pipeline (2024)
Pipelines for ML experiments and price prediction.
Recommendation Engine: front-end & back-end
Focus: end-to-end demo (2024)
Prototype with data preparation and a web UI.
Data Processing Pipeline: Docker + Airflow
Focus: reproducibility (2024)
Containerised example with Airflow orchestration.
Certifications
Status: achieved
Status: in progress
Contact
Location: Vilnius, Lithuania
LinkedIn: linkedin.com/in/pliadis