Joy Pedze

Data Engineer & Platform Architect — I design reliable, scalable data systems (Lakehouse, Streaming, CI/CD) so teams can trust their data.

Lakehouse (Delta) | Kafka Streaming | Airflow & dbt | AWS (S3/Glue/EMR/Redshift)
GitHub | LinkedIn

About Me

Software Engineering Background

I started my career in software engineering, building robust systems with Java, Spring Boot, and Angular. That foundation gave me a strong understanding of system architecture and software development best practices.

Data Engineering Focus

I then moved into data engineering, where I have 3 years of hands-on experience building data pipelines and working with data warehouses, data lakes, and data lakehouses. I am passionate about creating scalable and efficient data solutions.

Skills & Technologies

Comprehensive expertise across modern data engineering, cloud platforms, and development practices

Languages & Core

Core programming languages and development tools

Python (PySpark)
SQL
Bash
Git

Processing & Orchestration

Data processing engines and workflow orchestration

Apache Spark
PySpark
dbt
Airflow (MWAA)
Kafka (Streams/Connect)
Spark Structured Streaming

Cloud & Storage

Cloud infrastructure and data storage solutions

AWS: S3, Glue, EMR, Lambda, Kinesis, Redshift, Lake Formation, IAM, CloudWatch
Delta Lake / Apache Iceberg / Apache Hudi
Parquet
Hive/Glue Catalog

Modeling & Architectures

Data modeling patterns and architectural approaches

Lakehouse & Medallion (Bronze/Silver/Gold)
Kimball dimensional modeling (star/snowflake)
SCDs
Lambda / Kappa streaming patterns
Data Vault (awareness)
CDC

DevOps for Data

Infrastructure as code and deployment automation

Terraform
Docker
GitHub Actions/Jenkins
CI/CD for pipelines
IaC modules

Quality, Governance, Security

Data quality, governance, and security practices

Great Expectations / Soda
Data Contracts
PII handling
Encryption at rest and in transit
RBAC

Monitoring & Cost

Performance monitoring and cost optimization

CloudWatch
Prometheus/Grafana basics
Partitioning & file layout for performance/cost

Platforms

Modern data platforms and services

Databricks (Jobs/Repos/Delta Live Tables)
Snowflake/BigQuery/Redshift

Experience Highlights

Data Architecture & Engineering
3 years of experience
  • Designed and implemented data warehouses, data lakes, and data lakehouses
  • Built scalable data pipelines using Apache Spark and Apache Kafka
  • Orchestrated complex workflows with Apache Airflow
  • Worked extensively with AWS cloud services

Database Management
Multi-database expertise
  • MS SQL Server administration and optimization
  • PostgreSQL database design and management
  • MongoDB for NoSQL data solutions
  • Database performance tuning and optimization

Let's Connect

I'm always interested in discussing data engineering opportunities, sharing knowledge, or collaborating on exciting projects.

Get In Touch

Connect with me on LinkedIn or use the contact form to get in touch about opportunities, collaborations, or just to say hello!

Send me a message
I'll get back to you within 24-48 hours.