About Muhammad Moiz
- AWS Services: Lambda, Glue Streaming (Kafka consumers), Glue ETL, API Gateway, Aurora RDS, S3, DMS, Athena, EventBridge
- AWS Databricks, Azure Databricks, PySpark
- Power BI table performance optimization & MicroStrategy 10/11
- Redshift RA3 (Distribution Keys, Sort Strategies, WLM Queues)
- Unity Catalog, Delta Lake, Workflow Orchestration
- Infrastructure as Code (Terraform/Terraspace, Liquibase)
- Appian: Enterprise Data Platform mit AWS Glue & Lambda, Apache TIKA Integration
- Circle K: 6M+ daily transactions, AWS cost reduction, 600+ locations
- Couche-Tard: Performance improvement for Power BI + Redshift, 3-5x Photon acceleration
- TotalEnergies: AWS Transfer Family, SAP BOXI SFTP (fingerprint) integration, process automation
- Compute: Lambda, ECS (Docker Container), EC2
- Data: Glue, Redshift, Aurora RDS, DMS, PostgreSQL, Athena
- Storage: S3 (Lifecycle Policies, Cross-Account Access)
- Integration: API Gateway, Transfer Family, EventBridge, GitHub OIDC
- Security: IAM, KMS, OIDC, SCIM Passthrough
- Python Development: Lambda, PySpark (Databricks), Glue Jobs, Shell/SFTP Scripting (SQL, Apache TIKA)
- Data Processing: ELWIS, PEGELONLINE WSV, DWD Datasets
- Compliance: GDPR Implementation, Data Classification Strategies
- Documentation: Comprehensive Standards, API Design Patterns
- Performance: Distribution Keys, Sort Strategies, WLM Queue Optimization
- Database Expertise: Aurora RDS PostgreSQL, Oracle (OCP 11g DBA certified), Redshift, Exasol
- Data Modeling: Data Vault 2.0 (DV2.0), 3NF, Canonical
German
Conversational
English
Native or bilingual
German
Business fluent
English
Business fluent
Project and Professional Experience
- Transport - Public Sector
Lead Data Engineer - Data Platform for Appian
TRANSPORTATION
November 2024 - Present (1 year and 7 months)
Frankfurt am Main, Germany
Leading enterprise data platform design and implementation with scalable architecture patterns and compliance-driven development.
Enterprise Architecture & Solution Design
- Designed Medallion Architecture (Bronze/Silver/Gold) supporting multiple data domains
- Implemented serverless ETL pipelines using AWS Glue and Lambda with PySpark DataFrames
- Created architectural blueprints for event-driven data ingestion using API Gateway and Lambda
- Built Terraform modules following modular design patterns for reusable infrastructure
Real-Time Kafka Streaming Pipeline
- Architected enterprise Kafka-to-PostgreSQL streaming ETL processing 10,000+ messages/hour
- Engineered AWS Glue Streaming job with advanced Kafka consumer group management
- Built production monitoring: CloudWatch metrics, S3-based DLQ, correlation IDs
- Implemented dual-layer database resilience: psycopg2 + RDS Proxy achieving 95%+ success rates
- Designed micro-batch architecture with configurable windows
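The micro-batch and dead-letter ideas in the list above can be sketched roughly as follows. This is a minimal illustration, not the production job: the helper names (`micro_batches`, `process_batch`, `write_row`) are hypothetical, a count-based window stands in for whatever window setting the streaming job actually used, and the dead-letter records are only built in memory rather than written to S3.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class BatchResult:
    written: list = field(default_factory=list)       # correlation IDs written successfully
    dead_letters: list = field(default_factory=list)  # JSON records destined for a DLQ


def micro_batches(messages: Iterable[dict], max_size: int) -> Iterator[list]:
    """Group an incoming message stream into batches of at most max_size."""
    batch = []
    for message in messages:
        batch.append(message)
        if len(batch) >= max_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller batch


def process_batch(batch: list, write_row: Callable[[dict], None]) -> BatchResult:
    """Write each message; a failing message becomes a dead-letter record
    instead of failing the whole batch."""
    result = BatchResult()
    for message in batch:
        # Reuse an upstream correlation ID if present, otherwise create one.
        correlation_id = message.get("correlation_id") or str(uuid.uuid4())
        try:
            write_row(message)
            result.written.append(correlation_id)
        except Exception as exc:  # broad on purpose: any failure is dead-lettered
            result.dead_letters.append(
                json.dumps(
                    {
                        "correlation_id": correlation_id,
                        "error": str(exc),
                        "payload": message,
                    }
                )
            )
    return result
```

In a real pipeline `write_row` would wrap the database insert, and each dead-letter record would be stored under a key that includes the correlation ID so a failed message can be traced through logs and metrics.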
Public Transport Infrastructure - Real-Time Search
- Architected OpenSearch CDC ingestion processing 10,000+ records/minute
- Implemented production Lambda function (Python 3.13, 3GB) with 1,000 events/batch
- Resolved security compliance issues (tfsec, tflint, Bandit, Semgrep)
- Achieved zero-downtime deployment using versioned OpenSearch Pipeline v2
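To illustrate the 1,000-events-per-batch idea from the list above, here is a small sketch that chunks change events and serializes each chunk in the newline-delimited format accepted by the OpenSearch `_bulk` API. Whether the original Lambda called `_bulk` directly or posted to an ingestion pipeline endpoint is not stated, so treat the target as an assumption; the event shape (`id`, `doc`) and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def chunked(events: list, size: int = 1000) -> Iterator[list]:
    """Yield consecutive slices of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


def to_bulk_body(events: Iterable[dict], index: str) -> str:
    """Serialize events as NDJSON: one action line followed by one document line.

    Assumes each event looks like {"id": "...", "doc": {...}} (hypothetical shape).
    """
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index, "_id": event["id"]}}))
        lines.append(json.dumps(event["doc"]))
    if not lines:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

Each chunk would then be sent as one request body, which keeps request sizes predictable regardless of how many change records arrive at once.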
Infrastructure Patterns & Best Practices
- Standardized Terraform module structure across 15+ modules
- Implemented GitOps workflow with automated security scanning
- Applied patterns: conditional resources, lifecycle controls, security group reuse
- Established CloudWatch monitoring for pipeline health metrics
Outcome: Enabled a scalable platform supporting multiple domains with real-time streaming, search functionality, and standardized patterns ensuring compliance and operational excellence.
Technologies: AWS (Lambda, Glue, S3, API Gateway, Aurora PostgreSQL, OpenSearch, RDS Proxy), Kafka, PySpark, Terraform, psycopg2
- Circle K
Project Data Manager / Data Engineer
ENERGY
April 2025 - July 2025 (4 months)
Berlin, Germany
Architected and executed AWS-to-multicloud migration for Circle K's data platform (AWS/Azure Databricks), designing cross-cloud data transfer solutions that modernized infrastructure while maintaining zero-downtime operations for 600+ retail locations.
Cross-Cloud Migration Architecture:
- S3-to-Azure Blob pipeline via AWS Databricks for cross-cloud data transfer from legacy AWS to Circle K Azure
- Phased migration strategy maintaining single legacy AWS account for synchronization while transitioning workloads to Circle K environments
- Collaborated with Circle K metastore admins on Unity Catalog volumes, defining schema structures and access patterns
- Migrated from low-level Spark RDDs to high-level DataFrame APIs, enabling Photon acceleration for performance gains
Infrastructure as Code & Automation:
- Implemented Terraspace/Terraform Redshift stack and modules, with RA3 node configuration
- Implemented Redshift native schedules with associated IAM roles for automated ETL workflows
- Managed database scripts for database and schema definitions / user access management
Data Pipeline Modernization:
- Refactored multiple Python notebooks from AWS Databricks to Azure Databricks, optimizing for Photon accelerator
- Implemented Unity Catalog Delta tables for daily POS data processing with automatic schema evolution
- Created weekly KPIs and reports using Databricks views analyzing product volumes and sales metrics
- Migrated data pipelines from sequential processing to parallel Spark operations
Outcome: Orchestrated a zero-downtime migration serving 6M+ daily transactions and delivered a scalable cross-cloud architecture supporting both AWS and Azure workloads; reduced operational costs by 40% through infrastructure consolidation.
Technologies: Azure Databricks, AWS Databricks, Terraform, Unity Catalog, Delta Lake, Redshift (RA3), S3 Cross-Account Access, IAM AssumeRole, Photon Engine, PySpark, Azure Blob Storage
- Couche-Tard Deutschland GmbH & Co. KG
Project Data Manager
ENERGY
November 2023 - March 2025 (1 year and 5 months)
Berlin, Germany
Managed cloud data infrastructure, focusing on Redshift optimization, Unity Catalog implementation, and AWS Databricks. Led performance tuning initiatives reducing query execution times across Power BI workloads.
Redshift Performance Engineering:
- Implemented distribution keys, sort/interleaved keys based on query pattern analysis
- Established Workload Management (WLM) queues with memory allocation optimization for concurrent Power BI refresh jobs and Pipeline workloads
- Configured RA3 node clusters with managed storage enabling independent compute/storage scaling
- Created monitoring views for table statistics, query performance, and skewness tracking
- Advised on VACUUM strategies during pipeline runs, optimizing table performance
- Implemented automatic snapshot schedules for recovery scenarios
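As a rough illustration of the distribution-key and sort-key choices listed above, the following sketch generates a Redshift `CREATE TABLE` statement with `DISTSTYLE KEY`, a `DISTKEY`, and either a compound or an interleaved `SORTKEY`. The builder function and the table and column names are hypothetical; the actual keys in the project were chosen from query pattern analysis that is not reproduced here.

```python
from typing import Sequence, Tuple


def build_create_table(
    table: str,
    columns: Sequence[Tuple[str, str]],
    dist_key: str,
    sort_keys: Sequence[str],
    interleaved: bool = False,
) -> str:
    """Render Redshift DDL with an explicit distribution key and sort key."""
    names = {name for name, _ in columns}
    missing = [key for key in (dist_key, *sort_keys) if key not in names]
    if missing:
        raise ValueError(f"key columns not defined in table: {missing}")

    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # Interleaved keys weight all sort columns equally; compound keys favor
    # filters on the leading column(s).
    sort_style = "INTERLEAVED SORTKEY" if interleaved else "COMPOUND SORTKEY"
    return (
        f"CREATE TABLE {table} (\n  {column_sql}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"{sort_style} ({', '.join(sort_keys)});"
    )


# Hypothetical fact table: co-locate rows by store, sort by date and product.
ddl = build_create_table(
    "sales_fact",
    [("store_id", "INTEGER"), ("sale_date", "DATE"), ("product_id", "INTEGER")],
    dist_key="store_id",
    sort_keys=["sale_date", "product_id"],
    interleaved=True,
)
```

Distributing on a common join column keeps matching rows on the same slice, while the sort key lets range filters skip blocks; both are only helpful when they match the dominant query patterns.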
Databricks Platform Establishment:
- Secured AWS Databricks access from parent organization through governance approval process
- Implemented Terraform modules for Unity Catalog setup with three-level namespace (catalog.schema.table)
- Ran POCs for AWS Databricks adoption over Redshift, targeting performance and cost improvements using Delta tables
- Built data pipelines for public and internal data sources, using Spark RDD parallel processing
- Built data quality monitoring with completeness KPIs
AWS Infrastructure Management:
- Managed subnet groups and CIDR block allocations for Redshift cluster isolation
- Configured cluster resize operations from dc2.large to ra3.xlplus nodes with minimal downtime
- Managed Power BI Gateway data sources and network firewall whitelisting between on-premises and cloud using FQDNs
BOXI to Cloud Data Integration:
- Integrated SAP BOXI SFTP exports with AWS Cloud, using AWS Transfer Family and a fingerprint calculation approach
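The profile does not say which fingerprint was calculated. One common case in SFTP integrations is the SHA-256 fingerprint of an SSH public key, used to confirm that the key registered on one side matches the key held by the other. The sketch below assumes that interpretation and an OpenSSH-style public key line; the function name is hypothetical.

```python
import base64
import hashlib


def ssh_sha256_fingerprint(public_key_line: str) -> str:
    """Compute the OpenSSH-style SHA256 fingerprint of a public key line.

    Expects '<key-type> <base64-blob> [comment]', as found in .pub files.
    """
    parts = public_key_line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    # The fingerprint covers the decoded key blob, not the text line.
    key_blob = base64.b64decode(parts[1], validate=True)
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest base64-encoded without '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

The result can be compared with the output of `ssh-keygen -lf <keyfile>`; the comment field of the key line does not affect the fingerprint.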
Stack: AWS Redshift (RA3), Databricks, Unity Catalog, Terraform, WLM, Distribution Keys, Interleaved Sort Keys, Power BI, S3, Spark RDD, Python, Delta Lake
Certifications
- Certified Data Vault 2.0 Practitioner, Datavaultalliance