Hello!

I'm GANASAI PALAKURTHI

Data Engineer building scalable data pipelines, cloud data platforms, and analytics systems on AWS, Snowflake, and Databricks.

5+ Years Experience | Data Engineer | Snowflake • AWS • Databricks • Spark • ETL/ELT • Data Modeling • Airflow

Let’s Talk

Actively Seeking Full-Time Opportunities

I SPECIALIZE IN /

  • DATA PIPELINE ENGINEERING & ETL DEVELOPMENT
  • CLOUD DATA PLATFORMS & DATA WAREHOUSING
  • BIG DATA PROCESSING & ANALYTICS SYSTEMS

DATA PIPELINE ENGINEERING & ETL DEVELOPMENT

I design and build scalable ETL/ELT pipelines to ingest, process, and transform large-scale data from multiple sources into analytics-ready datasets.

  • Batch and near real-time ingestion (Snowpipe, COPY INTO)
  • Data transformation using SQL and PySpark
  • Workflow orchestration using Apache Airflow
  • Data quality validation and monitoring
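The data-quality step above can be pictured as simple rule-based checks. A minimal pure-Python sketch (column names and thresholds are hypothetical, not from any specific pipeline):

```python
# Illustrative rule-based data quality checks; in production these would run
# against staged tables, with failures routed to alerting.

def check_not_null(rows, column):
    """Return rows whose value for `column` is missing."""
    return [r for r in rows if r.get(column) is None]

def check_row_count(rows, minimum):
    """Flag suspiciously small loads, e.g. a truncated extract."""
    return len(rows) >= minimum

rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},
]

failures = check_not_null(rows, "amount")
assert len(failures) == 1           # one record is missing an amount
assert check_row_count(rows, 1)     # load meets the minimum expected size
```

Real frameworks add rule catalogs, severity levels, and quarantine tables, but the core idea is the same: every batch passes a set of assertions before it is promoted downstream.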

CLOUD DATA PLATFORMS & DATA WAREHOUSING

I build cloud-native data platforms using Snowflake and AWS, enabling efficient storage, transformation, and analytics of large datasets.

  • Snowflake (External Tables, Streams & Tasks, RBAC)
  • AWS S3 data lake architecture
  • Medallion Architecture (Bronze/Silver/Gold)
  • Data modeling (Fact & Dimension tables)
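The Medallion layering above can be sketched as a chain of small transformations. A toy in-memory version (field names are hypothetical; in practice each layer is a table or storage prefix):

```python
# Bronze = raw landing data, Silver = cleansed/standardized, Gold = analytics-ready.

def to_silver(bronze_rows):
    """Cleanse: drop malformed rows and standardize fields."""
    silver = []
    for r in bronze_rows:
        if r.get("customer_id") is None:
            continue                      # quarantine malformed records
        silver.append({**r, "country": r["country"].strip().upper()})
    return silver

def to_gold(silver_rows):
    """Aggregate into an analytics-ready mart: orders per country."""
    mart = {}
    for r in silver_rows:
        mart[r["country"]] = mart.get(r["country"], 0) + 1
    return mart

bronze = [
    {"customer_id": 1, "country": " us "},
    {"customer_id": None, "country": "de"},   # malformed, dropped in Silver
    {"customer_id": 2, "country": "US"},
]
print(to_gold(to_silver(bronze)))   # {'US': 2}
```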

BIG DATA PROCESSING & ANALYTICS SYSTEMS

I process large-scale structured and semi-structured data using distributed computing frameworks to support analytics and reporting.

  • Apache Spark / PySpark / Databricks
  • Handling JSON, Parquet, and semi-structured data
  • Performance optimization and partitioning strategies
  • Scalable data processing workflows
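Handling semi-structured data usually means exploding nested arrays into flat rows before writing columnar formats like Parquet. A pure-Python sketch of that pattern (the payload shape is hypothetical) — it mirrors what Snowflake's LATERAL FLATTEN or Spark's `explode` does at scale:

```python
import json

# A nested event payload, as often arrives in JSON feeds.
payload = json.loads("""
{"order_id": 7, "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B2", "qty": 1}
]}
""")

def flatten_order(order):
    """Explode the nested items array into one flat row per item."""
    return [
        {"order_id": order["order_id"], "sku": i["sku"], "qty": i["qty"]}
        for i in order["items"]
    ]

rows = flatten_order(payload)
# rows -> [{'order_id': 7, 'sku': 'A1', 'qty': 2},
#          {'order_id': 7, 'sku': 'B2', 'qty': 1}]
```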

Data Engineering Skills /

Data Engineering

  • ETL / ELT Pipelines
  • Data Ingestion, Transformation, and Integration
  • Batch and Near Real-Time Processing
  • Data Quality Validation and Reconciliation
  • Metadata-Driven Pipelines

Data Warehousing

  • Snowflake
  • Snowpipe, COPY INTO, External Tables
  • Streams & Tasks
  • Clustering, Query Optimization, Virtual Warehouses
  • RBAC and Secure Data Access

Big Data & Processing

  • Apache Spark
  • PySpark
  • Databricks
  • Spark SQL
  • Distributed Data Processing

Cloud Platforms

  • AWS (S3, EMR, Glue, Lambda, Redshift, IAM, VPC, CloudWatch)
  • Cloud Data Lake and Storage Architecture
  • Data Processing on AWS
  • GCP (BigQuery, Cloud Storage, Dataproc)

Orchestration & Workflow

  • Apache Airflow
  • MWAA
  • Control-M
  • Workflow Scheduling and Dependency Management

Streaming & Messaging

  • Apache Kafka
  • Spark Streaming
  • Real-Time Data Pipelines
  • Event-Driven Data Processing

Programming & Query Languages

  • Python
  • SQL
  • Snowflake SQL
  • Spark SQL
  • Bash

Data Modeling & Architecture

  • Star Schema and Snowflake Schema
  • Fact and Dimension Modeling
  • Medallion Architecture (Bronze / Silver / Gold)
  • Analytics-Ready Data Design

Monitoring & Observability

  • AWS CloudWatch
  • Splunk
  • Dynatrace
  • Pipeline Monitoring and Alerting
  • Job Failure Analysis and Root Cause Investigation

DevOps & Supporting Tools

  • Git, GitHub
  • Jenkins, GitHub Actions
  • Docker, Kubernetes (EKS)
  • Terraform

Projects

Metadata-Driven ELT Pipeline in Snowflake

Designed and implemented a scalable metadata-driven ELT pipeline using Snowflake, Amazon S3, and GitHub Actions. Built a reusable stored procedure to dynamically generate SQL based on configuration tables, enabling onboarding of new pipelines without code changes. Implemented Bronze-to-Silver transformations, audit logging for pipeline observability, and CI/CD automation for deployment and execution.

Technologies: Snowflake, SQL (Snowflake Scripting), Amazon S3, GitHub Actions, Snowflake CLI, Data Engineering
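The metadata-driven idea can be sketched in a few lines: pipeline behavior lives in a configuration record, and the loader generates SQL from it, so onboarding a new feed is a config change rather than a code change. This Python sketch is illustrative only — the table, stage, and column names are hypothetical, and the real implementation is a Snowflake stored procedure:

```python
# Generate a COPY INTO statement from a metadata/config record.

def build_copy_sql(cfg):
    return (
        f"COPY INTO {cfg['target_table']}\n"
        f"FROM @{cfg['stage']}/{cfg['prefix']}\n"
        f"FILE_FORMAT = (TYPE = {cfg['file_format']})"
    )

cfg = {
    "target_table": "bronze.orders_raw",
    "stage": "s3_landing",
    "prefix": "orders/",
    "file_format": "PARQUET",
}
print(build_copy_sql(cfg))
```

Adding a pipeline then means inserting one row into the config table; the generator, audit logging, and CI/CD deployment stay untouched.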

End-to-End Snowflake Data Warehouse (Medallion Architecture)

Designed and implemented a production-style data warehouse using Snowflake following the Medallion Architecture (Bronze, Silver, Gold). Built secure ingestion from Amazon S3, performed data profiling and transformation, and developed a star schema with fact and dimension tables along with business-ready marts for analytics.

Technologies: Snowflake, SQL, Amazon S3, Data Warehousing, Medallion Architecture
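The star-schema part of this project boils down to dimension lookups that assign surrogate keys, with the fact table storing keys plus measures. A toy sketch (entity and column names are illustrative):

```python
# Dimension table as a natural-key -> surrogate-key lookup.
dim_customer = {}

def dim_key(natural_key):
    """Return the surrogate key, creating a dimension row if needed."""
    if natural_key not in dim_customer:
        dim_customer[natural_key] = len(dim_customer) + 1
    return dim_customer[natural_key]

# Fact rows reference dimensions by surrogate key and carry measures.
fact_sales = []
for customer, amount in [("C100", 25.0), ("C200", 40.0), ("C100", 15.0)]:
    fact_sales.append({"customer_key": dim_key(customer), "amount": amount})

# Two distinct customers -> two dimension rows, three fact rows.
assert len(dim_customer) == 2 and len(fact_sales) == 3
```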

Provisioning an AWS EMR Cluster using Terraform

Automated the setup of an Amazon EMR cluster for big data processing using Apache Spark and Hadoop. Used AWS CLI and Python (Boto3) to provision resources, integrate with S3, and optimize for cost and scalability.

Technologies: AWS EMR, Apache Spark, Hadoop, Python (Boto3), AWS CLI, S3
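The Boto3 side of this project amounts to assembling a cluster specification and handing it to the EMR API. A pure-Python sketch of building that request (instance types, release label, and bucket name are hypothetical; actual provisioning would pass the dict to `boto3.client("emr").run_job_flow(**request)`):

```python
# Build an EMR run_job_flow request from a few parameters.

def emr_request(name, instance_type, count):
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",                 # hypothetical release
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": count,
            "KeepJobFlowAliveWhenNoSteps": False,     # terminate when idle, for cost
        },
        "LogUri": "s3://my-logs-bucket/emr/",         # hypothetical bucket
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = emr_request("spark-batch", "m5.xlarge", 3)
assert request["Instances"]["InstanceCount"] == 3
```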

Provisioning EKS with Self-Managed Node Groups using Terraform

Automated the deployment of Amazon EKS clusters with self-managed node groups using Terraform, enabling scalable, customizable Kubernetes infrastructure with full control over worker nodes.

Technologies: Terraform, AWS EKS, Kubernetes, IAM

Personal Portfolio Website

A responsive, modern portfolio website showcasing my experience, projects, certifications, and contact information. Built with a focus on clean UI, accessibility, and performance, and deployed with CI/CD on Netlify.

Technologies: HTML, CSS, JavaScript, Netlify, Formspree

Certifications

Certified Kubernetes Application Developer

SnowPro® Associate: Platform

Academy Accreditation - Databricks Fundamentals

AWS Certified Machine Learning - Specialty

GCP Professional Data Engineer

HashiCorp Certified: Terraform Associate (003)

AWS Certified Solutions Architect - Associate

AWS Certified Developer – Associate

AWS Certified Cloud Practitioner

AWS Certified AI Practitioner - Beta

AWS Cloud Quest: Cloud Practitioner

Microsoft Certified: Security, Compliance, and Identity Fundamentals

GitHub Foundations

TigerGraph for ML

TigerGraph Associate

Apache Cassandra 3 Developer Certification

Apache Cassandra 3 Administrator Certification

Experience & Education

💼
Jul 2025 – Present

Data Engineer

Company: Technohaul LLC, Texas, USA

  • Designed and deployed ETL pipelines processing 8–12TB of data daily using Snowflake and AWS S3, reducing ingestion time from 6+ hours to under 1 hour
  • Built hybrid ingestion pipelines using Snowpipe and COPY INTO, enabling near real-time data availability with low latency
  • Developed scalable data transformation workflows using SQL and PySpark for analytics-ready datasets
  • Designed dimensional data models (fact and dimension tables) to support BI and reporting workloads
  • Orchestrated 100+ workflows using Apache Airflow, improving pipeline automation and reliability
  • Optimized Snowflake queries and warehouse usage, reducing compute costs by ~30%
  • Implemented a data quality validation framework across multiple datasets, improving data reliability to >99%

Technologies: Snowflake, AWS (S3, EMR), PySpark, SQL, Apache Airflow, Databricks, Python
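Orchestrating 100+ workflows comes down to dependency-aware scheduling: a task runs only after all of its upstream tasks complete. A toy illustration using Python's standard-library `graphlib` (the task names are hypothetical, and Airflow itself adds scheduling, retries, and backfills on top of this ordering):

```python
from graphlib import TopologicalSorter

# DAG mapping each task to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
}

# Resolve a valid execution order that respects every dependency.
order = list(TopologicalSorter(dag).static_order())
print(order)   # ['extract', 'transform', 'validate', 'load']
```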

💼
Jan 2025 – May 2025

Data Engineer Intern

Company: Relinetek LLC, Texas, USA

  • Developed ETL pipelines ingesting data from AWS S3 into Snowflake staging layers for structured and semi-structured data
  • Built SQL-based transformation logic to cleanse, standardize, and enrich datasets
  • Organized data into Bronze/Silver/Gold layers following medallion architecture principles
  • Performed data validation and reconciliation checks to ensure consistency between source and target systems
  • Assisted in dimensional modeling for reporting and analytics use cases

Technologies: Snowflake, AWS S3, SQL, Python, Data Modeling

🎓
Jan 2024 – May 2025

Master of Science in Computer Science

University: Kent State University, Kent, OH

Focused on distributed systems, data engineering, databases, and cloud computing. Built strong foundations in scalable system design, data processing, and backend engineering.

💼
Mar 2021 – Dec 2023

Application Developer (Data Engineering – Snowflake)

Company: Accenture, Hyderabad, India

  • Designed and maintained ETL pipelines processing large-scale datasets across enterprise systems
  • Built ingestion pipelines using Snowpipe and COPY INTO for batch and near real-time data loading from AWS S3
  • Developed complex SQL transformations, stored procedures, and views for data standardization and enrichment
  • Processed semi-structured JSON data in Snowflake using the VARIANT data type and the FLATTEN function
  • Implemented incremental pipelines using Streams and Tasks, improving data freshness and reducing processing time
  • Designed medallion architecture (Bronze/Silver/Gold) for scalable data processing
  • Orchestrated workflows using Apache Airflow, managing large-scale pipeline execution
  • Optimized query performance, reducing execution time by up to 50%
  • Implemented data quality frameworks, detecting and preventing anomalies before production

Technologies: Snowflake, AWS S3, SQL, Airflow, Python, Spark, Data Modeling
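The incremental pattern behind Streams and Tasks is that each run consumes only the changes recorded since the last run, instead of rescanning the whole table. A toy pure-Python model of that offset-tracking idea (append-only changes only; real Streams also capture updates and deletes):

```python
# Stands in for a source table plus a stream's bookkeeping.
source = []
state = {"offset": 0}

def incremental_batch():
    """Return unprocessed rows and advance the offset."""
    new_rows = source[state["offset"]:]
    state["offset"] = len(source)
    return new_rows

source.extend([{"id": 1}, {"id": 2}])
assert len(incremental_batch()) == 2     # first run sees both rows
source.append({"id": 3})
assert len(incremental_batch()) == 1     # next run sees only the new row
```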

💼
Jan 2020 – Feb 2021

Associate Software Engineer

Company: Accenture, Bengaluru, India

  • Supported ETL workflows and data ingestion pipelines for enterprise data systems
  • Developed SQL queries and transformation scripts for data processing and reporting
  • Performed data validation and reconciliation checks to ensure data integrity
  • Assisted in maintaining data pipelines and resolving production issues

Technologies: SQL, AWS, Data Pipelines, ETL

🎓
Jun 2015 – May 2019

Bachelor of Technology in Electronics and Communication Engineering

University: Jawaharlal Nehru Technological University, Kakinada

Built strong foundations in data structures, databases, operating systems, and computer networks, enabling a transition into data engineering and cloud systems.

About Me

I’m Ganasai Palakurthi, a Data Engineer with experience building scalable data pipelines, cloud data platforms, and analytics-ready data solutions using Snowflake, AWS, Spark, and Databricks.

My experience includes data ingestion, ETL/ELT development, data transformation, and designing data architectures that support large-scale structured and semi-structured datasets. I have worked on building reliable pipelines, optimizing data processing performance, and delivering curated datasets for reporting and analytics.

I focus on developing efficient data workflows, improving data quality, and building scalable systems that support business intelligence and data-driven decision-making. I am also committed to continuous learning and hold multiple industry certifications in cloud, data engineering, and platform technologies.

Ganasai Palakurthi

Contact Me

Email: ganasaipalakurthi@gmail.com

Phone: +1 (918) 928-9899

Location: United States