Hello!

I'm GANASAI PALAKURTHI

Data Engineer building scalable data pipelines, cloud data platforms, and analytics systems on AWS, Snowflake, and Databricks.

5+ Years Experience | Data Engineer | Snowflake • AWS • Databricks • Spark • ETL/ELT • Data Modeling • Airflow

Let’s Talk

Actively Seeking Full-Time Opportunities

I SPECIALIZE IN /

  • DATA PIPELINE ENGINEERING & ETL DEVELOPMENT
  • CLOUD DATA PLATFORMS & DATA WAREHOUSING
  • BIG DATA PROCESSING & ANALYTICS SYSTEMS

DATA PIPELINE ENGINEERING & ETL DEVELOPMENT

I design and build scalable ETL/ELT pipelines to ingest, process, and transform large-scale data from multiple sources into analytics-ready datasets.

  • Batch and near real-time ingestion (Snowpipe, COPY INTO)
  • Data transformation using SQL and PySpark
  • Workflow orchestration using Apache Airflow
  • Data quality validation and monitoring
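The data-quality step above can be pictured as simple rule-based checks. A minimal pure-Python sketch (column names and thresholds are hypothetical, not from any specific pipeline):

```python
# Illustrative rule-based data quality checks; in production these would run
# against staged tables, with failures routed to alerting.

def check_not_null(rows, column):
    """Return rows whose value for `column` is missing."""
    return [r for r in rows if r.get(column) is None]

def check_row_count(rows, minimum):
    """Flag suspiciously small loads, e.g. a truncated extract."""
    return len(rows) >= minimum

rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},
]

failures = check_not_null(rows, "amount")
assert len(failures) == 1           # one record is missing an amount
assert check_row_count(rows, 1)     # load meets the minimum expected size
```

Real frameworks add rule catalogs, severity levels, and quarantine tables, but the core idea is the same: every batch passes a set of assertions before it is promoted downstream.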

CLOUD DATA PLATFORMS & DATA WAREHOUSING

I build cloud-native data platforms using Snowflake and AWS, enabling efficient storage, transformation, and analytics of large datasets.

  • Snowflake (External Tables, Streams & Tasks, RBAC)
  • AWS S3 data lake architecture
  • Medallion Architecture (Bronze/Silver/Gold)
  • Data modeling (Fact & Dimension tables)
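The Medallion layering above can be sketched as a chain of small transformations. A toy in-memory version (field names are hypothetical; in practice each layer is a table or storage prefix):

```python
# Bronze = raw landing data, Silver = cleansed/standardized, Gold = analytics-ready.

def to_silver(bronze_rows):
    """Cleanse: drop malformed rows and standardize fields."""
    silver = []
    for r in bronze_rows:
        if r.get("customer_id") is None:
            continue                      # quarantine malformed records
        silver.append({**r, "country": r["country"].strip().upper()})
    return silver

def to_gold(silver_rows):
    """Aggregate into an analytics-ready mart: orders per country."""
    mart = {}
    for r in silver_rows:
        mart[r["country"]] = mart.get(r["country"], 0) + 1
    return mart

bronze = [
    {"customer_id": 1, "country": " us "},
    {"customer_id": None, "country": "de"},   # malformed, dropped in Silver
    {"customer_id": 2, "country": "US"},
]
print(to_gold(to_silver(bronze)))   # {'US': 2}
```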

BIG DATA PROCESSING & ANALYTICS SYSTEMS

I process large-scale structured and semi-structured data using distributed computing frameworks to support analytics and reporting.

  • Apache Spark / PySpark / Databricks
  • Handling JSON, Parquet, and semi-structured data
  • Performance optimization and partitioning strategies
  • Scalable data processing workflows
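Handling semi-structured data usually means exploding nested arrays into flat rows before writing columnar formats like Parquet. A pure-Python sketch of that pattern (the payload shape is hypothetical) — it mirrors what Snowflake's LATERAL FLATTEN or Spark's `explode` does at scale:

```python
import json

# A nested event payload, as often arrives in JSON feeds.
payload = json.loads("""
{"order_id": 7, "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B2", "qty": 1}
]}
""")

def flatten_order(order):
    """Explode the nested items array into one flat row per item."""
    return [
        {"order_id": order["order_id"], "sku": i["sku"], "qty": i["qty"]}
        for i in order["items"]
    ]

rows = flatten_order(payload)
# rows -> [{'order_id': 7, 'sku': 'A1', 'qty': 2},
#          {'order_id': 7, 'sku': 'B2', 'qty': 1}]
```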

Data Engineering Skills /

Data Engineering

  • ETL / ELT Pipelines
  • Data Ingestion, Transformation, and Integration
  • Batch and Near Real-Time Processing
  • Data Quality Validation and Reconciliation
  • Metadata-Driven Pipelines

Data Warehousing

  • Snowflake
  • Snowpipe, COPY INTO, External Tables
  • Streams & Tasks
  • Clustering, Query Optimization, Virtual Warehouses
  • RBAC and Secure Data Access

Big Data & Processing

  • Apache Spark
  • PySpark
  • Databricks
  • Spark SQL
  • Distributed Data Processing

Cloud Platforms

  • AWS (S3, EMR, Glue, Lambda, Redshift, IAM, VPC, CloudWatch)
  • Cloud Data Lake and Storage Architecture
  • Data Processing on AWS
  • GCP (BigQuery, Cloud Storage, Dataproc)

Orchestration & Workflow

  • Apache Airflow
  • MWAA
  • Control-M
  • Workflow Scheduling and Dependency Management

Streaming & Messaging

  • Apache Kafka
  • Spark Streaming
  • Real-Time Data Pipelines
  • Event-Driven Data Processing

Programming & Query Languages

  • Python
  • SQL
  • Snowflake SQL
  • Spark SQL
  • Bash

Data Modeling & Architecture

  • Star Schema and Snowflake Schema
  • Fact and Dimension Modeling
  • Medallion Architecture (Bronze / Silver / Gold)
  • Analytics-Ready Data Design

Monitoring & Observability

  • AWS CloudWatch
  • Splunk
  • Dynatrace
  • Pipeline Monitoring and Alerting
  • Job Failure Analysis and Root Cause Investigation

DevOps & Supporting Tools

  • Git, GitHub
  • Jenkins, GitHub Actions
  • Docker, Kubernetes (EKS)
  • Terraform

Projects

Metadata-Driven ELT Pipeline in Snowflake

Designed and implemented a scalable metadata-driven ELT pipeline using Snowflake, Amazon S3, and GitHub Actions. Built a reusable stored procedure to dynamically generate SQL based on configuration tables, enabling onboarding of new pipelines without code changes. Implemented Bronze-to-Silver transformations, audit logging for pipeline observability, and CI/CD automation for deployment and execution.

Technologies: Snowflake, SQL (Snowflake Scripting), Amazon S3, GitHub Actions, Snowflake CLI, Data Engineering
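The metadata-driven idea can be sketched in a few lines: pipeline behavior lives in a configuration record, and the loader generates SQL from it, so onboarding a new feed is a config change rather than a code change. This Python sketch is illustrative only — the table, stage, and column names are hypothetical, and the real implementation is a Snowflake stored procedure:

```python
# Generate a COPY INTO statement from a metadata/config record.

def build_copy_sql(cfg):
    return (
        f"COPY INTO {cfg['target_table']}\n"
        f"FROM @{cfg['stage']}/{cfg['prefix']}\n"
        f"FILE_FORMAT = (TYPE = {cfg['file_format']})"
    )

cfg = {
    "target_table": "bronze.orders_raw",
    "stage": "s3_landing",
    "prefix": "orders/",
    "file_format": "PARQUET",
}
print(build_copy_sql(cfg))
```

Adding a pipeline then means inserting one row into the config table; the generator, audit logging, and CI/CD deployment stay untouched.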

End-to-End Snowflake Data Warehouse (Medallion Architecture)

Designed and implemented a production-style data warehouse using Snowflake following the Medallion Architecture (Bronze, Silver, Gold). Built secure ingestion from Amazon S3, performed data profiling and transformation, and developed a star schema with fact and dimension tables along with business-ready marts for analytics.

Technologies: Snowflake, SQL, Amazon S3, Data Warehousing, Medallion Architecture
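The star-schema part of this project boils down to dimension lookups that assign surrogate keys, with the fact table storing keys plus measures. A toy sketch (entity and column names are illustrative):

```python
# Dimension table as a natural-key -> surrogate-key lookup.
dim_customer = {}

def dim_key(natural_key):
    """Return the surrogate key, creating a dimension row if needed."""
    if natural_key not in dim_customer:
        dim_customer[natural_key] = len(dim_customer) + 1
    return dim_customer[natural_key]

# Fact rows reference dimensions by surrogate key and carry measures.
fact_sales = []
for customer, amount in [("C100", 25.0), ("C200", 40.0), ("C100", 15.0)]:
    fact_sales.append({"customer_key": dim_key(customer), "amount": amount})

# Two distinct customers -> two dimension rows, three fact rows.
assert len(dim_customer) == 2 and len(fact_sales) == 3
```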

Provisioning an AWS EMR Cluster using Terraform

Automated the setup of an Amazon EMR cluster for big data processing using Apache Spark and Hadoop. Used AWS CLI and Python (Boto3) to provision resources, integrate with S3, and optimize for cost and scalability.

Technologies: AWS EMR, Apache Spark, Hadoop, Python (Boto3), AWS CLI, S3
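The Boto3 side of this project amounts to assembling a cluster specification and handing it to the EMR API. A pure-Python sketch of building that request (instance types, release label, and bucket name are hypothetical; actual provisioning would pass the dict to `boto3.client("emr").run_job_flow(**request)`):

```python
# Build an EMR run_job_flow request from a few parameters.

def emr_request(name, instance_type, count):
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",                 # hypothetical release
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": count,
            "KeepJobFlowAliveWhenNoSteps": False,     # terminate when idle, for cost
        },
        "LogUri": "s3://my-logs-bucket/emr/",         # hypothetical bucket
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = emr_request("spark-batch", "m5.xlarge", 3)
assert request["Instances"]["InstanceCount"] == 3
```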

Provisioning EKS with Self-Managed Node Groups using Terraform

Automated the deployment of Amazon EKS clusters with self-managed node groups using Terraform, enabling scalable, customizable Kubernetes infrastructure with full control over worker nodes.

Technologies: Terraform, AWS EKS, Kubernetes, IAM

Personal Portfolio Website

A responsive, modern portfolio website showcasing my experience, projects, certifications, and contact information. Built with a focus on clean UI, accessibility, and performance, and deployed with CI/CD on Netlify.

Technologies: HTML, CSS, JavaScript, Netlify, Formspree

Certifications

Certified Kubernetes Application Developer

SnowPro® Associate: Platform

Academy Accreditation - Databricks Fundamentals

AWS Certified Machine Learning - Specialty

GCP Professional Data Engineer

HashiCorp Certified: Terraform Associate (003)

AWS Certified Solutions Architect - Associate

AWS Certified Developer – Associate

AWS Certified Cloud Practitioner

AWS Certified AI Practitioner - Beta

AWS Cloud Quest: Cloud Practitioner

Microsoft Certified: Security, Compliance, and Identity Fundamentals

GitHub Foundations

TigerGraph for ML

TigerGraph Associate

Apache Cassandra 3 Developer Certification

Apache Cassandra 3 Administrator Certification

Experience & Education

💼
Jul 2025 – Present

Data Engineer

Company: Technohaul LLC, Texas, USA

  • Designed and deployed ETL pipelines processing 8–12TB of data daily using Snowflake and AWS S3, reducing ingestion time from 6+ hours to under 1 hour
  • Built hybrid ingestion pipelines using Snowpipe and COPY INTO, enabling near real-time data availability with low latency
  • Developed scalable data transformation workflows using SQL and PySpark for analytics-ready datasets
  • Designed dimensional data models (fact and dimension tables) to support BI and reporting workloads
  • Orchestrated 100+ workflows using Apache Airflow, improving pipeline automation and reliability
  • Optimized Snowflake queries and warehouse usage, reducing compute costs by ~30%
  • Implemented a data quality validation framework across multiple datasets, improving data reliability to >99%

Technologies: Snowflake, AWS (S3, EMR), PySpark, SQL, Apache Airflow, Databricks, Python
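Orchestrating 100+ workflows comes down to dependency-aware scheduling: a task runs only after all of its upstream tasks complete. A toy illustration using Python's standard-library `graphlib` (the task names are hypothetical, and Airflow itself adds scheduling, retries, and backfills on top of this ordering):

```python
from graphlib import TopologicalSorter

# DAG mapping each task to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
}

# Resolve a valid execution order that respects every dependency.
order = list(TopologicalSorter(dag).static_order())
print(order)   # ['extract', 'transform', 'validate', 'load']
```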

💼
Jan 2025 – May 2025

Data Engineer Intern

Company: Relinetek LLC, Texas, USA

  • Developed ETL pipelines ingesting data from AWS S3 into Snowflake staging layers for structured and semi-structured data
  • Built SQL-based transformation logic to cleanse, standardize, and enrich datasets
  • Organized data into Bronze/Silver/Gold layers following medallion architecture principles
  • Performed data validation and reconciliation checks to ensure consistency between source and target systems
  • Assisted in dimensional modeling for reporting and analytics use cases

Technologies: Snowflake, AWS S3, SQL, Python, Data Modeling

🎓
Jan 2024 – May 2025

Master of Science in Computer Science

University: Kent State University, Kent, OH

Focused on distributed systems, data engineering, databases, and cloud computing. Built strong foundations in scalable system design, data processing, and backend engineering.

💼
Mar 2021 – Dec 2023

Application Developer (Data Engineering – Snowflake)

Company: Accenture, Hyderabad, India

  • Designed and maintained ETL pipelines processing large-scale datasets across enterprise systems
  • Built ingestion pipelines using Snowpipe and COPY INTO for batch and near real-time data loading from AWS S3
  • Developed complex SQL transformations, stored procedures, and views for data standardization and enrichment
  • Processed semi-structured JSON data in Snowflake using the VARIANT data type and the FLATTEN function
  • Implemented incremental pipelines using Streams and Tasks, improving data freshness and reducing processing time
  • Designed medallion architecture (Bronze/Silver/Gold) for scalable data processing
  • Orchestrated workflows using Apache Airflow, managing large-scale pipeline execution
  • Optimized query performance, reducing execution time by up to 50%
  • Implemented data quality frameworks, detecting and preventing anomalies before production

Technologies: Snowflake, AWS S3, SQL, Airflow, Python, Spark, Data Modeling
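The incremental pattern behind Streams and Tasks is that each run consumes only the changes recorded since the last run, instead of rescanning the whole table. A toy pure-Python model of that offset-tracking idea (append-only changes only; real Streams also capture updates and deletes):

```python
# Stands in for a source table plus a stream's bookkeeping.
source = []
state = {"offset": 0}

def incremental_batch():
    """Return unprocessed rows and advance the offset."""
    new_rows = source[state["offset"]:]
    state["offset"] = len(source)
    return new_rows

source.extend([{"id": 1}, {"id": 2}])
assert len(incremental_batch()) == 2     # first run sees both rows
source.append({"id": 3})
assert len(incremental_batch()) == 1     # next run sees only the new row
```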

💼
Jan 2020 – Feb 2021

Associate Software Engineer

Company: Accenture, Bengaluru, India

  • Supported ETL workflows and data ingestion pipelines for enterprise data systems
  • Developed SQL queries and transformation scripts for data processing and reporting
  • Performed data validation and reconciliation checks to ensure data integrity
  • Assisted in maintaining data pipelines and resolving production issues

Technologies: SQL, AWS, Data Pipelines, ETL

🎓
Jun 2015 – May 2019

Bachelor of Technology in Electronics and Communication Engineering

University: Jawaharlal Nehru Technological University, Kakinada

Built strong foundations in data structures, databases, operating systems, and computer networks, enabling a transition into data engineering and cloud systems.

About Me

I’m Ganasai Palakurthi, a Data Engineer with experience building scalable data pipelines, cloud data platforms, and analytics-ready data solutions using Snowflake, AWS, Spark, and Databricks.

My experience includes data ingestion, ETL/ELT development, data transformation, and designing data architectures that support large-scale structured and semi-structured datasets. I have worked on building reliable pipelines, optimizing data processing performance, and delivering curated datasets for reporting and analytics.

I focus on developing efficient data workflows, improving data quality, and building scalable systems that support business intelligence and data-driven decision-making. I am also committed to continuous learning and hold multiple industry certifications in cloud, data engineering, and platform technologies.

Ganasai Palakurthi

Contact Me

Email: ganasaipalakurthi@gmail.com

Phone: +1 (918) 928-9899

Location: United States