
Hi, I'm Li-Yu Chu

MLOps Engineer | AWS Certified Solutions Architect

M.S. in Computer Science | Specializing in ML Infrastructure & Production ML Systems

Building production-grade ML infrastructure with cloud-native technologies. Expertise in model deployment automation, scalable inference systems, and ML pipeline orchestration using AWS, Terraform, and Python. Bridging the gap between Data Science and Production Engineering.

About Me

I am an MLOps Engineer and AWS Certified Solutions Architect with a strong foundation in both Machine Learning and Production Infrastructure. Currently pursuing an M.S. in Computer Science at Fairleigh Dickinson University, I specialize in building end-to-end ML systems that integrate model development, deployment automation, and production monitoring.

My expertise centers on ML infrastructure engineering: designing serverless inference APIs with auto-scaling (AWS Lambda, DynamoDB), orchestrating distributed training jobs with fault tolerance (Go-based job engines), and implementing reproducible ML environments using Infrastructure as Code (Terraform). I've architected systems handling high-concurrency model serving with sub-100ms latency and built resilient job queues optimized for large-scale ML workloads.

With professional experience at HiTrust Inc. developing production microservices and optimizing data pipelines (30% performance improvement), plus hands-on ML work at Astra Technology (time-series forecasting, Computer Vision PoCs), I bring a unique combination of ML understanding and infrastructure expertise. I'm passionate about creating robust, scalable systems that empower data scientists to deploy models confidently in production.

Featured Projects

MLOps infrastructure, ML model serving platforms, and distributed ML training systems

Chainy | Serverless ML Inference Infrastructure

Production

MLOps-ready serverless platform architected for deploying and serving ML models at scale. Built event-driven inference APIs using AWS Lambda with auto-scaling for variable load, DynamoDB for low-latency prediction logging, and comprehensive monitoring. 100% Infrastructure as Code with Terraform managing isolated Dev/Staging/Prod environments. Designed to handle high-concurrency model predictions with sub-100ms latency.

ML Model Serving • AWS Lambda • DynamoDB • Terraform • IaC • CI/CD • Monitoring
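The Chainy serving path can be illustrated with a minimal Lambda-style handler. This is a hedged sketch, not the actual implementation: the stub model, event shape, and `log_prediction` placeholder are all illustrative assumptions.

```python
import json
import time
import uuid

def load_model():
    """Stub model; in a real deployment this would be loaded once per
    Lambda container (at cold start) and reused across invocations."""
    return lambda features: sum(features) / len(features)

MODEL = load_model()
MODEL_VERSION = "v1.2.3"  # illustrative version tag

def log_prediction(item):
    # Placeholder: in production this would be a DynamoDB put_item
    # call writing the prediction record (with a TTL attribute) for
    # monitoring and retraining.
    pass

def handler(event, context=None):
    """Parse features from the request body, run inference, log the
    prediction, and return an API Gateway-style response."""
    body = json.loads(event["body"])
    features = body["features"]

    start = time.perf_counter()
    prediction = MODEL(features)
    latency_ms = (time.perf_counter() - start) * 1000

    log_prediction({
        "prediction_id": str(uuid.uuid4()),
        "model_version": MODEL_VERSION,
        "latency_ms": latency_ms,
    })

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction,
                            "model_version": MODEL_VERSION}),
    }
```

Keeping the model load outside the handler is what makes warm invocations fast; only cold starts pay the initialization cost.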

Raft-Recovery | Distributed ML Training Orchestrator

Phase 1 Complete

Fault-tolerant job orchestration system engineered in Go for managing large-scale ML training workloads and distributed inference pipelines. Leverages Goroutines and Channels for concurrent job execution with thread safety, critical for parallelized hyperparameter tuning and batch predictions. Implements Write-Ahead Log (WAL) for crash recovery, ensuring long-running training jobs survive node failures. Ideal for orchestrating multi-hour model training across distributed clusters.

ML Job Orchestration • Go Concurrency • Fault Tolerance • Distributed Training • WAL Recovery
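The write-ahead-log recovery idea can be sketched as follows (in Python for brevity, though the project itself is written in Go; the class and record format are illustrative assumptions). Every job state transition is appended and fsynced to the log before it is considered applied, so a restarted orchestrator can replay the log and rebuild its job table.

```python
import json
import os

class JobWAL:
    """Append-only write-ahead log of job state transitions.
    Each record is one JSON line, made durable before the state
    change is acknowledged."""

    def __init__(self, path):
        self.path = path

    def append(self, job_id, state):
        with open(self.path, "a") as f:
            f.write(json.dumps({"job_id": job_id, "state": state}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before acknowledging

    def replay(self):
        """Rebuild the in-memory job table after a crash: the last
        record for each job wins."""
        jobs = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    rec = json.loads(line)
                    jobs[rec["job_id"]] = rec["state"]
        return jobs
```

On restart, any job whose last logged state is not terminal (e.g. still "running") can be re-queued, which is what lets multi-hour training jobs survive node failures.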

End-to-End MLOps Pipeline with Model Registry

In Progress

Complete ML lifecycle automation from training to production deployment. Building automated pipeline with data versioning (DVC), experiment tracking (MLflow), model registry with versioning, and CI/CD for models. Features automated model testing, A/B deployment strategies, and drift detection. Deploys models as FastAPI endpoints on AWS Lambda with Terraform IaC. Includes comprehensive monitoring dashboards for model performance metrics, prediction latency, and data distribution shifts.

MLflow • DVC • Model Registry • FastAPI • AWS Lambda • Drift Detection • A/B Testing
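One common way to detect the data distribution shifts mentioned above is the Population Stability Index (PSI) between the training sample and live traffic for a feature. This stdlib-only sketch is illustrative of the technique, not the pipeline's actual implementation; the bin count and thresholds are conventional defaults.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    Rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def histogram(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp values outside the training range into edge bins
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Floor empty bins to avoid log(0) / division by zero
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Run against the logged predictions table on a schedule, a PSI above the alert threshold would trigger the monitoring dashboard and, eventually, a retraining job.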
# MLOps Infrastructure Architecture
resource "aws_lambda_function" "ml_inference" {
  # Serverless model serving with auto-scaling
  function_name = "model-inference-api"
  runtime       = "python3.11"
  memory_size   = 3008 # Optimized for ML inference

  environment {
    variables = {
      MODEL_VERSION       = "v1.2.3"
      MLFLOW_TRACKING_URI = "s3://mlflow-artifacts"
    }
  }
}

resource "aws_dynamodb_table" "predictions_log" {
  # Store predictions for monitoring & retraining
  name         = "ml-predictions"
  billing_mode = "PAY_PER_REQUEST"

  ttl {
    enabled        = true
    attribute_name = "expiry"
  }
}

# MLflow experiment tracking (model versioning & registry)
#   Backend:   PostgreSQL RDS
#   Artifacts: S3 bucket
#   Registry:  Model versioning + staging

# Cost: ~$5-15/month (Lambda + DynamoDB + RDS micro)

Technical Expertise

MLOps Engineering • ML Infrastructure • Model Deployment & Monitoring • Cloud Architecture

🤖 MLOps & ML Engineering

Model Deployment & Serving • ML Pipeline Orchestration • MLflow (Tracking & Registry) • Python (Pandas, NumPy, Scikit-learn) • Experiment Tracking • Model Monitoring & Drift Detection • A/B Testing for Models

☁️ Cloud Infrastructure & IaC

AWS (Lambda, SageMaker, DynamoDB, S3, EC2) • Terraform (Multi-env IaC) • Docker & Kubernetes • Serverless Architecture • CI/CD (GitHub Actions) • Infrastructure Monitoring

🔧 Backend & Distributed Systems

Python (FastAPI, async/await) • Go (Concurrency, Distributed Systems) • RESTful APIs & Microservices • Event-Driven Architecture • PostgreSQL, DynamoDB, Redis • Message Queues (Kafka)

πŸ† Certifications

✅ AWS Solutions Architect – Associate
✅ HashiCorp Terraform Associate

Professional Experience

Building scalable backend systems and cloud infrastructure

HiTrust, Inc. | Software Engineer (Backend Focused)

Jan 2023 – Dec 2024

• Developed secure microservices for a financial transaction system handling millions of requests, ensuring high availability (HA) for critical live services
• Optimized database queries and backend logic, accelerating report generation by 30% for data-heavy workloads
• Dockerized applications and orchestrated deployments on Kubernetes, managing resource allocation and scaling strategies for production environments

Microservices • Kubernetes • Docker • High Availability • Performance Optimization

Astra Technology | Product Planner (Data & ML Focus)

Oct 2018 – Dec 2019

• Utilized Python (Pandas) to process internal datasets and build time-series prediction models to forecast user behavior patterns
• Collaborated with engineering teams to launch a Proof of Concept (PoC) for an AI-driven Computer Vision project (NTT Japan)
• Defined technical requirements for ML model deployment and integration with production systems

Python • Pandas • ML Models • Computer Vision • PoC Development

Education

Fairleigh Dickinson University

2025 – 2027

M.S. in Applied Computer Science

Relevant Coursework:
• Artificial Intelligence (Python)
• Advanced Topics in Operating Systems (Go)
• Systems Programming (C)

AI/ML • Operating Systems • Systems Programming

Institute for Information Industry

2017 – 2018

Big Data Analytics Bootcamp

Intensive training in data analytics, machine learning, and big data technologies.
Hands-on projects with real-world datasets and ML model development.

Big Data • Data Analytics • Machine Learning

Let's Connect

Actively seeking MLOps Engineer and ML Infrastructure roles in Vancouver, BC | Open to remote opportunities