Hi, I'm Ethan Liu.

Los Angeles · eliuusa@gmail.com

I am a software engineer and data scientist focused on AI tooling, RAG systems, and data-driven applications.

Experience

Software Engineer
Tone Software Corporation · Anaheim, CA
June 2025 - Present
  • Built an MCP server backed by locally hosted LLMs to enable secure AI assistance for internal SugarCRM workflows.
  • Developed a Go-based CLI client that authenticates with SugarCRM APIs and exposes CRM actions/tools to the MCP server.
  • Created an internal packaging/deployment app to streamline producing and shipping testable product builds for QA and stakeholder validation.
  • Built a cross-platform C++ command-line utility to parse PDFs and extract embedded TrueType/OpenType font files for analysis and reuse.
Software Engineer Intern
Tone Software Corporation · Anaheim, CA
December 2024 - June 2025
  • Implemented a RAG pipeline for internal documentation and code search using Qdrant vector DB and E5-Large-V2 embeddings for semantic retrieval.
  • Prototyped and evaluated AI coding assistants by benchmarking code-completion and refactoring accuracy and latency on internal tasks.
  • Curated a training/retrieval dataset by cleaning and de-duplicating documents, normalizing formats, and authoring prompt templates for consistent results.
  • Provisioned and deployed coding/testing infrastructure across staging and production environments, improving reliability and repeatability of experiments.
Application Developer Intern
Siemens Digital Industries Software · Costa Mesa, CA
June 2023 - June 2025
  • Collected and converted website documentation and PDFs into RAG-ready text with chunking + metadata to support embeddings and retrieval.
  • Collaborated with a team to design the system architecture for a documentation Q&A assistant.
  • Implemented an end-to-end retrieval pipeline to generate embeddings, index content in a vector database, and return top-k relevant passages to ground model responses.
  • Integrated an AI assistant bot into an internal documentation website to improve interactivity and enable fast, accurate documentation lookup for users.

Education

M.S. in Data Science
University of Pennsylvania · Philadelphia, PA
Dec 2025 - Present
B.S. in Computer Science
California State University, Long Beach · Long Beach, CA
Aug 2021 - May 2025

Projects

Qwen-Image-Edit I2I LoRA
ML Project
  • Automated image collection and preprocessing for LoRA training, producing a reproducible dataset build pipeline.
  • Trained and monitored LoRA fine-tuning on a remote NVIDIA RTX 6000 GPU for 11k steps; tracked loss and sample outputs and tuned learning-rate/batch-size settings for stability.
  • Added dataset QA checks (duplicate detection, resolution thresholds, and label/caption validation) to prevent low-quality samples from entering training.
  • Tools: Python, Docker, Selenium, PyTorch
Local RAG Chatbot
RAG Application
  • Built a local Retrieval-Augmented Generation (RAG) app using LangChain with Ollama and Qwen3 to enable offline/private Q&A.
  • Ingested and chunked documentation content, generated 1024-dimensional embeddings, and stored vectors + source text in Azure Cosmos DB with vector search enabled.
  • Implemented similarity search + RAG chain to retrieve top-k relevant chunks from Cosmos DB and ground responses in retrieved context; supported local development via Cosmos DB Emulator in Docker.
  • Tools: Python, LangChain, Ollama, Azure Cosmos DB (Vector Search), Docker

Skills

Languages
  • Python
  • Go
  • C++
ML/AI
  • LLMs
  • RAG
  • LoRA
  • LangChain
  • PyTorch
  • Qwen3
  • E5-Large-V2
Data Platforms
  • Qdrant
  • Azure Cosmos DB (Vector Search)
  • SugarCRM
Tools
  • Docker
  • Selenium