// Data Science & Applied AI

Hello,

I'm Ethan Liu

This portfolio covers my data science and AI work: RAG systems, AI tooling, local LLM workflows, and production-minded machine learning applications.

I have built retrieval pipelines, vector search systems, and model-assisted tooling that teams can use in real engineering environments.

Focus: AI Tooling
Stack: Python, Go, C++
Specialty: RAG + MCP Systems

Profile

About Me

I build production-minded AI systems for engineering teams. My work tends to sit at the seam between model capability and actual utility: retrieval pipelines, local LLM workflows, internal tools, packaging systems, and developer interfaces that teams can rely on.

Across Tone Software and Siemens, I have worked on documentation ingestion, vector search, evaluation of AI coding tools, local LLM integrations, and the infrastructure needed to make those systems usable in real environments.

I prefer end-to-end ownership: shaping the data path, wiring APIs and storage, and tightening the user experience until the system becomes genuinely helpful.

Location: Los Angeles, CA
Current Focus: AI tooling, RAG, local-first workflows
Interests: Developer UX, search quality, applied ML systems

Timeline

Experience

Software Engineer @ Tone Software Corporation

Building internal AI and developer tooling across CRM workflows, packaging systems, and low-level utilities.

  • Built an MCP server backed by locally hosted LLMs for secure internal SugarCRM assistance.
  • Developed a Go CLI that authenticates against SugarCRM APIs and exposes CRM tools to the MCP layer.
  • Created an internal packaging and deployment app for QA-ready product builds.
  • Built a cross-platform C++ utility to extract embedded font files from PDFs.
Go, C++, MCP, Local LLMs, SugarCRM

Software Engineer Intern @ Tone Software Corporation

Focused on retrieval quality, infrastructure, and evaluation workflows for internal AI development.

  • Implemented a RAG pipeline for documentation and code search using Qdrant and E5-Large-V2 embeddings.
  • Prototyped and evaluated AI coding assistants by benchmarking code completion and refactoring accuracy.
  • Curated training and retrieval datasets through cleanup, deduplication, and prompt standardization.
  • Provisioned and deployed coding and testing infrastructure across staging and production environments.
Python, Qdrant, Embeddings, Evaluation
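
To illustrate the retrieval step behind a pipeline like the one above, here is a minimal sketch of top-k search by cosine similarity. It is not the production code: the toy 3-dimensional vectors, the file names, and the `top_k` helper are all hypothetical, and a real setup would get its vectors from the embedding model and delegate the search to the vector database rather than scanning in Python.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=3):
    """Return the k (score, doc_id) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in indexed.items()]
    return sorted(scored, reverse=True)[:k]


# Toy vectors standing in for real embeddings (hypothetical data).
index = {
    "install.md": [0.9, 0.1, 0.0],
    "api.md": [0.1, 0.9, 0.2],
    "faq.md": [0.7, 0.3, 0.1],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

The brute-force scan is only meant to make the ranking logic visible; an indexed vector store does the same ranking approximately and at much larger scale.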

Application Developer Intern @ Siemens Digital Industries Software

Worked on the data and retrieval foundations for an internal documentation assistant.

  • Collected and converted website documentation and PDFs into RAG-ready text with chunking and metadata.
  • Collaborated on system architecture for a documentation Q&A assistant.
  • Implemented embedding generation, vector indexing, and top-k retrieval for grounded responses.
  • Integrated an AI assistant bot into an internal documentation website.
RAG, Vector Search, Docs UX, LLM Integration
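
As a rough picture of what "RAG-ready text with chunking and metadata" can look like, the sketch below splits a document into overlapping character windows and tags each one with its source. The window size, the overlap, and the metadata field names are assumptions chosen for illustration, not the values used in the project.

```python
def chunk_text(text, source, size=200, overlap=50):
    """Split text into overlapping character windows, each tagged with metadata."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(
            {
                "text": text[start:start + size],
                "source": source,        # where the chunk came from
                "chunk_index": index,    # position within the document
                "start": start,          # character offset, useful for citations
            }
        )
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical 500-character document.
chunks = chunk_text("x" * 500, source="guide.pdf", size=200, overlap=50)
print(len(chunks), [c["start"] for c in chunks])
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.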

Academic

Education

Dec 2025 - Present

M.S. in Data Science

University of Pennsylvania

Philadelphia, PA

Aug 2021 - May 2025

B.S. in Computer Science

California State University, Long Beach

Long Beach, CA

Selected Work

Projects

Qwen-Image-Edit I2I LoRA

ML Project

2025
  • Automated image collection and preprocessing for reproducible LoRA training datasets.
  • Trained and monitored LoRA fine-tuning on a remote NVIDIA RTX 6000 GPU for 11k steps.
  • Added QA checks for duplicate detection, resolution thresholds, and caption validation.
Python, Docker, Selenium, PyTorch
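
The QA checks listed above could be sketched roughly as follows. This is an illustrative outline only: the 512-pixel thresholds, the sample record layout, and the `qa_report` helper are hypothetical, and hashing raw bytes only catches exact duplicates, not resized or re-encoded copies.

```python
import hashlib

# Hypothetical minimum resolution; the real thresholds are not stated.
MIN_WIDTH, MIN_HEIGHT = 512, 512


def qa_report(samples):
    """Return (name, issue) pairs for duplicates, small images, and empty captions.

    Each sample is a dict with 'name', 'data' (bytes), 'size' (width, height),
    and 'caption'.
    """
    seen = {}
    issues = []
    for sample in samples:
        digest = hashlib.sha256(sample["data"]).hexdigest()
        if digest in seen:
            issues.append((sample["name"], f"duplicate of {seen[digest]}"))
        else:
            seen[digest] = sample["name"]
        width, height = sample["size"]
        if width < MIN_WIDTH or height < MIN_HEIGHT:
            issues.append((sample["name"], "below resolution threshold"))
        if not sample["caption"].strip():
            issues.append((sample["name"], "empty caption"))
    return issues


# Toy records standing in for real image files.
samples = [
    {"name": "a.png", "data": b"AAA", "size": (1024, 1024), "caption": "a red chair"},
    {"name": "b.png", "data": b"AAA", "size": (1024, 1024), "caption": "a red chair"},
    {"name": "c.png", "data": b"CCC", "size": (256, 256), "caption": "  "},
]
issues = qa_report(samples)
for name, issue in issues:
    print(name, "->", issue)
```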

Local RAG Chatbot

RAG Application

2025
  • Built a local Retrieval-Augmented Generation app using LangChain, Ollama, and Qwen3.
  • Ingested documentation, generated 1024-dimensional embeddings, and stored vectors in Azure Cosmos DB.
  • Implemented grounded top-k retrieval and local development through Cosmos DB Emulator in Docker.
LangChain, Ollama, Qwen3, Azure Cosmos DB, Docker
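
One way to picture the "grounded" part of grounded retrieval is the prompt assembly step: retrieved passages are numbered, attributed, and placed ahead of the question with an instruction to stay within them. The sketch below assumes a simple passage format and a character budget; the wording of the instruction, the `build_grounded_prompt` helper, and the sample passages are hypothetical and not taken from the app itself.

```python
def build_grounded_prompt(question, passages, max_chars=2000):
    """Assemble a prompt that restricts the model to the retrieved passages."""
    context_lines = []
    used = 0
    for number, passage in enumerate(passages, start=1):
        entry = f"[{number}] {passage['text']} (source: {passage['source']})"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        context_lines.append(entry)
        used += len(entry)
    context = "\n".join(context_lines)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical retrieved passages, already ranked by similarity.
passages = [
    {"text": "Run setup.sh first.", "source": "install.md"},
    {"text": "Use --verbose for logs.", "source": "cli.md"},
]
prompt = build_grounded_prompt("How do I install?", passages)
print(prompt)
```

Numbering the passages lets the model cite `[1]` or `[2]` in its answer, which makes it easier to check a response against its sources.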

Capabilities

Skills

Languages

  • Python
  • Go
  • C++

AI / ML

  • LLMs
  • RAG
  • LoRA
  • LangChain
  • PyTorch
  • Qwen3
  • E5-Large-V2

Data Platforms

  • Qdrant
  • Azure Cosmos DB Vector Search
  • SugarCRM

Tools

  • Docker
  • Selenium
  • MCP
  • Packaging and Release Tooling

Reach Out

Contact

If you are building AI-enabled internal tools, retrieval systems, or developer workflows, I am open to conversations about the engineering problems behind them.