Professional Work & Projects
Designed and built a production-grade AI orchestration framework that automates game feature development using coordinated LLM agents. The system transforms natural language feature requests into fully implemented Godot 4 game assets (GDScript files, scene hierarchies, and resources) through a structured multi-stage pipeline with validation, checkpointing, and human-in-the-loop approval gates.
Treated AI code generation as a pipeline architecture problem, not a prompt engineering problem. Each stage has defined inputs, outputs, and validation criteria. The system maintains execution state across sessions, supports deterministic replay for debugging, and produces human-readable artifacts at every step for full transparency.
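A minimal sketch of how a stage in such a pipeline might be modeled, with explicit inputs, outputs, validation, and checkpointing; the names and structure below are illustrative, not the framework's actual code:

```python
# Hypothetical pipeline stage: defined inputs/outputs, a validation step,
# and a checkpoint written after every stage so runs can be resumed and
# replayed deterministically. Names are illustrative only.
from dataclasses import dataclass, field
from pathlib import Path
import json


@dataclass
class StageResult:
    name: str
    outputs: dict                      # human-readable artifacts from the stage
    valid: bool                        # did the stage's validation criteria pass?
    errors: list = field(default_factory=list)


class PipelineStage:
    def __init__(self, name, run_fn, validate_fn):
        self.name = name
        self.run_fn = run_fn           # produces outputs from inputs
        self.validate_fn = validate_fn # checks outputs against stage criteria

    def execute(self, inputs: dict, checkpoint_dir: Path) -> StageResult:
        outputs = self.run_fn(inputs)
        errors = self.validate_fn(outputs)
        result = StageResult(self.name, outputs, valid=not errors, errors=errors)
        # Persist the result as a readable artifact for inspection and replay.
        (checkpoint_dir / f"{self.name}.json").write_text(
            json.dumps(result.__dict__, indent=2)
        )
        return result
```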
Model selection is task-aware: GPT-4 handles planning and structured reasoning; Claude handles longer code generation where context and coherence matter. The abstraction layer enables hot-swapping models per pipeline stage.
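A similar sketch of task-aware model routing, where each stage is mapped to a model and the mapping can be swapped without touching stage logic; the model identifiers and client interface are assumptions, not the project's actual code:

```python
# Hypothetical per-stage model routing: each pipeline stage is assigned the
# model best suited to it, and the mapping is hot-swappable per stage.
STAGE_MODELS = {
    "planning": "gpt-4",          # structured reasoning and task breakdown
    "code_generation": "claude",  # longer, coherent GDScript generation
    "review": "gpt-4",
}


def complete(stage: str, prompt: str, clients: dict) -> str:
    """Route a prompt to whichever model is configured for this stage."""
    model = STAGE_MODELS[stage]
    return clients[model].generate(prompt)  # clients wrap each provider's API
```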
Built and fine-tuned text-to-image AI models powering character asset generation for a blockchain-integrated mobile game. Owned the full ML pipeline, from dataset creation and LoRA fine-tuning to model evaluation and production integration, across 8+ distinct character communities.
Fine-tuned Flux-based models with LoRA adapters using Hugging Face workflows, including hyperparameter tuning, debug batch generation, and failure-mode isolation. Built evaluation protocols that tested identity accuracy, pose consistency, and style coherence across hundreds of generated samples. Integrated AI outputs into the mobile app pipeline, validating avatar rendering, wearables, and NFT-linked assets.
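For a sense of what the LoRA side of such a workflow involves, a minimal sketch using Hugging Face's peft library follows; the rank, targeted modules, and other values are illustrative assumptions, not the production settings:

```python
# Illustrative LoRA adapter configuration with Hugging Face peft; actual
# production hyperparameters differed and are not shown here.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                     # adapter rank (assumed value)
    lora_alpha=16,            # scaling factor
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],  # attention projections
)

# The adapter is then attached to the diffusion transformer before training,
# e.g. transformer.add_adapter(lora_config) in a diffusers training script.
```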
I worked on fine-tuning a ChatGPT model for Tactician TM, a turn-based game engine. The goal was to make it easier for anyone to describe game rules, and then our NLP model would tidy those up into a clear, standardized format. This was my first time facing a challenge like this, and, let's just say, it was a rough start. To make it more approachable, I focused on the game of Tic Tac Toe (TTT).
Initially, I was overwhelmed. Nonetheless, I embraced the challenge, realizing I had nothing to lose. My first choice was the Python NLP library spaCy. My original plan was to deconstruct each input sentence word by word. However, I soon realized this approach was too time-consuming, considering the multitude of variables involved. So, I returned to the drawing board and conducted further research.
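For illustration, the word-by-word approach I abandoned looked roughly like this (the rule sentence is a made-up example):

```python
# Sketch of the word-by-word spaCy approach: inspecting each token's part of
# speech and dependency role to try to piece a rule back together.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Players take turns placing their mark on an empty square.")
for token in doc:
    print(token.text, token.pos_, token.dep_)
```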
During this phase, I discovered fine-tuning and various AI tools that facilitate this process. Notably, I found that OpenAI offered fine-tuning for its GPT models. Initially, I used the "davinci-002" model for its accessibility. This model required data in a JSONL file, formatted in a prompt-completion structure.
I aimed for a 'waterfall effect' in my model: just as water in a river inevitably flows to a common destination, the model was designed to generalize any input into one of the standard TTT rules. This approach ensures consistency in interpreting diverse inputs and aligning them with established rule sets.
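To make that concrete, here is a hedged sketch of the kind of prompt-completion pairs I mean, written out in the JSONL format that davinci-002 fine-tuning expects; differently worded inputs "flow" to the same standardized rule (the wording is illustrative, not my actual training data):

```python
# Illustrative prompt-completion pairs for the waterfall-style normalization:
# several phrasings map to one canonical rule. One JSON object per JSONL line.
import json

examples = [
    {"prompt": "you win if you get three of your symbols in a row",
     "completion": "A player wins by placing three of their marks in a horizontal, vertical, or diagonal line."},
    {"prompt": "three in a line, any direction, means that player wins",
     "completion": "A player wins by placing three of their marks in a horizontal, vertical, or diagonal line."},
]

with open("ttt_rules.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```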
Despite my initial efforts, a meeting with my mentor revealed a significant oversight: my model was overfit to TTT. A guest at the meeting advised me to use newer GPT models and to leverage the knowledge they already contain, reducing the need for extensive datasets. He introduced me to "few-shot learning".
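Few-shot learning here simply means showing the model a handful of worked examples inside the prompt itself, rather than retraining it on a large dataset; a hedged sketch:

```python
# Hypothetical few-shot prompt: a couple of in-context examples steer the
# model toward the standardized phrasing without any additional training.
few_shot_prompt = """Rewrite each player-described rule as a standardized Tic Tac Toe rule.

Input: you win with three in a row
Rule: A player wins by placing three of their marks in a horizontal, vertical, or diagonal line.

Input: if the board fills up and nobody won, it's a tie
Rule: If all nine squares are filled and no player has three marks in a line, the game ends in a draw.

Input: players alternate putting their symbol on empty spots
Rule:"""
```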
This invaluable advice led me to rethink my approach. Instead of creating an extensive dataset from scratch, I explored ways to lean on the model's existing knowledge. Through carefully constructed training sentences, I was able to get the model's training loss down from 1.81 to 0.26. For the final fine-tuning, I selected OpenAI's "gpt-3.5-turbo-1106" model.
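Unlike the older prompt-completion format, chat models such as gpt-3.5-turbo-1106 are fine-tuned on JSONL lines containing lists of chat messages; a hedged sketch of one such training example (the wording is illustrative) looks like this:

```python
# Illustrative chat-format training example for fine-tuning gpt-3.5-turbo:
# each JSONL line holds a "messages" list with system, user, and assistant turns.
import json

example = {
    "messages": [
        {"role": "system",
         "content": "Rewrite player-described game rules into standardized Tic Tac Toe rules."},
        {"role": "user",
         "content": "if you fill your whole row you win"},
        {"role": "assistant",
         "content": "A player wins by placing three of their marks in a horizontal, vertical, or diagonal line."},
    ]
}

with open("ttt_rules_chat.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```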