ThelerineAI: The journey

I’ve been building my own dataset to train a virtual try-on model that I’ve decided to call ThelerineAI, named after the pleated skirt that is popular amongst Batswana and Basotho tribes. It all started when I decided to enhance my Computer Vision skills, and the best way to learn anything is to put it into practice. I slowly embarked on this journey in 2024 and picked it up more seriously last year. To keep it fun for myself, I decided to work on a project that is relevant to me. I regularly shop for clothing online, and it bothered me that I couldn’t try clothing on virtually. I knew I was not the only one, so in 2024 I did a Google search and came across a 2021 paper on virtual try-on, TryOnGAN: Body-Aware Try-On via Layered Interpolation. That paper references work from 2019, so this topic has been around for quite a while.

I was interested in cutting-edge technology, so I decided to reference papers based on the Stable Diffusion architecture, such as OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person and TryOnDiffusion: A Tale of Two UNets.

And boy oh boy, it’s been a journey! Because I’m also sharpening my Computer Vision skills, I decided to build my own model from scratch. And now I’m building my own dataset to train it, which is a monumental task on its own. Ja, I’m very ambitious!

I wanted to shy away from using existing datasets on the web, so I’ve decided to build my own, which I call the Itekanye dataset. Itekanye is a Setswana word for “try-on”. I collected a few images of myself and of my husband (he gave his consent). A Stable Diffusion-based virtual try-on model requires a triplet dataset for training: the original image (the person wearing the garment), the garment image, and a conditioning input (such as a pose or segmentation map) that guides the network during image generation. I only had the original images, so I needed the other two components. I started with the idea of extracting the garments from the original images, and that’s when I was introduced to the CIHP-PGN model.
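To make the triplet idea concrete, here is a minimal sketch of how such a dataset could be organised on disk and paired up. The folder names (`person`, `garment`, `conditioning`) and the `TryOnTriplet` type are my own hypothetical choices for illustration, not part of any published try-on codebase:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TryOnTriplet:
    """One training sample for a diffusion-based virtual try-on model."""
    person: Path        # original image: the person wearing the garment
    garment: Path       # the isolated garment image
    conditioning: Path  # guidance input, e.g. a pose or parsing map


def build_triplets(root: Path) -> list[TryOnTriplet]:
    """Pair files by shared filename stem across three sibling folders.

    Assumes a hypothetical layout:
        root/person/0001.jpg
        root/garment/0001.jpg
        root/conditioning/0001.png
    Samples missing a garment or conditioning file are skipped.
    """
    triplets = []
    for person_img in sorted((root / "person").glob("*")):
        garment = list((root / "garment").glob(person_img.stem + ".*"))
        cond = list((root / "conditioning").glob(person_img.stem + ".*"))
        if garment and cond:
            triplets.append(TryOnTriplet(person_img, garment[0], cond[0]))
    return triplets
```

Pairing by stem keeps the three components loosely coupled: you can regenerate the garment or conditioning images for a sample without touching the originals.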

I first tried to use models licensed for commercial use, and ChatGPT incorrectly advised me that CIHP-PGN was one of them. I later learned that the dataset and other aspects of the model are for research purposes only. But I was already invested, so I decided to carry on, and I’ve had interesting results that I’ve documented here: https://github.com/kmafatshe/Itekanye1.0.
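To illustrate the garment-extraction step, here is a minimal sketch of carving a garment out of a photo using a per-pixel human-parsing map like the one CIHP-PGN produces. The `extract_garment` helper and the example label set are my own assumptions for illustration; the actual integer label IDs depend on the parsing model’s category list:

```python
import numpy as np


def extract_garment(image: np.ndarray, parsing: np.ndarray,
                    garment_labels: set[int]) -> np.ndarray:
    """Keep only pixels whose parsing label is a garment; white elsewhere.

    image:   H x W x 3 array of pixel values
    parsing: H x W array of integer class labels (one per pixel)
    garment_labels: label IDs to keep (assumed, model-specific)
    """
    mask = np.isin(parsing, list(garment_labels))   # H x W boolean mask
    out = np.full_like(image, 255)                  # white background
    out[mask] = image[mask]                         # copy garment pixels only
    return out
```

The same mask can double as the conditioning input for the triplet, which is one reason parsing models come up so often in the try-on literature.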
