Data is undoubtedly one of the most valuable assets on Earth. Its rise is commonly referred to as the fourth industrial revolution, and the IDC expects Big Data Analytics revenue to reach $274.3 billion this year.
Coming from a fashion background, with a fashion marketing degree and years of experience in the retail sector, I found the world of data foreign. However, with the evolution of consumerism and the broadening horizons of omnichannel marketing, I quickly learned that those who failed to embrace the growing role of technology in retail would be left behind.
So, I embraced it. I now work as a Data Quality Specialist at EDITED, a market intelligence platform that produces real-time data analytics software for brands and retailers. My role includes monitoring the accuracy of the EDITED platform’s data, quality-checking the data within the Data Science team’s Machine Learning (ML) models, and collecting new data for the Data Science team’s training sets. As someone who entered the industry with little technical knowledge, I initially found the concepts of data science and machine learning daunting. However, after two years of working with the Data Science team at EDITED, I want to use my unique perspective to help others from a similar non-technical background access and understand machine learning.
At EDITED we track over 1 billion products, helping customers analyze retail data to get their product assortment and price right. To make this possible, the products need an all-encompassing, mutually exclusive categorization structure so customers can analyze them with ease (mutually exclusive means that each product in the app can be allocated to only one category within the structure, with no duplication). Machine learning models are the backbone of the taxonomy at EDITED (taxonomy refers to the classification of data into categories, subcategories and so on).
What Actually Is Machine Learning?
On the EDITED podcast, ‘EDITED: Inside Retail’, Data Scientist Michael R addresses the common misconception that machine learning computers grow in intelligence over time. This misunderstanding of artificial intelligence (AI) and machine learning stems from the idea that ML and AI belong to science fiction, with robots taking over. Machine learning is the process of teaching a machine (computer) to learn patterns from data so that it can make predictions. This is done by inputting a lot of data points; the more data points that are added, the more examples the machine can learn from. The different types of machine learning models include, but are not limited to, the following.
The Different Types of Machine Learning Models:
Supervised: Supervised learning is the process of training a model on labeled data (examples that already carry the correct answer), which allows the model to predict the outcome for new examples. The model’s accuracy tends to improve with larger training data sets.
At EDITED, we use many supervised models. We collect large amounts of labeled data to help a model predict where a product should fall in the app, for example, telling a skirt from a top, or a shirt from a blouse. We are essentially teaching a model to understand concepts. If you want a model to classify different types of clothing, you need to teach it the concept of each clothing type. For instance, what makes a dress a dress? We show the model lots of examples of dresses, and over time it learns the underlying characteristics of what makes a dress. This then enables the model to identify a dress it has never seen before.
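To make the idea of learning from labeled examples more concrete, here is a minimal sketch of a supervised classifier for product titles. It is not EDITED’s model: the training titles are invented, and it uses one deliberately simple technique, a nearest-centroid classifier over word counts. It builds a word profile for each category from that category’s labeled examples, then assigns a new title to the category whose profile it most resembles (by cosine similarity). The names `TRAINING_DATA`, `train` and `predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (product title, category) pairs.
# A production system would use far more examples and richer features.
TRAINING_DATA = [
    ("floral midi dress with puff sleeves", "dress"),
    ("black satin slip dress", "dress"),
    ("wrap maxi dress in linen", "dress"),
    ("pleated midi skirt", "skirt"),
    ("denim mini skirt with raw hem", "skirt"),
    ("satin slip skirt", "skirt"),
    ("oversized cotton shirt with button-down collar", "shirt"),
    ("striped poplin shirt", "shirt"),
]


def tokenize(title):
    """Split a product title into lowercase words."""
    return title.lower().split()


def train(examples):
    """Sum the word counts of every title in each category.

    Cosine similarity ignores vector length, so these summed counts behave
    exactly like the mean (centroid) vector of each category.
    """
    centroids = {}
    for title, label in examples:
        centroids.setdefault(label, Counter()).update(tokenize(title))
    return centroids


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(centroids, title):
    """Return the category whose centroid is most similar to the new title."""
    words = Counter(tokenize(title))
    return max(centroids, key=lambda label: cosine(words, centroids[label]))


if __name__ == "__main__":
    model = train(TRAINING_DATA)
    # None of these exact titles appear in the training data.
    print(predict(model, "red midi dress"))     # -> dress
    print(predict(model, "denim skirt"))        # -> skirt
    print(predict(model, "white linen shirt"))  # -> shirt
```

Every labeled example adds to its category’s word profile, which is one reason more (and more varied) training data tends to help. A title that shares no words with any category still gets whichever category comes first, so a real system would need a fallback for low-similarity cases.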
Another excellent example of supervised learning, which I came across in a wired.com video, is the use of ML in email programs. ML models constantly work behind the scenes in your inbox to figure out whether new mail is spam. Labels (e.g. spam, inbox) are used here to indicate classes. A supervised model is trained on pre-labeled emails and can then predict which label should be assigned to new emails. For example, suppose you start receiving many Spanish-language emails. As a non-Spanish speaker, your inbox contains no Spanish emails, so the model has learned from the labeled data that you are unlikely to expect Spanish emails and can route these into the spam folder.
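Spam filtering is a classic supervised learning task. The sketch below uses multinomial Naive Bayes, one well-known technique for it; the source does not say which method any email provider actually uses, and the emails (including Spanish-language spam, mirroring the scenario above) are invented. The model counts how often each word appears under each label, then scores a new email by combining those word probabilities with how common each label is.

```python
import math
from collections import Counter

# Hypothetical pre-labeled emails for one English-speaking user.
LABELED_EMAILS = [
    ("meeting moved to thursday afternoon", "inbox"),
    ("your order has shipped", "inbox"),
    ("notes from today's team meeting", "inbox"),
    ("gana dinero rapido hoy", "spam"),
    ("oferta exclusiva gana un premio", "spam"),
    ("claim your free prize now", "spam"),
]


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(emails):
    """Count how often each label occurs and how often each word appears under it."""
    label_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in emails:
        words = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Pick the label with the highest log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_emails in label_counts.items():
        score = math.log(n_emails / total_emails)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing: an unseen word/label pair is unlikely, not impossible.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_naive_bayes(LABELED_EMAILS)
    print(classify(model, "gana un premio hoy"))        # -> spam
    print(classify(model, "team meeting on thursday"))  # -> inbox
```

Working in log probabilities avoids multiplying many small numbers together, and the smoothing keeps a single unfamiliar word from ruling a label out entirely.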
Unsupervised: Unsupervised learning uses unlabeled data, meaning the model finds patterns without tags; a common approach is clustering, which groups similar items together. Here at EDITED, we have worked on various unsupervised learning projects. In one, we had a database of brands and retailers with lots of information about them, such as how many products they carry, their average price, their market participation and more. Based on all those characteristics, an unsupervised model grouped these brands into three clusters. Each cluster reflected some inherent common characteristics of the brands within it, for example, average price, max price and the number of accessories. Business specialists then analyzed and interpreted the clusters by looking at the individual brands within each one and trying to find meaning behind their grouping. As you can see in the example diagram, we found that the grouping correlated with the retailers’ market level, for instance: value, mass, premium and luxury.
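Clustering can be sketched with k-means, one common unsupervised algorithm; the source does not name the algorithm EDITED used. The retailers and their average and maximum prices below are made up, and the real project used more characteristics. Features are standardized first so that maximum price, which has much larger values, does not dominate the distance calculation, and the starting centroids are chosen deterministically (farthest-point seeding) so the result is reproducible. `k=3` mirrors the three groups described above.

```python
import math

# Hypothetical retailers: (name, average price, max price). All values are invented.
RETAILERS = [
    ("Retailer A", 10.0, 50.0),
    ("Retailer B", 12.0, 55.0),
    ("Retailer C", 80.0, 400.0),
    ("Retailer D", 85.0, 420.0),
    ("Retailer E", 600.0, 3000.0),
    ("Retailer F", 620.0, 3100.0),
]


def standardize(rows):
    """Rescale each feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then keep adding the
    point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Return a cluster index for each point."""
    centroids = farthest_point_init(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments


if __name__ == "__main__":
    features = standardize([row[1:] for row in RETAILERS])
    labels = kmeans(features, k=3)
    for (name, avg_price, max_price), label in zip(RETAILERS, labels):
        print(f"{name}: avg {avg_price}, max {max_price} -> cluster {label}")
```

With these toy numbers, each low, middle and high price pair ends up in its own cluster. Note that k-means only returns cluster numbers; deciding what each cluster means is still the interpretive step the business specialists performed.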
Unsupervised learning can also be used in market analysis, by spotting similar patterns in customers’ online behavior. For example, it can help predict which advertisement may attract a customer based on behavior patterns such as their scrolling and clicking, or their engagement with a social media advertisement for a particular clothing brand. TikTok is another example: each user’s home page shows different content, as the algorithm pulls through videos it believes the user would like to see, based on what they have previously engaged with.
Semi-supervised: Semi-supervised learning is simply a combination of supervised and unsupervised learning. It uses a small amount of labeled data and a large amount of unlabeled data. For example, at EDITED, if we couldn’t collect enough training data for a fully supervised categorization model, we might give it a small amount of labeled training data to get started and then leave it to generate its own training data based on that small subset.
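One common semi-supervised technique that matches this description is self-training (also called pseudo-labeling): train on the small labeled set, label the unlabeled examples the model is confident about, add them to the training data, and repeat. The sketch below applies it to a simple nearest-centroid text classifier; the titles, the `threshold` of 0.5 and the helper names are assumptions for illustration, not EDITED’s setup.

```python
import math
from collections import Counter

# Hypothetical data: only two labeled titles and a pool of unlabeled ones.
LABELED = [("midi dress", "dress"), ("denim skirt", "skirt")]
UNLABELED = [
    "floral midi dress",
    "pleated denim skirt",
    "floral maxi dress",
    "pleated mini skirt",
]


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(examples):
    """Summed word counts per category (a nearest-centroid text classifier)."""
    centroids = {}
    for title, label in examples:
        centroids.setdefault(label, Counter()).update(title.lower().split())
    return centroids


def self_train(labeled, unlabeled, threshold=0.5, max_rounds=5):
    """Repeatedly pseudo-label confident predictions and retrain on them."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = build_centroids(labeled)
        confident = []
        for title in pool:
            words = Counter(title.lower().split())
            scores = {label: cosine(words, c) for label, c in centroids.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                confident.append((title, best))
        if not confident:
            break  # nothing new is confident enough; stop early
        labeled.extend(confident)
        newly_labeled = {title for title, _ in confident}
        pool = [title for title in pool if title not in newly_labeled]
    return labeled, pool


if __name__ == "__main__":
    labeled, remaining = self_train(LABELED, UNLABELED)
    for title, label in labeled:
        print(f"{title} -> {label}")
    print("still unlabeled:", remaining)
```

With these toy inputs, "floral midi dress" and "pleated denim skirt" are labeled in the first round, while "floral maxi dress" only crosses the threshold in the second round, after "floral" has entered the dress profile through a pseudo-label. The confidence cutoff matters: set it too low and early mistakes become training data that can reinforce themselves.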
This is just a brief introduction to machine learning. I have learned so much during my time at EDITED, yet there is still a lot more for me to learn about the world of machine learning. For anyone starting out from a similar non-technical background, my advice is to ask questions, stay open-minded to new possibilities (as tech is ever-changing) and make the most of the resources around you! Working alongside data scientists and other knowledgeable colleagues, I have never had a better opportunity to learn and develop skills I would never have imagined having.
Poppy is a Data Quality Specialist at EDITED.
Get in touch now to learn more about EDITED and see if your talents can help us make retail smarter.