After earning my PhD at UNC Chapel Hill, I thought I had found what I wanted: a data-rich environment in which to use my epidemiology skill set and apply data-driven findings to the real world. As a scientist at the New York City Department of Health and Mental Hygiene, my findings could eventually make their way to policymakers and administrators charged with the City’s health. However, I soon learned what any government insider would tell you: the slow, bureaucratic pace of government was incredibly frustrating and antithetical to my strong desire to see data-driven insights deployed in real time.
That is what drove me to the world of data science. I knew there was no shortage of interesting real-world problems, and for the first time in history humanity had the computing power needed to develop wicked algorithms to solve them, and the storage necessary to reliably accrue troves of data for these purposes.
I charted my way into data science through the Insight Data Science program. I joined the league of other recovering PhDs who were excited to apply scientific thinking to challenges outside of academia and conventional research. The program directors encouraged us to build an app that filled a business or consumer need, so I identified a problem that I’ve faced as a consumer. As someone who has spent hours in supermarkets reading ingredient lists and making sure I am comfortable with the formulations, I wanted to address the needs of consumers who had recently made a
\#FoodGoals
The Why
\#FoodGoals is the first recommendation system that I built as a data scientist. I built it because I had to 🤣: all Insight Data Science Fellows are required to complete a data science project. I might have designed another project, but this particular topic solved an issue I myself had: I wanted to help consumers make faster decisions when trying to find groceries that are aligned with their nutritional \#FoodGoals. As a consumer myself, I have spent too much time scrutinizing ingredients to make sure they suit my nutritional needs. Moreover, with the rising popularity of online grocery shopping (yes, I realized even in 2019 that this trend was here to stay), such a project could be easily integrated into grocery platforms.
The How
I went searching the internet for ingredient listings for groceries. I found the USDA’s trove of branded food product, ingredient (and nutrient!) data, which was exactly what I needed to get started. I used pandas to clean the data and to characterize the nutrient density of products within their appropriate food categories.
I then became familiar with natural language processing (NLP) techniques as I combed through ingredient listings, removing uninformative words (e.g. salt, flavor, concentrate) and drilling down to base ingredients even when other multi-ingredient items were listed (e.g. if ketchup was listed as an ingredient in a sauce, I would extract tomato, sugar and vinegar, and ignore ketchup). After this, I used word2vec, a neural network that learns associations from the co-occurrence of words in a sentence (or, in my case, an ingredient list) and projects those words into vector space. This lets the network find related ingredients, and related grocery items based on those ingredients. Finally, I determined the nearness of individual product vectors to each other by calculating cosine similarity.
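To make these steps concrete, here is a minimal sketch in plain Python. It is not the original project code. The word lists, the tiny hand-written vectors (standing in for trained word2vec embeddings) and the choice to average ingredient vectors into a product vector are all assumptions made for illustration.

```python
import math
import re

# Hypothetical lists for illustration; the project's full lists are not given.
UNINFORMATIVE = {"salt", "flavor", "concentrate"}
COMPOUND_INGREDIENTS = {"ketchup": ["tomato", "sugar", "vinegar"]}

# Toy 3-dimensional vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "tomato": [0.9, 0.1, 0.0],
    "sugar": [0.1, 0.9, 0.0],
    "vinegar": [0.6, 0.2, 0.2],
    "basil": [0.8, 0.0, 0.3],
    "cocoa": [0.0, 0.7, 0.6],
}


def clean_ingredients(raw):
    """Split an ingredient string, expand compound items, drop uninformative words."""
    tokens = [t.strip() for t in re.split(r"[,;]", raw.lower()) if t.strip()]
    cleaned = []
    for token in tokens:
        # Replace a compound item (e.g. ketchup) with its base ingredients.
        for base in COMPOUND_INGREDIENTS.get(token, [token]):
            if base not in UNINFORMATIVE and base not in cleaned:
                cleaned.append(base)
    return cleaned


def product_vector(ingredients):
    """Average the vectors of known ingredients (an assumed aggregation rule)."""
    known = [TOY_VECTORS[i] for i in ingredients if i in TOY_VECTORS]
    if not known:
        raise ValueError("no known ingredients to embed")
    dims = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sauce = clean_ingredients("Ketchup, Salt, Basil")
marinara = clean_ingredients("Tomato, Basil, Vinegar")
dessert = clean_ingredients("Cocoa, Sugar")

sim_marinara = cosine_similarity(product_vector(sauce), product_vector(marinara))
sim_dessert = cosine_similarity(product_vector(sauce), product_vector(dessert))
```

With these toy vectors, the sauce lands closer to the marinara than to the dessert, which is the behavior the recommender relies on. In a real pipeline the vectors would come from a word2vec model trained on the cleaned ingredient lists.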
Having studied nutrition, I knew that individuals frequently change their nutritional goals, but I started with a standard slate of options for the potential consumer: for any given product, one could search for recommendations on a low-calorie, low-carb, low-sugar, high-protein or low-fat version.
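One way such goal-based filtering could work is sketched below. The field names, the sample products, the precomputed similarity scores and the rule that a candidate must beat the query product on the chosen nutrient are all hypothetical. The app's actual criteria are not described here.

```python
# Each goal maps to an assumed nutrient field and the direction that counts as better.
GOALS = {
    "low calorie": ("calories", "lower"),
    "low carb": ("carbs_g", "lower"),
    "low sugar": ("sugar_g", "lower"),
    "high protein": ("protein_g", "higher"),
    "low fat": ("fat_g", "lower"),
}


def recommend(query, candidates, goal, top_n=3):
    """Keep candidates that beat the query on the goal nutrient, ranked by similarity."""
    nutrient, direction = GOALS[goal]

    def improves(candidate):
        if direction == "lower":
            return candidate[nutrient] < query[nutrient]
        return candidate[nutrient] > query[nutrient]

    matches = [c for c in candidates if improves(c)]
    return sorted(matches, key=lambda c: c["similarity"], reverse=True)[:top_n]


# Made-up products; "similarity" stands for cosine similarity to the query product.
query_product = {"name": "Classic Ketchup", "sugar_g": 22.0, "protein_g": 1.0}
candidate_products = [
    {"name": "Ketchup A", "sugar_g": 5.0, "protein_g": 1.2, "similarity": 0.95},
    {"name": "Ketchup B", "sugar_g": 25.0, "protein_g": 0.8, "similarity": 0.97},
    {"name": "Tomato Sauce C", "sugar_g": 8.0, "protein_g": 1.5, "similarity": 0.80},
]

low_sugar_picks = [p["name"] for p in recommend(query_product, candidate_products, "low sugar")]
```

Here the most similar product, Ketchup B, is excluded because it has more sugar than the query product, and the remaining candidates are returned in order of similarity.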
The What
And that is how \#FoodGoals emerged. This application allows a consumer to request recommendations for grocery items that are similar to one whose ingredient profile they like. While this app is no longer being hosted live, you may view the slides and video of how it operated below.