Project 3, DSC80
This project is maintained by hunterbrownell
We are using the “Recipes” dataset, which contains data on many different recipes. Our analysis aims to answer the question, “Do recipes that take more time tend to have higher ratings?” The goal is to help users decide whether it is worth spending more time making food or whether they should just make something quick and easy. The data contains 83,782 rows, each of which corresponds to a single recipe. We use the “rating” column, which contains the average rating for each recipe, along with the “minutes” and “n_steps” columns, which give the time and the number of steps each recipe takes.
For our data cleaning process, we started by replacing every rating of 0 in the reviews data with np.nan, because a rating of 0 just meant the recipe was not rated. This kept recipes from receiving a lower average rating than they should have due to 0’s being factored in. Next, we created a true/false column marking whether an entry was a rating or just an entry with a description. We then grouped by the “recipe_id” column with aggfunc mean to get the average rating for each recipe, which we could merge into the recipes data. We grouped by “recipe_id” again, this time with aggfunc sum, to get the total number of reviews for each recipe. Finally, we merged the two dataframes into one big dataframe containing the information relevant to our analysis.
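The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the project's actual code: the tiny `recipes` and `reviews` frames, the helper name `clean_and_merge`, and the `has_rating` column name are all assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frames (hypothetical); the real data has far more rows and columns.
recipes = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["brownies", "cookies", "casserole"],
    "minutes": [40, 45, 40],
    "n_steps": [10, 12, 6],
})
reviews = pd.DataFrame({
    "recipe_id": [1, 1, 2, 2, 3],
    "rating": [4, 0, 5, 5, 0],
})


def clean_and_merge(recipes, reviews):
    """Hypothetical helper: average the ratings per recipe and attach them to recipes."""
    reviews = reviews.copy()
    # A rating of 0 means "not rated", so treat it as missing.
    reviews["rating"] = reviews["rating"].replace(0, np.nan)
    # True where the entry carries an actual rating, False otherwise.
    reviews["has_rating"] = reviews["rating"].notna()
    per_recipe = (
        reviews.groupby("recipe_id")
        .agg(
            rating=("rating", "mean"),          # mean ignores NaN
            num_reviews=("has_rating", "sum"),  # number of real ratings
        )
        .reset_index()
    )
    merged = recipes.merge(per_recipe, left_on="id", right_on="recipe_id", how="left")
    return merged.drop(columns="recipe_id")


merged = clean_and_merge(recipes, reviews)
print(merged)
```

Because 0’s are converted to NaN before averaging, a recipe whose only entries are unrated ends up with a missing average rather than an average of 0.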
| name | id | minutes | contributor_id | submitted | tags | nutrition | n_steps | steps | description | ingredients | n_ingredients | rating | num_reviews |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 brownies in the world best ever | 333281 | 40 | 985201 | 2008-10-27 | ['60-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'for-large-groups', 'desserts', 'lunch', 'snacks', 'cookies-and-brownies', 'chocolate', 'bar-cookies', 'brownies', 'number-of-servings'] | [138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0] | 10 | ['heat the oven to 350f and arrange the rack in the middle', 'line an 8-by-8-inch glass baking dish with aluminum foil', 'combine chocolate and butter in a medium saucepan and cook over medium-low heat , stirring frequently , until evenly melted', 'remove from heat and let cool to room temperature', 'combine eggs , sugar , cocoa powder , vanilla extract , espresso , and salt in a large bowl and briefly stir until just evenly incorporated', 'add cooled chocolate and mix until uniform in color', 'add flour and stir until just incorporated', 'transfer batter to the prepared baking dish', 'bake until a tester inserted in the center of the brownies comes out clean , about 25 to 30 minutes', 'remove from the oven and cool completely before cutting'] | these are the most; chocolatey, moist, rich, dense, fudgy, delicious brownies that you'll ever make…..sereiously! there's no doubt that these will be your fav brownies ever for you can add things to them or make them plain…..either way they're pure heaven! | ['bittersweet chocolate', 'unsalted butter', 'eggs', 'granulated sugar', 'unsweetened cocoa powder', 'vanilla extract', 'brewed espresso', 'kosher salt', 'all-purpose flour'] | 9 | 4 | 1 |
| 1 in canada chocolate chip cookies | 453467 | 45 | 1848091 | 2011-04-11 | ['60-minutes-or-less', 'time-to-make', 'cuisine', 'preparation', 'north-american', 'for-large-groups', 'canadian', 'british-columbian', 'number-of-servings'] | [595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0] | 12 | ['pre-heat oven the 350 degrees f', 'in a mixing bowl , sift together the flours and baking powder', 'set aside', 'in another mixing bowl , blend together the sugars , margarine , and salt until light and fluffy', 'add the eggs , water , and vanilla to the margarine / sugar mixture and mix together until well combined', 'add in the flour mixture to the wet ingredients and blend until combined', 'scrape down the sides of the bowl and add the chocolate chips', 'mix until combined', 'scrape down the sides to the bowl again', 'using an ice cream scoop , scoop evenly rounded balls of dough and place of cookie sheet about 1 - 2 inches apart to allow for spreading during baking', 'bake for 10 - 15 minutes or until golden brown on the outside and soft & chewy in the center', 'serve hot and enjoy !'] | this is the recipe that we use at my school cafeteria for chocolate chip cookies. they must be the best chocolate chip cookies i have ever had! if you don't have margarine or don't like it, then just use butter (softened) instead. | ['white sugar', 'brown sugar', 'salt', 'margarine', 'eggs', 'vanilla', 'water', 'all-purpose flour', 'whole wheat flour', 'baking soda', 'chocolate chips'] | 11 | 5 | 1 |
| 412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | ['60-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'side-dishes', 'vegetables', 'easy', 'beginner-cook', 'broccoli'] | [194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to “zaar” ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous “green bean casserole” from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to “zaar” on may 28th,2008 | ['frozen broccoli cuts', 'cream of chicken soup', 'sharp cheddar cheese', 'garlic powder', 'ground black pepper', 'salt', 'milk', 'soy sauce', 'french-fried onions'] | 9 | 5 | 4 |
| millionaire pound cake | 286009 | 120 | 461724 | 2008-02-12 | ['time-to-make', 'course', 'cuisine', 'preparation', 'occasion', 'north-american', 'desserts', 'american', 'southern-united-states', 'dinner-party', 'holiday-event', 'cakes', 'dietary', 'christmas', 'thanksgiving', 'low-sodium', 'low-in-something', 'taste-mood', 'sweet', '4-hours-or-less'] | [878.3, 63.0, 326.0, 13.0, 20.0, 123.0, 39.0] | 7 | ['freheat the oven to 300 degrees', 'grease a 10-inch tube pan with butter , dust the bottom and sides with flour , and set aside', 'in a large mixing bowl , cream the butter and sugar with an electric mixer and add the eggs one at a time , beating after each addition', 'alternately add the flour and milk , stirring till the batter is smooth', 'add the two extracts and stir till well blended', 'scrape the batter into the prepared pan and bake till a cake tester or knife blade inserted in the center comes out clean , about 1 1 / 2 hours', 'cool the cake in the pan on a rack for 5 minutes , then turn it out on the rack to cool completely'] | why a millionaire pound cake? because it's super rich! this scrumptious cake is the pride of an elderly belle from jackson, mississippi. the recipe comes from “the glory of southern cooking” by james villas. | ['butter', 'sugar', 'eggs', 'all-purpose flour', 'whole milk', 'pure vanilla extract', 'almond extract'] | 7 | 5 | 1 |
| 2000 meatloaf | 475785 | 90 | 2202916 | 2012-03-06 | ['time-to-make', 'course', 'main-ingredient', 'preparation', 'main-dish', 'potatoes', 'vegetables', '4-hours-or-less', 'meatloaf', 'simply-potatoes2'] | [267.0, 30.0, 12.0, 12.0, 29.0, 48.0, 2.0] | 17 | ['pan fry bacon , and set aside on a paper towel to absorb excess grease', 'mince yellow onion , red bell pepper , and add to your mixing bowl', 'chop garlic and set aside', 'put 1tbsp olive oil into a saut pan , along with chopped garlic , teaspoons white pepper and a pinch of kosher salt', 'bring to a medium heat to sweat your garlic', 'preheat oven to 350f', 'coarsely chop your baby spinach add to your heated pan , stir frequently for approximately 5 min to wilt', 'add your spinach to the mixing bowl', 'chop your now cooled bacon , and add it to the mixing bowl', 'add your meatloaf mix to the bowl , with one egg and mix till thoroughly combined', 'add your goat cheese , one egg , 1 / 8 tsp white pepper and 1 / 8 tsp of kosher salt and mix till thoroughly combined', 'transfer to a 9x5 meatloaf pan , and cook for 60 min or until the internal temperature is at least 160f', 'let stand for 5min', 'melt 1tbsp unsalted butter into a frying pan , and cook up to three eggs at a time', 'crack each egg into a separate dish , in order to prevent egg shells from reaching the pan , then add salt and pepper to taste', 'wait until the egg whites are firm looking , but slightly runny on top before flipping your eggs', 'after flipping , wait 10~20 seconds before removing each egg and placing it over your slices of meatloaf'] | ready, set, cook! special edition contest entry: a mediterranean flavor inspired meatloaf dish. featuring: simply potatoes - shredded hash browns, egg, bacon, spinach, red bell pepper, and goat cheese. | ['meatloaf mixture', 'unsmoked bacon', 'goat cheese', 'unsalted butter', 'eggs', 'baby spinach', 'yellow onion', 'red bell pepper', 'simply potatoes shredded hash browns', 'fresh garlic', 'kosher salt', 'white pepper', 'olive oil'] | 13 | 5 | 2 |
This plot shows the relationship between the time (in minutes) each recipe takes to make and its rating. Even though there are some outliers, recipes that take more time to make generally appear to have higher ratings.
| recipe_id | rating |
|---|---|
| 38 | 4.25 |
| 40 | 5 |
| 41 | 4 |
| 43 | 1 |
| 45 | 3 |
| 49 | 5 |
| 50 | 4 |
| 53 | 3 |
| 55 | 5 |
| 58 | 4.66667 |
This dataframe is grouped by recipe ID with aggfunc ‘mean’ and shows the average rating for each recipe. The first 10 rows are shown.
The missingness of the “rating” column is MAR (missing at random), dependent on the number of steps; we determined this by running a permutation test on the ‘n_steps’ and ‘rating’ columns. Across 1000 permutations we obtained a p-value of 0.0, which told us it is very unlikely that the observed relationship between missing ratings and ‘n_steps’ arose by random chance.
The results of this permutation test showed that it is extremely unlikely that the missingness of the “rating” column has nothing to do with the “n_steps” column: we obtained a p-value of 0.0 (none of the 1000 permuted statistics was as extreme as the observed one), so we rejected the null hypothesis.
This permutation test did not yield a significant result. We obtained a p-value of 0.305, so we failed to reject the null hypothesis: we found no evidence that the missingness of the “rating” column depends on the values in the “minutes” column. (The graph looks bizarre, but that is due to several massive outliers.)
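A missingness permutation test of the kind described above might look like the sketch below. The write-up does not say which test statistic was used, so the absolute difference in group means is an assumption here, as are the function name `missingness_permutation_test` and the synthetic `demo` data.

```python
import numpy as np
import pandas as pd


def missingness_permutation_test(df, col, missing_col="rating", n_reps=1000, seed=0):
    """Hypothetical helper: does the missingness of `missing_col` depend on `col`?

    Test statistic (an assumption): absolute difference between the mean of
    `col` where `missing_col` is missing and the mean where it is present.
    """
    rng = np.random.default_rng(seed)
    is_missing = df[missing_col].isna().to_numpy()
    values = df[col].to_numpy(dtype=float)

    def stat(mask):
        return abs(values[mask].mean() - values[~mask].mean())

    observed = stat(is_missing)
    # Shuffling the missingness labels breaks any link between them and `col`.
    sims = np.array([stat(rng.permutation(is_missing)) for _ in range(n_reps)])
    # Share of shuffled statistics at least as extreme as the observed one.
    p_value = (sims >= observed).mean()
    return observed, p_value


# Synthetic data in which ratings are missing only for recipes with many steps.
demo = pd.DataFrame({"n_steps": np.arange(1, 201)})
demo["rating"] = np.where(demo["n_steps"] > 150, np.nan, 5.0)

observed, p_value = missingness_permutation_test(demo, "n_steps")
print(observed, p_value)
```

With 1000 repetitions, the smallest nonzero p-value this procedure can report is 0.001, so a reported 0.0 is best read as "below 0.001".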
For our hypothesis test, our null hypothesis was “There is no relationship between how long a recipe takes to complete and its average rating.” Our alternative hypothesis was “If a recipe takes longer than 1000 minutes, it is more likely to have a higher average rating.” We used the difference in means as our test statistic and set a significance level of 0.05. The resulting p-value was 0.14, which meant we failed to reject the null. We chose the signed difference in mean ratings as our test statistic because we were trying to decide whether it was worth spending more time in the kitchen, which is a directional question. If we had used the absolute difference, for example, we could have ended up with a result that appeared to support our alternative hypothesis when in reality it was the faster recipes that were rated higher.
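The one-sided test described above could be sketched as follows. This is an illustration under assumptions rather than the project's code: the function name `long_recipe_permutation_test` and the synthetic `demo` frame are made up, and recipes without a rating are dropped before testing, which the write-up does not state explicitly.

```python
import numpy as np
import pandas as pd


def long_recipe_permutation_test(df, threshold=1000, n_reps=1000, seed=0):
    """Hypothetical helper: are recipes longer than `threshold` minutes rated higher?

    Test statistic: mean rating of long recipes minus mean rating of the rest
    (signed, so only a positive difference supports the alternative).
    """
    rng = np.random.default_rng(seed)
    rated = df.dropna(subset=["rating"])  # assumption: unrated recipes are excluded
    is_long = (rated["minutes"] > threshold).to_numpy()
    ratings = rated["rating"].to_numpy(dtype=float)

    def diff_in_means(mask):
        return ratings[mask].mean() - ratings[~mask].mean()

    observed = diff_in_means(is_long)
    # Under the null, the long/short labels are exchangeable, so shuffle them.
    sims = np.array([diff_in_means(rng.permutation(is_long)) for _ in range(n_reps)])
    # One-sided p-value: how often a shuffled difference is at least as large.
    p_value = (sims >= observed).mean()
    return observed, p_value


# Synthetic data: 30 short recipes rated 4, 10 long recipes rated 5, one unrated.
demo = pd.DataFrame({
    "minutes": [30] * 30 + [1500] * 10 + [20],
    "rating": [4.0] * 30 + [5.0] * 10 + [np.nan],
})

observed, p_value = long_recipe_permutation_test(demo)
print(observed, p_value)
```

Keeping the sign of the difference is what makes the test directional: a large negative difference (faster recipes rated higher) produces a p-value near 1 instead of a small one.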