John L. Godlee

I’ve just finished an experiment in which I tracked the meals I ate for just over a year: 460 days in total.

The original goal was to keep a reference to provide inspiration for what to cook when I can’t think of anything. I realised that my cooking and eating habits have changed since I first moved away from home about 10 years ago, and I have forgotten many of the dishes I used to cook.

A second goal arose when I started the project: to track how many meals I eat “out”, i.e. meals I paid somebody else to cook, in a restaurant, fast food place, cafe, etc. I suspected that I ate out more often than I admitted, and cooking at home is a good way to save money and stay healthy.

Collecting the data was quite easy. When I was living in London I drew a table on paper and stuck it to the wall above the kitchen table. When I moved to Edinburgh and started working in the office again, I switched to an Excel spreadsheet kept on my Desktop, so it remained visible and I wouldn’t forget to fill it in.

The only times I found it difficult to keep the log were when I was travelling abroad and my routine was disrupted. Even then, I tried to backdate the log as best I could.

To analyse the data in a meaningful way I had to spend a long time at the end recoding the entries. At the time of recording I didn’t attempt to standardise how I recorded a meal, so I had lots of very similar entries like “Eggs on toast”, “Fried egg”, and “Fried egg on toast”. Rather than just combining these into a single value like “Egg toast”, I decided the best way to recode the responses was to add a variable number of tags to each entry, so all the entries above would get the tags “bread”, “egg”, and “toast”. The tags describe the main ingredients in the dish, the cooking or preparation method, and sometimes the food culture I think the dish comes from. Even with this system there were ambiguities that required me to make a decision: should a tortilla be classified as “bread”, or is it its own thing, like “flatbread”?
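A minimal base R sketch of the tagging idea, assuming a hand-built lookup table (`tag_lookup`, a hypothetical name) that maps each raw entry to a semicolon-separated tag string. The entries and tags are the illustrative ones from the paragraph above, not the full recoding used for this project. Splitting the tag strings gives each entry its own variable-length set of tags, which can then be counted across entries.

```r
# Hypothetical lookup table: raw log entry -> semicolon-separated tags
tag_lookup <- c(
  "Eggs on toast"      = "bread;egg;toast",
  "Fried egg"          = "bread;egg;toast",
  "Fried egg on toast" = "bread;egg;toast",
  "Aloo paratha"       = "flatbread;potato"
)

raw_entries <- c("Fried egg", "Aloo paratha", "Eggs on toast")

# Look up each entry's tag string, then split it into a character vector,
# so every entry carries as many tags as it needs
entry_tags <- strsplit(unname(tag_lookup[raw_entries]), ";", fixed = TRUE)
names(entry_tags) <- raw_entries

# Count how often each tag appears across all entries
tag_counts <- table(unlist(entry_tags))
```

Differently worded entries for the same dish end up with identical tag sets, so they are counted together without forcing them into a single label.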

I used R to recode and process the data.

The original spreadsheet looked like this:

date	breakfast	lunch	supper	location
2022-08-20	Cereal	Falafel wrap	NOTHING	Edinburgh
2022-08-21	Toast and peanut butter	Sandwich (OUT)	Lentil salad	Edinburgh

The first step was to transform the table to a long format:

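library(dplyr)
library(tidyr)

# One row per meal: the meal columns become "meal" (name) and "value" (entry)
x %>%
  pivot_longer(
    names_to = "meal",
    cols = c("breakfast", "lunch", "supper"))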
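For comparison, a minimal base R sketch of the same reshape, using the two example rows from the spreadsheet above. The object names (`wide`, `meal_cols`, `long`) are illustrative. Rows come out day by day, with breakfast, lunch and supper in that order, which is the same ordering `pivot_longer()` produces; over 460 logged days this gives 460 × 3 = 1,380 meal rows.

```r
# The two example rows from the spreadsheet above
wide <- data.frame(
  date      = c("2022-08-20", "2022-08-21"),
  breakfast = c("Cereal", "Toast and peanut butter"),
  lunch     = c("Falafel wrap", "Sandwich (OUT)"),
  supper    = c("NOTHING", "Lentil salad"),
  location  = c("Edinburgh", "Edinburgh"),
  stringsAsFactors = FALSE
)

meal_cols <- c("breakfast", "lunch", "supper")

# Repeat each day once per meal column; transposing the meal matrix before
# flattening reads the entries row by row so they line up with the dates
long <- data.frame(
  date     = rep(wide$date, each = length(meal_cols)),
  location = rep(wide$location, each = length(meal_cols)),
  meal     = rep(meal_cols, times = nrow(wide)),
  value    = as.vector(t(as.matrix(wide[meal_cols]))),
  stringsAsFactors = FALSE
)
```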

Then I added an ID value to each meal, recoded the date, and created two extra logical columns, out and nothing, indicating whether the strings “OUT” and “NOTHING” appeared in the meal entry. Then I set about adding tags to each meal, separated by semicolons.

x %>% 
  mutate(
    meal_id = row_number(),
    date = as.Date(date),
    # Logical flags based on the markers typed into the log
    out = grepl("OUT", value),
    nothing = grepl("NOTHING", value),
    # Drop the "(OUT)" marker, trim whitespace, and turn literal "NA" into missing
    value = na_if(trimws(gsub("\\(OUT\\)", "", value)), "NA"),
    tag = case_when(
      value == "Aloo paratha" ~ "flatbread;potato",
      value == "Apple" ~ "fruit;apple",
      value == "Aubergeine curry" ~ "curry;aubergeine",
      # ... one line for each distinct meal entry ...
      TRUE ~ NA_character_))
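A long `case_when()` chain works, but one alternative is a lookup table joined with `match()`, which keeps the recoding rules as data. The sketch below is in base R and assumes a hypothetical table `tag_table`; the tag string for "Sandwich" is invented for illustration. It also matches the "(OUT)" marker literally with `fixed = TRUE`, so only the marker itself sets the flag.

```r
values <- c("Cereal", "Sandwich (OUT)", "NOTHING", NA)

# Flags from the markers typed into the log; grepl() returns FALSE for NA input
out     <- grepl("(OUT)", values, fixed = TRUE)
nothing <- grepl("NOTHING", values, fixed = TRUE)

# Strip the marker and any leftover whitespace (NA stays NA)
clean <- trimws(gsub("(OUT)", "", values, fixed = TRUE))

# Hypothetical lookup table used instead of a case_when() chain
tag_table <- data.frame(
  value = c("Cereal", "Sandwich"),
  tag   = c("cereal", "bread;sandwich"),
  stringsAsFactors = FALSE
)

# Entries without a row in the table get NA, like case_when()'s fallback
tags <- tag_table$tag[match(clean, tag_table$value)]
```

Adding a new dish then means adding a row to the table rather than editing code.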

I also filtered out meals from incomplete months at the start and end of the survey period, so I could calculate unbiased monthly statistics. I separated out the tags, so a meal may span multiple rows, depending on how many tags it has. Finally, I added a logical column, vegetarian, which is FALSE for every row of a meal that has the tag “meat”.

x %>% 
  filter(date >= as.Date("2021-05-01"), date <= as.Date("2022-07-31")) %>% 
  # One row per tag: a meal with three tags now spans three rows
  separate_rows(tag, sep = ";") %>% 
  # Flag the whole meal, so every row of a meaty meal gets FALSE
  group_by(meal_id) %>% 
  mutate(vegetarian = !any(tag == "meat", na.rm = TRUE)) %>% 
  ungroup()
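The complete-month bounds can also be derived from the data rather than typed by hand. A minimal base R sketch, assuming hypothetical first and last logging days (chosen so that the result reproduces the bounds in the `filter()` call; they are not taken from the real log) and a hypothetical helper `month_start()`:

```r
# Hypothetical first and last days of logging, for illustration only
first_day <- as.Date("2021-04-20")
last_day  <- as.Date("2022-08-10")

# Hypothetical helper: the first day of the month containing d
month_start <- function(d) as.Date(format(d, "%Y-%m-01"))

# First complete month: starts on first_day if that is the 1st,
# otherwise on the 1st of the following month
start_complete <- seq(month_start(first_day - 1), by = "month", length.out = 2)[2]

# Last complete month: ends on last_day if that is a month end,
# otherwise on the last day of the previous month
end_complete <- month_start(last_day + 1) - 1

dates <- as.Date(c("2021-04-25", "2021-05-01", "2022-07-31", "2022-08-05"))
keep  <- dates >= start_complete & dates <= end_complete
```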

The final table looked like this:

date	meal	value	meal_id	out	nothing	tag	vegetarian
2022-08-20	breakfast	Cereal	1	FALSE	FALSE	cereal	TRUE
2022-08-20	lunch	Falafel wrap	2	FALSE	FALSE	falafel	TRUE
2022-08-20	lunch	Falafel wrap	2	FALSE	FALSE	tortilla	TRUE
2022-08-20	supper	NOTHING	3	FALSE	TRUE	NA	TRUE

I made a plot of the number of meals I missed per month:

library(lubridate)  # floor_date()
library(ggplot2)

meals_clean %>% 
  group_by(meal_id, date) %>%
  summarise(nothing = any(nothing)) %>% 
  mutate(month = floor_date(date, "month")) %>% 
  group_by(month) %>% 
  summarise(sum_nothing = sum(nothing)) %>% 
  ggplot(., aes(x = month, y = sum_nothing)) + 
    geom_line() + 
    geom_point(shape = 21, fill = "darkgrey") + 
    scale_x_date(date_breaks = "1 month", date_labels = "%b %Y",
      guide = guide_axis(angle = 45)) + 
    labs(x = "Date", y = "Meals missed per month") +
    theme_bw() + 
    theme(panel.grid.minor = element_blank())

There seems to be a fairly regular oscillation: I miss meals often one month, then rarely the next. May 2022 is a big outlier, as I was travelling a lot and eating at irregular times. It’s possible that eating out has increased since I moved from London to Edinburgh in September 2022, as we live closer to the centre of the city and COVID restrictions have eased, but it’s difficult to tell from such noisy data.

Meals missed broken down by meal type:

meals_clean %>% 
  group_by(meal_id, date, meal) %>% 
  summarise(nothing = any(nothing)) %>% 
  group_by(meal) %>% 
  summarise(sum_nothing = sum(nothing)) %>% 
  ggplot(., aes(x = meal, y = sum_nothing)) + 
    geom_bar(stat = "identity", colour = "black", fill = "darkgrey") + 
    labs(x = "Meal", y = "Meals missed") +
    theme_bw() + 
    theme(panel.grid.minor = element_blank())

This doesn’t surprise me. Breakfast is an important meal for me, as it helps me to wake up, but sometimes I miss evening meals if I have had a big lunch or I’m doing something in the evening.

A monthly timeline of meals eaten out:

meals_clean %>%
  # Collapse to one row per meal, so multi-tag meals are counted once
  group_by(meal_id, date) %>% 
  summarise(out = any(out)) %>% 
  group_by(month = floor_date(date, "month")) %>% 
  summarise(sum_out = sum(out)) %>%
  ggplot(., aes(x = month, y = sum_out)) + 
    geom_line() + 
    geom_point(shape = 21, fill = "darkgrey") + 
    scale_x_date(date_breaks = "1 month", date_labels = "%b %Y",
      guide = guide_axis(angle = 45)) + 
    labs(x = "Date", y = "Meals out per month") +
    theme_bw() + 
    theme(panel.grid.minor = element_blank())

On average I eat 10.33 meals out per month, out of a total of roughly 90 meals per month (about 11%). That’s not bad. September 2021 is particularly low because I was living with my parents in the countryside, where there were few opportunities to eat out. April 2022 is also low because I was on fieldwork in Angola, where we cooked most of our meals ourselves or had them cooked by someone we employed in the National Park.
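Because `separate_rows()` gives a meal one row per tag, monthly counts have to be taken over meals, not rows. A minimal base R sketch on made-up rows (the meal IDs, dates and flags are hypothetical), followed by the share implied by the figures quoted above, assuming about 30 days × 3 meals per month:

```r
# Hypothetical rows after separating tags: meal 1 has two tag rows,
# meal 3 has three
meals <- data.frame(
  meal_id = c(1, 1, 2, 3, 3, 3, 4),
  date    = as.Date(c("2021-05-03", "2021-05-03", "2021-05-04",
                      "2021-06-10", "2021-06-10", "2021-06-10", "2021-06-11")),
  out     = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Keep one row per meal first; summing over tag rows would count
# meal 1 twice and meal 3 three times
per_meal <- meals[!duplicated(meals$meal_id), ]

# Bucket each meal into its calendar month and count meals eaten out
per_meal$month <- format(per_meal$date, "%Y-%m")
out_per_month  <- tapply(per_meal$out, per_meal$month, sum)

# Share of meals eaten out: 10.33 meals out from roughly 30 x 3 meals
share_out <- 10.33 / (30 * 3)  # about 0.115
```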

A monthly timeline of non-vegetarian meals:

meals_clean %>% 
  group_by(meal_id, date) %>% 
  # A meal is vegetarian only if none of its rows is flagged as meat
  summarise(vegetarian = all(vegetarian, na.rm = TRUE)) %>% 
  group_by(month = floor_date(date, "month")) %>% 
  summarise(sum_meat = sum(!vegetarian)) %>%
  ggplot(., aes(x = month, y = sum_meat)) + 
    geom_line() + 
    geom_point(shape = 21, fill = "darkgrey") + 
    scale_x_date(date_breaks = "1 month", date_labels = "%b %Y",
      guide = guide_axis(angle = 45)) + 
    labs(x = "Date", y = "Meaty meals per month") +
    theme_bw() + 
    theme(panel.grid.minor = element_blank())

May 2022 is particularly high because I was on holiday in Mexico and decided to eat whatever I wanted, including lots of carne asada and cochinita pibil.
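Whether a meal counts as vegetarian has to be decided per meal, not per tag row. A minimal base R sketch on hypothetical rows (meal 1 is bread and meat, meal 2 is bread and beans, meal 3 is a skipped meal with no tags) shows how the choice of `any()` or `all()` changes the result when collapsing a row-level flag:

```r
tag_rows <- data.frame(
  meal_id = c(1, 1, 2, 2, 3),
  tag     = c("bread", "meat", "bread", "beans", NA),
  stringsAsFactors = FALSE
)

# Row-level flag: FALSE only on the "meat" row itself, NA where tag is NA
tag_rows$veg_row <- tag_rows$tag != "meat"

# Meal-level flag: vegetarian only if no tag of the meal is "meat"
veg_meal <- tapply(tag_rows$tag, tag_rows$meal_id,
                   function(x) !any(x == "meat", na.rm = TRUE))

# Collapsing the row flag with any() wrongly calls meal 1 vegetarian and
# meal 3 (no tags) non-vegetarian; all() agrees with the meal-level flag
any_row <- tapply(tag_rows$veg_row, tag_rows$meal_id, any, na.rm = TRUE)
all_row <- tapply(tag_rows$veg_row, tag_rows$meal_id, all, na.rm = TRUE)
```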

Finally, a breakdown of the most common tags by meal type:

library(tidytext)  # reorder_within()

meals_clean %>% 
  filter(!is.na(tag)) %>% 
  count(meal, tag) %>% 
  group_by(meal) %>% 
  # Proportion of all tag occurrences within each meal type
  mutate(prop = n / sum(n)) %>% 
  slice_max(prop, n = 10, with_ties = FALSE) %>%
  ggplot(., aes(x = reorder_within(tag, -prop, meal), y = prop)) + 
    geom_bar(stat = "identity", fill = "darkgrey", colour = "black") + 
    geom_label(aes(label = n)) + 
    # Strip the "___meal" suffix that reorder_within() adds to each label
    scale_x_discrete(name = NULL, labels = function(x) sub('^(.*)___.*$', '\\1', x)) +
    facet_wrap(~meal, scales = "free_x", nrow = 3) + 
    labs(y = "Proportion") +
    theme_bw()

Breakdown of most popular tags by meal type

This didn’t work out as nicely as I’d hoped. Bread comes out on top for all meal types, but that’s not particularly interesting. I eat a lot of sandwiches for lunch (which contain bread), and toast for breakfast. The tag system doesn’t really capture the essence of the meal. But it’s very hard to classify individual meals because they overlap so much and are so variable.
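The proportions in that plot are shares of all tag occurrences within a meal type, not shares of meals. A minimal base R sketch on hypothetical rows compares that measure with one possible alternative, the share of meals of each type that carry a tag at least once:

```r
tag_rows <- data.frame(
  meal_id = c(1, 1, 1, 2, 2, 3, 3),
  meal    = c("breakfast", "breakfast", "breakfast",
              "breakfast", "breakfast", "lunch", "lunch"),
  tag     = c("bread", "egg", "toast", "bread", "toast", "bread", "cheese"),
  stringsAsFactors = FALSE
)

# Share of all tag occurrences within each meal type (row proportions)
share_of_tags <- prop.table(table(tag_rows$meal, tag_rows$tag), margin = 1)

# Alternative: share of meals of each type carrying the tag at least once
meals_per_type <- tapply(tag_rows$meal_id, tag_rows$meal,
                         function(id) length(unique(id)))
has_tag <- unique(tag_rows[c("meal_id", "meal", "tag")])
type_tag_meals <- table(has_tag$meal, has_tag$tag)

# Dividing by a vector with one value per row scales each row separately
share_of_meals <- type_tag_meals /
  as.vector(meals_per_type[rownames(type_tag_meals)])
```

Here bread appears in every breakfast (share of meals 1), but makes up only 40% of breakfast tags, because meals with many tags dilute the first measure.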

I had a stab at creating a network graph for the most commonly shared tags per meal, as I’ve been playing with {igraph} recently.

library(igraph)     # graph_from_data_frame()
library(ggraph)
library(patchwork)  # wrap_plots()

# Create list of dataframes of edges, with freq. per meal type
tag_edges <- meals_clean %>% 
  dplyr::select(meal, meal_id, tag) %>% 
  group_by(meal, meal_id) %>% 
  filter(n_distinct(tag) > 1) %>% 
  # Sort tags so "bread-egg" and "egg-bread" are counted as the same pair
  do(data.frame(t(combn(sort(unique(.$tag)), 2)))) %>% 
  ungroup() %>% 
  dplyr::select(-meal_id) %>% 
  group_by(meal, X1, X2) %>% 
  tally() %>% 
  rename(
    from = X1, 
    to = X2,
    weight = n) %>% 
  ungroup() %>% 
  split(., .$meal)

# For each meal type dataframe
tag_graph_list <- lapply(tag_edges, function(x) { 
  # Create an undirected graph from pairs seen more than 5 times
  tag_graph <- x %>% 
    dplyr::select(-meal) %>% 
    filter(weight > 5) %>% 
    graph_from_data_frame(directed = FALSE)

  # Create a plot
  ggraph(tag_graph, layout = 'linear', circular = TRUE) + 
    geom_edge_link(aes(width = weight)) + 
    geom_node_label(aes(label = name)) + 
    ggtitle(unique(x$meal)) + 
    theme_graph() + 
    theme(legend.position = "none")
})

# Use patchwork to mosaic plots
wrap_plots(tag_graph_list, ncol = 1)

Most common tag connections per meal type

Lots of eggs on toast for breakfast, or bananas with other stuff. Sandwiches with salad, or cheese and salad, for lunch. Curries, tomato pasta, and rice with beans and vegetables for supper.
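The edge weights in those graphs are co-occurrence counts: for each meal, every unordered pair of distinct tags is one edge, and the weight is the number of meals containing that pair. A minimal base R sketch on hypothetical rows; sorting the tags before calling `combn()` keeps each pair order-independent:

```r
tag_rows <- data.frame(
  meal_id = c(1, 1, 1, 2, 2, 3, 3),
  tag     = c("egg", "bread", "toast", "bread", "egg", "rice", "beans"),
  stringsAsFactors = FALSE
)

# One row per unordered pair of distinct tags within each meal
pairs <- do.call(rbind, lapply(split(tag_rows$tag, tag_rows$meal_id), function(x) {
  x <- sort(unique(x))
  if (length(x) < 2) return(NULL)  # a single tag gives no pairs
  m <- t(combn(x, 2))
  data.frame(from = m[, 1], to = m[, 2], stringsAsFactors = FALSE)
}))

# Edge weight = number of meals in which the pair co-occurs
edges <- aggregate(weight ~ from + to, data = transform(pairs, weight = 1),
                   FUN = sum)
```

Meals 1 and 2 list bread and egg in different orders, but after sorting both contribute to the same bread–egg edge, which gets a weight of 2.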

If I were to do this again I’d try harder to describe the contents of each meal: rather than just “pasta”, I’d write “pasta with mushrooms, courgettes, tomato-based sauce, and crusty bread”.

Tracking meals for a year

2022-08-16