MS6218 Data Science in Marketing Homework 4
Prof. Gavin Feng
Due: Mar. 1st, 5pm
Our first scanner data analysis

In this assignment we will go through the steps to conduct a basic demand analysis. For this purpose, we will use scanner data obtained from Kraft (now Kraft Heinz). We only use a small subset of all possible product/geography/variable combinations available in this sort of data (you will see more detailed data sets later). Our focus is on two levels of aggregation: account level (Jewel-Osco) and market level (Kraft Central region = Midwest region).

Products:
• Kraft 32 oz mayo
• Hellman’s 32 oz mayo

Time: Weekly data

Variables: market, product, week, sales_units, sales_dollars

Aggregation: The data for the Jewel-Osco account cover sales at all Jewel-Osco stores in Chicago. The data for the Kraft Central region cover sales in all stores in the Midwest region.
Inspecting the data

Load the data set and inspect the data frame mayo_DF using the summary function. The summary output displays the two unique values in the variable market (“Jewel” and “Kraft Central”), and also the number of observations corresponding to each unique value. Similarly, you see the unique product names in the product variable. If a variable contains a large number of unique values, then these values will not be shown in the summary output. Instead you can use the function unique: unique(variable/column name), or the table function that we already encountered in the Introduction to R tutorial.

Let’s summarize the data separately for both products in the Kraft Central region and at the Jewel-Osco account. In order to do this we’ll learn how to subset data frames in R using the subset function.

mayo_DF = read.csv("Mayo.csv")
mayo_Hellmans_Jewel = mayo_DF[mayo_DF$product=="Hellmans Mayo 32oz" & mayo_DF$market=="Jewel",]
The second line in the script above extracts all rows in mayo_DF for which the two conditions product==“Hellmans Mayo 32oz” and market==“Jewel” are both TRUE. Note that & is the logical AND operator (for other purposes you may use |, the logical OR). All of the rows satisfying these conditions are then assigned to the new data frame mayo_Hellmans_Jewel. You can now summarize the newly created data frame.

Now calculate the summary statistics for all four product/market combinations. You’ll see that we don’t have the same number of observations in the two markets (the first 16 weeks at Jewel-Osco are missing), and that total unit and dollar sales are higher in the Kraft Central region compared to Jewel (as expected).
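One possible way to summarize all four product/market combinations is to loop over the unique values of market and product. The sketch below is only an illustration: toy_DF is a hypothetical stand-in for mayo_DF with made-up numbers, so that the code runs without Mayo.csv.

```r
# Toy stand-in for mayo_DF (all values are made up purely for illustration).
toy_DF = data.frame(
  market        = rep(c("Jewel", "Kraft Central"), each = 4),
  product       = rep(c("Hellmans Mayo 32oz", "Kraft Mayo 32oz"), times = 4),
  week          = rep(c(1, 1, 2, 2), times = 2),
  sales_units   = c(100, 60, 120, 50, 1000, 700, 1100, 650),
  sales_dollars = c(110, 60, 120, 55, 1050, 700, 1100, 690)
)

# Unique product names and the number of observations per market/product cell.
print(unique(toy_DF$product))
counts = table(toy_DF$market, toy_DF$product)
print(counts)

# Summarize each product/market combination in turn.
for (m in unique(toy_DF$market)) {
  for (p in unique(toy_DF$product)) {
    cell = toy_DF[toy_DF$product == p & toy_DF$market == m, ]
    cat("\n", p, "at", m, "\n")
    print(summary(cell[, c("sales_units", "sales_dollars")]))
  }
}
```

With the real data, replacing toy_DF by mayo_DF should give the four summaries asked for above.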
Create a price variable

Construct an average price variable by dividing dollar sales by unit sales:

mayo_DF$price = mayo_DF$sales_dollars/mayo_DF$sales_units
1. Explain how to interpret this price variable: how does it differ from a product price at the store level?
2. Provide summary statistics (mean, median, and standard deviation) for the product prices, separately for each product/market combination. Report the statistics in a table. Are the means of prices similar across the Kraft Central region and Jewel-Osco? Is there more price variation at Jewel or in the Central region? Why? What is the implication for our ability to estimate price elasticities with either account level data or data in a large geographic market?
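A table of mean, median, and standard deviation by product/market combination can be built in several ways. The following is a minimal sketch using aggregate, again on a hypothetical toy data frame (toy_DF, with made-up numbers) rather than the mayo data.

```r
# Toy stand-in for mayo_DF (all values are made up purely for illustration).
toy_DF = data.frame(
  market        = rep(c("Jewel", "Kraft Central"), each = 4),
  product       = rep(c("Hellmans Mayo 32oz", "Kraft Mayo 32oz"), times = 4),
  week          = rep(c(1, 1, 2, 2), times = 2),
  sales_units   = c(100, 60, 120, 50, 1000, 700, 1100, 650),
  sales_dollars = c(110, 60, 120, 55, 1050, 700, 1100, 690)
)
toy_DF$price = toy_DF$sales_dollars / toy_DF$sales_units

# One row per product/market combination, three statistics per row.
price_stats = aggregate(
  price ~ product + market, data = toy_DF,
  FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)

# aggregate() stores the three statistics in a matrix column;
# flatten it into ordinary columns price.mean, price.median, price.sd.
price_stats = do.call(data.frame, price_stats)
print(price_stats)
```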
Time-series plots of prices

A time-series plot is a graph with a variable indicating the progress of time on the x-axis and the variable of interest on the y-axis. You can easily create time-series plots in R using the plot function:

plot(mayo_DF$week, mayo_DF$price, type = "o")
[Figure: time-series plot of mayo_DF$price (y-axis) against mayo_DF$week (x-axis)]

Note the type = “o” option, which stands for “overplotted points and lines” and tells R to connect the displayed data points with lines. The default for the type option is “p”, and then only data points are plotted, while “l” produces lines without data points.

The graph created above is messy. Why? Because you plotted the price data for both products in both markets on the same graph. Instead, we will create time-series plots separately for different product/market combinations.

mayo_Hellmans_Jewel = subset(mayo_DF, product=="Hellmans Mayo 32oz" & market=="Jewel")
plot(mayo_Hellmans_Jewel$week, mayo_Hellmans_Jewel$price, type = "o", pch = 21, lwd = 0.4, bg = "limegreen", main = "Prices of Hellman's Mayo at Jewel-Osco", xlab = "Week", ylab = "Price")
[Figure: “Prices of Hellman’s Mayo at Jewel-Osco”; x-axis: Week, y-axis: Price]
You can now repeat this process for all product/market combinations.

3. Provide time-series plots for all product/market combinations using your favorite method. There are some visible differences between the prices at Jewel and the prices in the Kraft Central region. Why?
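Repeating the plot for every combination can be automated with a loop. The sketch below is one possible method, not the only one; toy_DF and its values are hypothetical and stand in for mayo_DF with the price column already added.

```r
# Toy stand-in for mayo_DF (all values are made up purely for illustration).
toy_DF = data.frame(
  market        = rep(c("Jewel", "Kraft Central"), each = 4),
  product       = rep(c("Hellmans Mayo 32oz", "Kraft Mayo 32oz"), times = 4),
  week          = rep(c(1, 1, 2, 2), times = 2),
  sales_units   = c(100, 60, 120, 50, 1000, 700, 1100, 650),
  sales_dollars = c(110, 60, 120, 55, 1050, 700, 1100, 690)
)
toy_DF$price = toy_DF$sales_dollars / toy_DF$sales_units

# All product/market combinations that occur in the data.
combos = unique(toy_DF[, c("product", "market")])

par(mfrow = c(2, 2))  # arrange the four plots in a 2 x 2 grid
for (i in seq_len(nrow(combos))) {
  cell = subset(toy_DF, product == combos$product[i] & market == combos$market[i])
  plot(cell$week, cell$price, type = "o", pch = 21, bg = "limegreen",
       main = paste(combos$product[i], "at", combos$market[i]),
       xlab = "Week", ylab = "Price")
}
par(mfrow = c(1, 1))  # restore the default single-plot layout
```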
Plotting a demand relationship

Construct scatter-plots of unit sales versus prices for all product/market combinations. You can do this using either of the methods discussed above, method 2 of course being the easier.

Note: If you create a scatter plot don’t use the option type = “o” in plot (see the explanation for this option in Section “Time-series plots of prices”). For a time-series graph, on the other hand, the type = “o” option is useful to connect the data points.

4. Provide scatter-plots of the demand relationship (unit sales versus prices) for all product/market combinations. Is there evidence for a negatively sloped demand-curve in the data?
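As a small illustration of the scatter plot for a single product/market combination, the sketch below uses a hypothetical data frame cell with made-up prices and unit sales. The correlation at the end is an extra, informal check on the sign of the relationship and is not part of the assignment text.

```r
# Made-up numbers for a single product/market cell (illustration only).
cell = data.frame(
  price       = c(0.90, 0.95, 1.00, 1.05, 1.10),
  sales_units = c(1300, 1150, 1000, 920, 810)
)

# Default type = "p": points only, no connecting lines.
plot(cell$price, cell$sales_units, pch = 21, bg = "limegreen",
     main = "Unit sales versus price (toy data)",
     xlab = "Price", ylab = "Unit sales")

# A negative correlation is consistent with a downward-sloping demand curve.
price_sales_cor = cor(cell$price, cell$sales_units)
print(price_sales_cor)
```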
Demand estimation: Own-price elasticities

Fit the log-linear demand model to the data. To estimate the regression model you need to create the logs of prices and unit sales. There are different ways of doing this. First, you can add new variables/columns to the data frame mayo_DF for the log of price and unit sales:

mayo_DF$log_price = log(mayo_DF$price)
mayo_DF$log_sales_units = log(mayo_DF$sales_units)
5. Estimate the log-linear demand model and provide the regression results separately for all four product/market combinations. Discuss the results. Is demand more elastic at Jewel-Osco or in the Kraft Central region? What is the reason for the observed difference in the elasticities?
6. Using the regression results for the log-linear demand model, calculate the percentage change in unit sales for a simultaneous 10 percent increase in the price of Kraft and Hellman’s mayo at Jewel-Osco. Use the exact formula.
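The handout does not write the model out, so the sketch below assumes the usual log-linear form log(sales_units) = a + b log(price), in which b is the own-price elasticity. Under that assumption, multiplying price by 1.10 multiplies unit sales by 1.10^b, so the exact percentage change is 100 * (1.10^b - 1). The data frame toy is hypothetical and is generated from a known elasticity of -2 so the result is easy to check.

```r
# Hypothetical data generated from a known elasticity of -2 (not the mayo data).
price = c(0.90, 0.95, 1.00, 1.05, 1.10)
toy = data.frame(price = price, sales_units = 1000 * price^(-2))

# Log-linear demand model: log(Q) = a + b * log(P).
fit = lm(log(sales_units) ~ log(price), data = toy)
beta = unname(coef(fit)["log(price)"])  # estimated own-price elasticity

# Exact percentage change in unit sales for a 10 percent price increase.
pct_exact = 100 * (1.10^beta - 1)

# Linear approximation (elasticity times percentage price change), for comparison.
pct_approx = 10 * beta

print(c(elasticity = beta, exact = pct_exact, approx = pct_approx))
```

The gap between the exact and approximate values shows why the question asks for the exact formula: for a price change as large as 10 percent, the linear approximation overstates the drop in sales.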
Cross-price elasticities

Now we allow for both own and cross-price effects in the log-linear demand model. We first need to reshape the data such that we have columns with unit sales and price information for both products.
The first step is to extract only the data that we need to create the final data frame used for estimation:

mayo_DF_extract = mayo_DF[, c("market", "product", "week", "sales_units", "price")]
head(mayo_DF_extract,3)
  market            product week sales_units     price
1  Jewel Hellmans Mayo 32oz   17       23247 0.9909236
2  Jewel Hellmans Mayo 32oz   18       24131 1.0118934
3  Jewel Hellmans Mayo 32oz   19       24195 0.9959496
In general, for any data frame DF one can extract rows and columns using the syntax: DF[rows to extract, columns to extract]. In the extraction statement above we did not specify any rows before the comma, which R interprets as “use all rows in the data frame”. To extract columns we combined several column names using the combine function c().
Now we reshape the data and create a new data frame with product-level sales unit and price information in columns:

mayo_DF_wide = reshape(mayo_DF_extract, timevar = "product", idvar = c("market", "week"), direction = "wide")
head(mayo_DF_wide,3)

  market week sales_units.Hellmans Mayo 32oz price.Hellmans Mayo 32oz
1  Jewel   17                          23247                0.9909236
2  Jewel   18                          24131                1.0118934
3  Jewel   19                          24195                0.9959496
  sales_units.Kraft Mayo 32oz price.Kraft Mayo 32oz
1                        5560             1.0769784
2                        5342             1.0885436
3                       13864             0.9948067
Fortunately we can quite easily change the annoyingly long variable names:

colnames(mayo_DF_wide) = c("market", "week", "sales_H", "price_H", "sales_K", "price_K")
head(mayo_DF_wide,3)

  market week sales_H   price_H sales_K   price_K
1  Jewel   17   23247 0.9909236    5560 1.0769784
2  Jewel   18   24131 1.0118934    5342 1.0885436
3  Jewel   19   24195 0.9959496   13864 0.9948067
Now you have the data ready to estimate the own and cross price elasticities, separately for Hellman’s and Kraft mayo at Jewel-Osco and in the Kraft Central region.
7. Obtain estimates of the own and cross-price effects using the log-linear demand model for all product/market combinations.
8. One purpose of demand estimation is to understand whether a brand is “vulnerable” to a competitor’s pricing policies. That is, to what extent is the demand for a product affected by the price of a competing brand? Based on your demand estimates, which brand is more vulnerable?
9. The price of Hellman’s mayo is cut by 10 percent at Jewel-Osco. Use your estimates to calculate by what percent the price of Kraft mayo has to be changed at Jewel to obtain the level of sales before the Hellman’s price cut.
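A sketch of the cross-price regression and of the price-matching calculation follows. It assumes the model log(sales_K) = a + b_KK log(price_K) + b_KH log(price_H), which is one natural reading of the log-linear model with cross-price effects but is not written out in the handout. The data frame toy_wide is hypothetical, with column names borrowed from mayo_DF_wide, and is generated from known coefficients (b_KK = -3, b_KH = 1.5) so the answer can be verified.

```r
# Hypothetical wide-format data generated from known elasticities
# (own-price -3, cross-price 1.5); these are not estimates from the mayo data.
toy_wide = data.frame(
  price_H = c(0.90, 1.00, 1.10, 1.00, 0.95, 1.05),
  price_K = c(1.00, 0.90, 1.00, 1.10, 1.05, 0.95)
)
toy_wide$sales_K = 500 * toy_wide$price_K^(-3) * toy_wide$price_H^(1.5)

# Demand for Kraft with own and cross-price effects.
fit_K = lm(log(sales_K) ~ log(price_K) + log(price_H), data = toy_wide)
b_KK = unname(coef(fit_K)["log(price_K)"])  # own-price elasticity
b_KH = unname(coef(fit_K)["log(price_H)"])  # cross-price elasticity

# Hellman's price is multiplied by 0.9. Kraft sales stay unchanged when
#   b_KK * log(x) + b_KH * log(0.9) = 0,
# where x is the factor applied to the Kraft price.
x = 0.9^(-b_KH / b_KK)
pct_change_price_K = 100 * (x - 1)

# Check: the combined effect on Kraft sales should be a factor of 1.
sales_ratio = x^b_KK * 0.9^b_KH
print(c(price_factor = x, pct_change = pct_change_price_K, sales_ratio = sales_ratio))
```

With the real data, the same steps would be applied to the Jewel rows of mayo_DF_wide, for example after subset(mayo_DF_wide, market == "Jewel").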