Bellabeat

CASE PROFILE

Bellabeat is a successful high-tech manufacturer focusing on health-related products for women, and they have the potential to become a larger player in the global smart device market, hence my role here is to analyse their smart device data to gain insight into how consumers are using their smart devices and provide recommendations for Bellabeat´s marketing strategy.

ASK

Business task

  1. What are the trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Stakeholders

Urška Sršen – Bellabeat´s cofounder and Chief Creative Officer
Sando Mur – Mathematician and Bellabeat’s cofounder
Bellabeat’s marketing analytics team – A team of data analytics

PREPARE

Data Summary

The data source used for this case study is FitBit Fitness Tracker Data. This dataset was downloaded from Kaggle where it was uploaded by Möbius. The time range of data is 12-04-2016 to 12-05-2016. It contains 33 users´ personal tracker data.

Data limitation

All data was collected in 2016 which is outdated for analysing current trends in market. With only 33 participants and 31-day period will possibly make the sample itself create a sample bias, a larger sample size will be more representative of the actual user’s population and therefore increase the confidence interval of analysis results. Lastly, the lack of demographic information will also limit recommendations on the target audience (including gender, location, age and job status).

PROCESS

Selected datasets

  • Daily_Activity
  • Hourly_Steps
  • Hourly_Intensities
  • Hourly_Calories
  • Weight_Log_Info
  • sleepDay_merged

Use Google sheet to clean data

Each dataset was cleaned using google sheet. The following steps were taken within each dataset:

  • Sorted and filtered data by Id to obtain how many unique users there were within the dataset.
  • Checked for duplicate data using the ’remove the duplicates’ tool in google sheet (only 2 rows of duplicate were found in the sleepDay_merge and those were removed).
  • Formatted date data into DD/MM/YY date format.
  • Formatted all numerical data into Number format with either no decimals or up to 2 decimals.
  • Checked Id entries and other columns for LEN to make sure the data was correct and uniform in length.
  • Separate Date and Hour into two columns and create the column for weekday when needed for later analysis. Utilized the ’Text to Columns’ tool to do so.
  • Formatted any time data into 24 hour time format for consistency.

ANALYSIS & SHARE

After cleaning the data, I chose to use SQL to continue the analysis phase of data and meanwhile using Power BI to visualize the results. Update the cleaned Bellabeat data into Bigquery.
Check the participants by counting number of distinct Id in each dataset.

  • Daily_Activity = 33
  • Hourly_Steps = 24
  • Hourly_Calories = 33
  • Hourly_Intensities = 33
  • Hourly_Steps = 33
  • Weight_Log_Info = 8

As the weight dataset only contains 8 data information, hence will not be used in continuing analysis.


Examine the relationship between calories and steps

To answer this question, I used SQL to do easy calculation and get the total steps and total calories for each user and then compare them to see if there is correlation in between.

Result shows that total steps and calories have a positive correlation, which is obvious – the higher calories user consume, the higher total steps that users walk or run.

Examine which weekday is users most active

To find out the most active weekday for users, i used google sheet to change date to weekday, using SQL to determine which weekday have the highest average steps and distance.


The answer is Saturday and Tuesday. To show a more straightforward result, i visualise the result in Power Bi. It also indicates that on Saturday and Tuesday, users consume the highest calories.

Examine what time are the users the most active

I used hourlyIntensities data to answer this question. First, to easily input in SQL, I opened this file in google sheet and split ActivityHour data that combines both Day and Time to get only ActivityHour which was not including the date, month and year. Then I exported the results to Power BI and got the visualization as below:

It can be seen that users are most active from 17-19h (evening), which indicates that users will often exercise during this period of time.

Examine the most common active level for users

Here i used SQL and easily got the sum of different active minutes and calories and visualize the result in Power BI to better see the relationship in between.

All users have lightly_active level and it does not have a positive relationship with calories.

INSIGHTS & MARKETING RECOMMENDATIONS

More users prefer to use the daily activity feature to track their health habits than other features like sleep or weight. The second most popular trait was sleep, followed by weight. This information can be used to showcase Bellabeat products and do market promotion based on daily activity to attract more users.

Calories and steps have a positive linear relationship. When users run or walk more frequently, they also burn more calories.

Saturday was the most active day for users to walk or run, followed by Tuesday. These 2 days are also the days when users consume the highest calories in a week, so Bellabeat can market more intense exercises on these two days.

The data also shows that the highest total steps are tracked in the lunch time frame and 5pm to 7pm. This suggests that most users have a fixed routine – usually at lunch or after getting off work with the most activities. I recommend Bellabeat to market their product to clients who are most likely to live around this particular type of routine – clients with regular work schedules and parents with regular routines.

We also know that all our runners are lightly active runners, with this information we can now create the campaign to target our main audience, leisure runners.

Lastly, to get more insights and trends, it would be better if Bellabeats can provide a larger and updated sample size to increase the confidence interval. Also, more demographic information would be nice to include as well to determine the trends about such as gender, location or age.

SQL Notes