This report presents an analysis of Cyclist ride data spanning Q1–Q4 2024, prepared for the Marketing Department to support future business decisions. The goal is to uncover user behavior patterns, highlight seasonal trends, and estimate ride-level revenue based on membership type and bike preference. The analysis is conducted as part of the Google Data Analytics Professional Certificate capstone, using a hypothetical company (“Cyclist”) and real-world ride data from Divvy Bikes—a bike-sharing service based in Chicago.
Data source: The dataset is publicly available under the Divvy Data License Agreement. This analysis is for educational purposes only and is not affiliated with or endorsed by Divvy, Lyft, or the City of Chicago.
Twelve monthly datasets from January to December 2024 were extracted
using SQL in Google BigQuery. Rows with null values in the key columns
start_station_name and end_station_id were filtered out to
ensure data integrity. No duplicate entries were detected, and all column types
were consistent across tables.
Each cleaned monthly dataset was exported as CSV and merged into a single dataset for analysis in R. The final combined file contains 4,208,309 distinct observations.
# SQL Sample of dataset cleaning
SELECT *
FROM `capstone-project-202507.Cyclist.2024-divvy-tripdata`
WHERE start_station_name IS NOT NULL
AND end_station_id IS NOT NULL
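The export-and-merge step described above could look roughly like the following in base R. This is a minimal sketch, not the project's actual script: the helper name `merge_monthly_csv`, the folder layout, and the file names are assumptions for illustration, and the demo writes two tiny made-up CSV files to a temporary folder so the code runs on its own.

```r
# Hypothetical helper: read every CSV in a folder and stack the rows.
# Assumes all files share the same column names and types.
merge_monthly_csv <- function(dir_path, pattern = "\\.csv$") {
  files <- sort(list.files(dir_path, pattern = pattern, full.names = TRUE))
  monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
  combined <- do.call(rbind, monthly)
  # Keep one row per ride_id in case a ride appears in two exports
  combined[!duplicated(combined$ride_id), , drop = FALSE]
}

# Demo with two tiny made-up files (not real Divvy data)
demo_dir <- file.path(tempdir(), "divvy_merge_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(demo_dir, "2024-01.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "2024-02.csv"), row.names = FALSE)

demo_combined <- merge_monthly_csv(demo_dir)
print(demo_combined)
```

For twelve large monthly files, a faster reader would likely be preferable to `read.csv`; base R is used here only to keep the sketch dependency-free.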
The dataset contains the following columns:
- ride_id: Unique identifier for each ride
- rideable_type: Type of bike used for the ride
- started_at: Start time of the ride
- ended_at: End time of the ride
- start_station_name: Name of the station where the ride started
- end_station_name: Name of the station where the ride ended
- start_station_id: Unique identifier for the start station
- end_station_id: Unique identifier for the end station
- start_lat: Latitude of the start station
- start_lng: Longitude of the start station
- end_lat: Latitude of the end station
- end_lng: Longitude of the end station
- member_casual: Type of membership (member or casual)

The started_at and ended_at columns were stored as character strings, so we converted them to POSIXct date-time format for easier analysis.
# Change the started_at and ended_at columns to Date Time format
combined_clean_data$started_at <- as.POSIXct(combined_clean_data$started_at, format="%Y-%m-%d %H:%M:%S")
combined_clean_data$ended_at <- as.POSIXct(combined_clean_data$ended_at, format="%Y-%m-%d %H:%M:%S")
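We began by comparing total ride counts between annual members and casual users. A simple bar plot shows that annual members account for the larger share of rides: roughly 2.69 million (about 64%) versus 1.52 million (about 36%) for casual riders, based on the ride counts reported in the revenue summary table.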
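# Load the packages used throughout this analysis
library(tidyverse)  # dplyr, ggplot2, tidyr, tibble
library(lubridate)  # month(), wday(), hour()
library(scales)     # label_comma(), label_number()
# Find overall composition of member types (annual vs casual)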
member_composition <- combined_clean_data %>%
group_by(member_casual) %>%
summarise(total_rides = n())
# Graph for total rides by membership type
p2a <- ggplot(member_composition,
aes(x = member_casual,
y = total_rides,
fill = member_casual)) +
geom_col() +
scale_fill_manual(name = "Membership Type",
values = c(member = "#1b9e77",
casual = "#d95f02")) +
geom_text(aes(label = label_comma()(total_rides)),
colour = "white", vjust = 3, size = 7) +
labs(
title = "Membership Types in 2024",
subtitle = "Visualizing member type distribution",
x = NULL,
y = "Total Rides") +
scale_x_discrete(labels = c(member = "Annual Member",
casual = "Casual Rider")) +
scale_y_continuous(
breaks = seq(0, max(member_composition$total_rides), by = 500000),
labels = label_number(scale = 1e-6, suffix = "M", accuracy = 0.1)) +
theme_minimal() +
theme(legend.position = "none")
print(p2a)
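To explore user preferences, we created a summary of bike types (classic_bike, electric_bike, and electric_scooter) grouped by membership. The resulting plot shows that classic bikes are the most-used type in both groups, with electric scooters a distant third. Among casual riders, the preference for classic bikes is stronger on weekends (about 68% of casual weekend rides versus 61% on weekdays, per the ride counts in the revenue summary table).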
# Find overall distribution ride type based on member type
ride_type_summary <- combined_clean_data %>%
group_by(member_casual, rideable_type) %>%
summarise(total_ride_type = n(), .groups = "drop")
# Bar chart comparison bike type use by member type
p2b <- ggplot(ride_type_summary,
aes(x = rideable_type,
y = total_ride_type,
fill = member_casual)) +
geom_col(position = position_dodge(width = 0.9)) +
scale_fill_manual(name = "Membership Type",
values = c(member = "#1b9e77",casual = "#d95f02")) +
geom_text(aes(label = label_comma()(total_ride_type)),
position = position_dodge(width = 0.9), vjust = -0.2, fontface = "bold") + # same dodge width as the bars so labels line up
labs(title = "Bike Type Usage Distribution by Member Type",
subtitle = "Visualizing bike type used by users",
x= NULL, y = "Total Rides") +
scale_x_discrete(labels = c(classic_bike = "Classic Bike",
electric_bike = "Electric Bike",
electric_scooter = "Electric Scooter")) +
scale_y_continuous(breaks = seq(0, max(ride_type_summary$total_ride_type), by = 500000),
labels = label_number(scale = 1e-6, suffix = "M", accuracy = 0.1)) +
theme_minimal() +
theme(legend.position = "bottom") # keep the legend: fill colour is the only cue for membership type
print(p2b)
We analyzed ride trends across each month of 2024, segmented by membership type. Seasonal variation was evident: casual users showed strong engagement during summer months, while annual members maintained consistent ride activity year-round. A grouped bar chart with trend lines displays ride volume per month, with seasonal bands for context.
#Summarize monthly totals
monthly_summary <- combined_clean_data %>%
mutate(month_num = month(started_at),
month = month(started_at, label = TRUE, abbr = TRUE)) %>%
group_by(month_num, month, member_casual) %>%
summarise(total_rides = n(), .groups = "drop")
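#Define seasons and their month ranges for the background bands
#Note: the Winter band covers Jan-Feb only. A 12 -> 2 range cannot wrap around
#the axis and would be drawn across Mar-Nov instead, so December is left unshaded
seasons <- tibble( season = c("Winter", "Spring", "Summer", "Fall"),
start_mon = c(1, 3, 6, 9),
end_mon = c(2, 5, 8, 11),
x_label = c(1.5, 4, 7, 10)) # midpoint for label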
#Compute peak y so we can place season labels above data
y_max <- max(monthly_summary$total_rides)
#Plot: season bands, bars, lines/points, labels
p2c <- ggplot() +
# Season bands (semi-transparent rectangles)
geom_rect(data = seasons, inherit.aes = FALSE,
aes(xmin = start_mon - 0.5, xmax = end_mon + 0.5,
ymin = -Inf, ymax = Inf),
fill = c("#D3E4CD", "#FFF3B0", "#F6C6EA", "#C4DEF6"),
alpha = 0.2) +
# Bars (dodge by membership)
geom_col(data = monthly_summary,
aes(x = month_num,
y = total_rides,
fill = member_casual),
position = position_dodge(width = 0.8), width = 0.7, alpha = 0.6) +
# Trend lines and points
geom_line(data = monthly_summary,
aes(x = month_num,
y = total_rides,
color = member_casual,
group = member_casual),
position = position_dodge(width = 0.8),
linetype = "dashed") +
geom_point(data = monthly_summary,
aes(x = month_num,
y = total_rides,
color = member_casual),
position = position_dodge(width = 0.8), size = 3) +
# Season labels
geom_text(data = seasons,inherit.aes = FALSE,
aes(x = x_label, y = y_max * 1.05, label = season),
size = 4, fontface = "bold") +
# Scales, labs, theme
scale_x_continuous(breaks = 1:12,
labels = month.abb) +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K"),
expand = expansion(mult = c(0, 0.05))) +
scale_fill_manual(name = "Membership",
values = c(member = "#1b9e77", casual = "#d95f02")) +
scale_color_manual(name = "Membership",
values = c(member = "#1b9e77", casual = "#d95f02")) +
labs(title = "Seasonal Rides Trend by Membership Type",
subtitle = "Visualizing total rides and trend line with seasonal band",
x = "Month",
y = "Total Rides (Thousands)") +
theme_minimal() +
theme(legend.position = "bottom")
print(p2c)
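The season definitions behind the bands (Winter from December to February, Spring from March to May, Summer from June to August, Fall from September to November) can also be written as a month-to-season lookup, which makes the December wrap-around explicit and is handy when rides need to be aggregated by season. The sketch below is illustrative only; `month_to_season` is a hypothetical helper name, not part of the original analysis.

```r
# Map month numbers (1-12) to seasons.
# December wraps around to Winter, which a single start/end range cannot express.
month_to_season <- function(month_num) {
  lookup <- c("Winter", "Winter",
              "Spring", "Spring", "Spring",
              "Summer", "Summer", "Summer",
              "Fall", "Fall", "Fall",
              "Winter")
  factor(lookup[month_num], levels = c("Winter", "Spring", "Summer", "Fall"))
}

# Show the mapping for all twelve months
season_check <- data.frame(month = month.abb, season = month_to_season(1:12))
print(season_check)
```

In a dplyr pipeline this could be applied as `mutate(season = month_to_season(month_num))` before grouping.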
We explored how much time users spend on rides by grouping ride lengths into behavioral buckets (e.g., 0–5 min, 5–15 min, etc.). The resulting bar plot shows that most rides last under 30 minutes.
# Create table with weekday, ride_length and day_type variables
ride_duration <- combined_clean_data %>%
mutate(weekday = wday(started_at, label = TRUE),
ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
day_type = if_else(wday(started_at) %in% c(1, 7), "Weekend", "Weekday")
)
user_rides_interval <- ride_duration %>%
mutate(duration_interval = cut(ride_length,
breaks = c(0, 5, 15, 30, 45, 60, 90, 120, Inf),
labels = c("0-5min", "5-15min", "15-30min", "30-45min", "45-60min",
"60-90min", "90-120min", "Over 120min")),
rideable_type = recode(rideable_type, classic_bike = "Classic Bike",
electric_bike = "Electric Bike",
electric_scooter = "Electric Scooter")) %>%
filter(!is.na(duration_interval)) %>%
count(member_casual, weekday, rideable_type, duration_interval, name = "total_user")
# Bar chart user rides interval
p2d <- ggplot(user_rides_interval,
aes(x = total_user, y = duration_interval, fill = rideable_type )) +
geom_col() +
scale_x_continuous(labels = label_number(scale = 1e-3, suffix = " K"),
breaks = seq(0,1500000, by = 300000 ),
minor_breaks = seq(0, 1500000, by = 100000))+
labs(title = "Ride Duration Buckets",
subtitle = "Visualizing ride duration interval bins with member types facet",
x = "Total User (thousand)",
y = "Interval Duration",
fill = "Rideable Type") +
theme_light()+
theme(panel.grid = element_line(color = "grey60"),
panel.grid.minor = element_line(color = "grey90"))+
facet_grid(~member_casual,
labeller = labeller(member_casual = c(casual = "Casual Rider",
member = "Annual Member")))
print(p2d)
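How rides that sit exactly on a bucket edge are assigned depends on the defaults of `cut()`. The short sketch below reuses the same breaks and labels as the bucketing code above on a handful of made-up ride lengths (the values are illustrative, not taken from the dataset).

```r
# Same breaks and labels as the duration buckets in this report
duration_breaks <- c(0, 5, 15, 30, 45, 60, 90, 120, Inf)
duration_labels <- c("0-5min", "5-15min", "15-30min", "30-45min", "45-60min",
                     "60-90min", "90-120min", "Over 120min")

# Made-up ride lengths in minutes, chosen to sit on or near bucket edges
example_lengths <- c(0, 0.5, 5, 5.1, 30, 120, 120.5)

example_buckets <- cut(example_lengths,
                       breaks = duration_breaks,
                       labels = duration_labels)

print(data.frame(ride_length = example_lengths, bucket = example_buckets))
```

With the default `right = TRUE`, each bucket includes its upper edge, so a ride of exactly 5 minutes lands in "0-5min" and one of exactly 120 minutes in "90-120min". A ride length of 0 (or a negative value) falls outside every interval and becomes NA, which is why the pipeline filters on `!is.na(duration_interval)`.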
Using reference rates and unlock fees from [Divvy Pricing](https://divvybikes.com/pricing), we map each ride's duration to the corresponding real-world pricing plan.
Revenue calculations in this segment exclude the capped pricing rules for member e-bike rides between 31 and 45 minutes and do not account for day-pass packages, due to data limitations.
Annual member revenue on classic bikes counts only the extra minutes beyond the 45-minute free window. In the calculation used here, member e-bike and scooter rides are charged per minute from the first minute, which is why their breach_rate in the revenue summary is close to 1.
# Build assign_price_tier by deriving three pricing variables per ride:
# free_threshold: minutes included before per-minute charges start (45 for member classic rides, otherwise 0)
# per_min_rate: per-minute rate in USD applied to the extra minutes
# unlock_fee: free for members, $1 for everyone else
assign_price_tier <- ride_duration %>%
mutate(
free_threshold = case_when(
rideable_type == "classic_bike" & member_casual == "member" ~ 45, #Not charge until 45min
rideable_type == "classic_bike" & member_casual == "casual" ~ 0,
rideable_type == "electric_bike" ~ 0,
rideable_type == "electric_scooter" ~ 0),
per_min_rate = case_when(
rideable_type == "classic_bike" ~ 0.19,
rideable_type == "electric_bike" & member_casual == "member" ~ 0.19,
rideable_type == "electric_bike" & member_casual == "casual" ~ 0.44,
rideable_type == "electric_scooter" & member_casual == "member" ~0.31,
rideable_type == "electric_scooter" & member_casual == "casual" ~0.44),
unlock_fee = case_when(
member_casual == "member" ~ 0,
TRUE ~ 1), #Others tier will be charge $1
extra_minutes = pmax(ride_length - free_threshold, 0),
ride_revenue = unlock_fee + extra_minutes * per_min_rate
)
# Dataframe for revenue summary
revenue_summary <- assign_price_tier %>%
group_by(member_casual, rideable_type, day_type) %>%
summarise(total_rides = n(),
breach_rate = mean(extra_minutes > 0), #share of rides past the free threshold (proportion, 0-1)
avg_revenue = mean(ride_revenue), #in USD
total_revenue = sum(ride_revenue), #in USD
.groups = "drop")
knitr::kable(revenue_summary,
caption = "Revenue Summary")
| member_casual | rideable_type | day_type | total_rides | breach_rate | avg_revenue | total_revenue |
|---|---|---|---|---|---|---|
| casual | classic_bike | Weekday | 571473 | 0.9999755 | 6.1403776 | 3509060.00 |
| casual | classic_bike | Weekend | 397415 | 0.9999472 | 7.0618412 | 2806481.62 |
| casual | electric_bike | Weekday | 347443 | 0.9999223 | 7.2298515 | 2511961.28 |
| casual | electric_bike | Weekend | 179550 | 0.9998775 | 8.9581976 | 1608444.38 |
| casual | electric_scooter | Weekday | 18663 | 1.0000000 | 5.6097079 | 104693.98 |
| casual | electric_scooter | Weekend | 7077 | 1.0000000 | 6.9268141 | 49021.06 |
| member | classic_bike | Weekday | 1333956 | 0.0156647 | 0.1627604 | 217115.19 |
| member | classic_bike | Weekend | 425303 | 0.0296518 | 0.2213451 | 94138.76 |
| member | electric_bike | Weekday | 706850 | 0.9999165 | 1.9874065 | 1404798.30 |
| member | electric_bike | Weekend | 198492 | 0.9998640 | 2.2765360 | 451874.18 |
| member | electric_scooter | Weekday | 17767 | 1.0000000 | 2.4168156 | 42939.56 |
| member | electric_scooter | Weekend | 4320 | 1.0000000 | 2.5651938 | 11081.64 |
breach_rate: the proportion of rides (0 to 1) in each group that exceed the free-time threshold.
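To make the pricing rule concrete, the per-ride estimate is ride_revenue = unlock_fee + max(ride_length - free_threshold, 0) × per_min_rate. The sketch below wraps the same rates and thresholds used in assign_price_tier in a small function and applies it to three made-up rides. The helper name `estimate_ride_revenue` is hypothetical and the example rides are illustrative, not rows from the dataset.

```r
# Hypothetical helper mirroring the case_when() rules in assign_price_tier.
# Unlike case_when(), any type other than classic_bike or electric_bike
# is treated as electric_scooter here.
estimate_ride_revenue <- function(ride_length, rideable_type, member_casual) {
  is_member <- member_casual == "member"
  free_threshold <- ifelse(rideable_type == "classic_bike" & is_member, 45, 0)
  per_min_rate <- ifelse(
    rideable_type == "classic_bike", 0.19,
    ifelse(rideable_type == "electric_bike",
           ifelse(is_member, 0.19, 0.44),
           ifelse(is_member, 0.31, 0.44)))
  unlock_fee <- ifelse(is_member, 0, 1)
  extra_minutes <- pmax(ride_length - free_threshold, 0)
  unlock_fee + extra_minutes * per_min_rate
}

# Three made-up rides
example_rides <- data.frame(
  ride_length   = c(20, 50, 20),
  rideable_type = c("classic_bike", "classic_bike", "electric_bike"),
  member_casual = c("casual", "member", "casual")
)
example_rides$ride_revenue <- with(
  example_rides,
  estimate_ride_revenue(ride_length, rideable_type, member_casual)
)
print(example_rides)
```

Under these rules the 20-minute casual classic ride comes to 1 + 20 × 0.19 = 4.80 USD, the 50-minute member classic ride to 5 × 0.19 = 0.95 USD (only the 5 minutes past the free window are billed), and the 20-minute casual e-bike ride to 1 + 20 × 0.44 = 9.80 USD.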
We summarized total revenue per month across bike types and membership status. Revenue spikes in July and August correlated with increased casual usage of classic bikes, suggesting that summer promotions could be optimized around these months.
# Summarize monthly revenue
revenue_monthly <- assign_price_tier %>%
mutate(month_num = month(started_at),
month = month(started_at, label = TRUE, abbr = TRUE)) %>%
group_by(month_num, month, member_casual, rideable_type) %>%
summarise(revenue_by_month = sum(ride_revenue, na.rm = TRUE),
.groups = "drop")
# Compute peak y so we can place season labels above data
y_max_revenue <- max(revenue_monthly$revenue_by_month)
#Plot: season bands → bars → labels
p3a1 <- ggplot() +
# Season bands
geom_rect( data = seasons, inherit.aes = FALSE,
aes(xmin = start_mon - 0.5,
xmax = end_mon + 0.5,
ymin = -Inf,
ymax = Inf),
fill = c("#D3E4CD", "#FFF3B0", "#F6C6EA", "#C4DEF6"),
alpha = 0.2) +
geom_col(data = revenue_monthly,
aes(x = month_num,
y = revenue_by_month,
fill = member_casual,
group = member_casual),
width = 0.7, alpha = 0.6) +
#Season labels
geom_text(data = seasons, inherit.aes = FALSE,
aes(x = x_label, y = y_max_revenue * 2, label = season),
size = 4,fontface = "bold") +
#Scales, labs, theme
scale_x_continuous(name = "Month",
breaks = 1:12,
labels = month.abb) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M"),
expand = expansion(mult = c(0, 0.05))) +
scale_fill_manual(name = "Membership",
values = c(member = "#1b9e77", casual = "#d95f02")) +
labs(title = "Seasonal Revenue Trends",
subtitle = "Visualizing revenue trends based on user type",
y = "Total Revenue (million USD)") +
theme_minimal() +
theme(legend.position = "bottom")
print(p3a1)
# Faceting by rideable_type
ridetype_facetlabel <- c(classic_bike = "Classic Bike",
electric_bike = "Electric Bike",
electric_scooter = "Electric Scooter")
p3a2 <- ggplot(revenue_monthly,
aes(x = month_num,
y = revenue_by_month,
fill = member_casual)) +
geom_area(alpha = 0.6) +
facet_grid( ~ rideable_type,
labeller = labeller(rideable_type = ridetype_facetlabel)) +
labs(title = "Seasonal Revenue Trends by Ride Types",
subtitle = "Visualizing revenue trend by member types with ride types facet",
x = "Month",
y = "Total Revenue (thousand USD)") +
scale_x_continuous(breaks = 1:12,
labels = month.abb,
guide = guide_axis(angle = 90),
minor_breaks = NULL) +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K"),
expand = expansion(mult = c(0, 0.05)),
breaks = c(0, 2e+5, 4e+5, 6e+5, 8e+5, 10e+5)) +
scale_fill_manual(name = "Membership",
values = c(member = "#1b9e77", casual = "#d95f02")) +
theme_light() +
theme(panel.grid = element_line(color = "grey80"),
panel.grid.minor = element_line(color = "grey95")) +
theme(legend.position = "bottom")
print(p3a2)
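By comparing average revenue on weekdays and weekends, we observed that casual users generate more revenue per ride on weekends, particularly on e-bikes (about $8.96 per ride versus $7.23 on weekdays), while members primarily use classic bikes within their free-time window: only about 1.6% of weekday and 3.0% of weekend member classic rides exceed it.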
# Plot average revenue based on day type
p3b<- ggplot(revenue_summary,
aes(x = day_type,
y = avg_revenue,
fill = member_casual)) +
geom_col(position = position_dodge2(0.8)) +
geom_text(aes(label = label_number(accuracy = 0.1)(avg_revenue)),
position = position_dodge2(0.9), vjust = 1.5,
fontface = "bold", color = "white") +
facet_grid( ~ rideable_type,
labeller = labeller(rideable_type = ridetype_facetlabel)) +
labs(title = "Weekday vs Weekend Revenue",
subtitle = "Visualizing average revenue by member types with ride types facet",
x = "Type of Days",
y = "Average Revenue (USD)") +
scale_y_continuous(breaks = c(2, 4, 6, 8)) +
scale_fill_manual(name = "Membership",
values = c(member = "#1b9e77", casual = "#d95f02")) +
theme_light() +
theme(panel.grid = element_line(color = "grey80"),
panel.grid.minor = element_line(color = "grey95")) +
theme(legend.position = "bottom")
print(p3b)
Outliers were retained for descriptive analysis but are excluded from duration-based summaries.
Before conducting average-based analyses, we examined ride duration to detect and remove extreme values that fall outside typical user behavior.
ride_duration <- combined_clean_data %>%
mutate(weekday = wday(started_at, label = TRUE),
ride_length = as.numeric(difftime(ended_at, started_at,units = "mins")),
day_type = if_else(wday(started_at) %in% c(1, 7), "Weekend", "Weekday")
)
When exploring ride durations, we observed two extremes in ride_length that fall outside typical user behavior: rides under 1 minute and rides over 120 minutes. These thresholds are based on business and user-behavior logic:
- It is unlikely that a user would use the service for less than 1 minute or cycle continuously for more than 2 hours.
- Such rides flag potential errors, special cases, or app bugs/tests.
We created a summary of flagged outliers grouped by
member_casual. This helped reveal whether casual or member
users are more prone to unusual trip lengths.
# Flag rides with ride_length under 1 minute or over 120 minutes as outliers
# (ride_length is already available in ride_duration)
ride_outlier <- ride_duration %>%
mutate(duration_outlier = case_when(
ride_length < 1 ~ "Under 1 min",
ride_length > 120 ~ "Over 120 min",
TRUE ~ "Valid Ride")) %>%
count(member_casual, duration_outlier, name = "total_outlier") %>%
filter(duration_outlier != "Valid Ride")
knitr::kable(ride_outlier,
caption = "Outliers Summary")
| member_casual | duration_outlier | total_outlier |
|---|---|---|
| casual | Over 120 min | 26562 |
| casual | Under 1 min | 15260 |
| member | Over 120 min | 4248 |
| member | Under 1 min | 24301 |
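Because members take far more rides than casual users, raw outlier counts are easier to compare once expressed as a share of each group's rides. The sketch below does this with the counts from the outlier table; the group totals are the sums of total_rides per membership type in the revenue summary table. It is a quick check under those assumptions, not part of the original pipeline.

```r
# Counts copied from the outlier summary table
outlier_counts <- data.frame(
  member_casual    = c("casual", "casual", "member", "member"),
  duration_outlier = c("Over 120 min", "Under 1 min", "Over 120 min", "Under 1 min"),
  total_outlier    = c(26562, 15260, 4248, 24301)
)

# Total rides per group, summed from total_rides in the revenue summary table
group_totals <- c(casual = 1521621, member = 2686688)

# Share of each group's rides that falls into each outlier category, in percent
outlier_counts$share_pct <- round(
  100 * outlier_counts$total_outlier /
    unname(group_totals[outlier_counts$member_casual]),
  2)

print(outlier_counts)
```

Seen this way, rides over 120 minutes are much more common among casual riders (about 1.75% of their rides versus about 0.16% for members), while rides under 1 minute occur at a similar rate in both groups (about 1.0% versus 0.9%), even though members have the higher absolute count.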
Flagged categories: ride_length under 1 minute and ride_length over 120 minutes.
Outliers were removed prior to conducting duration-based analysis.
The filtered dataset ride_duration_clean includes only
rides between 1 and 120 minutes.
# Clean outlier in ride_length under 1 min and over 120min
ride_duration_clean <- ride_duration %>%
filter(ride_length >= 1, ride_length <= 120)
We found that casual riders consistently had longer ride duration than members, especially on weekends. This suggests a leisure-oriented use case, while members tend to commute or use the service more efficiently.
# Average ride duration by membership type, weekday vs weekend
# (ride_length and day_type are already available from ride_duration,
# so no locale-dependent "Sat"/"Sun" label matching is needed)
mean_duration_summary <- ride_duration_clean %>%
group_by(member_casual, day_type) %>%
summarise(mean_duration = mean(ride_length), .groups = "drop")
p5a <- ggplot(mean_duration_summary,
aes(x = day_type, y = mean_duration, fill = member_casual)) +
geom_col(position = position_dodge2(width = 0.6)) +
scale_fill_manual(name = "Membership Type",
values = c(member = "#1b9e77",casual = "#d95f02"))+
scale_y_continuous(labels = label_number(suffix = " min"),
expand = expansion(mult = c(0, 0.1))) +
geom_text(aes(label = label_number(accuracy = 0.1)(mean_duration)),
position = position_dodge2(width = 0.9),
color ="white", vjust = 3, size = 5, fontface = "bold") +
labs(title = "Average Ride Duration Weekday vs Weekend",
subtitle = "Visualizing comparison of average ride duration by member types",
x = "Day Type",
y = "Average Ride Duration (minutes)") +
theme_minimal() +
theme(legend.position = "bottom")
print(p5a)
We further broke down average ride duration by weekday. Casual riders peaked on Saturdays and Sundays, while member usage remained steady, showing their behavior is less time-dependent.
# Average duration of rides by membership type based days of the week
mean_duration_eachday <- ride_duration_clean %>%
mutate(weekday = wday(started_at, label = TRUE),
ride_length = as.numeric(difftime(ended_at, started_at, units = "mins"))
) %>%
group_by(member_casual, weekday) %>%
summarise(mean_duration = as.numeric(mean(ride_length, na.rm = TRUE)),
.groups = "drop")
p5b <- ggplot(mean_duration_eachday,
aes(x = weekday, y = mean_duration, fill = member_casual)) +
geom_col(position = position_dodge2(width = 0.6)) +
scale_fill_manual(name = "Membership Type",
values = c(member = "#1b9e77",casual = "#d95f02"))+
scale_y_continuous(labels = label_number(suffix = " min"),
expand = expansion(mult = c(0, 0.1))) +
geom_text(aes(label = label_number(accuracy = 0.1)(mean_duration)),
position = position_dodge2(width = 0.9),
color ="white", vjust = 3, size = 3, fontface = "bold") +
labs(title = "Average Ride Duration by Day of Week",
subtitle = "Visualizing comparison average ride duration by day of week",
x = "Days of week",
y = "Average Ride Duration (minutes)") +
theme_minimal() +
theme(legend.position = "bottom")
print(p5b)
A faceted chart displays average ride duration segmented by
rideable_type. Classic bikes showed the longest average
duration for casual users, while electric bikes were favored for
shorter, time-efficient rides.
# Average ride duration based on bike types
biketype_duration_summary <- ride_duration_clean %>%
group_by(member_casual, rideable_type, day_type) %>% # day_type comes from ride_duration
summarise(mean_duration = mean(ride_length), .groups = "drop")
#Create custom labels for facet_wrap on rideable_type
biketype_labels <- c(classic_bike = "Classic Bike",
electric_bike = "Electric Bike",
electric_scooter = "Electric Scooter")
# Bar chart for average duration based on bike types
p5c <- ggplot(biketype_duration_summary,
aes(x = day_type, y = mean_duration, fill = member_casual)) +
geom_col(position = position_dodge2(width = 0.6)) +
facet_wrap(~rideable_type,
labeller = labeller(rideable_type = biketype_labels)) +
scale_fill_manual(name = "Membership Type",
values = c(member = "#1b9e77",casual = "#d95f02"))+
geom_text(aes(label = label_number(accuracy = 0.1)(mean_duration)),
position = position_dodge2(width = 0.9),
color ="white", vjust = 1.5, size = 3, fontface = "bold") +
labs(title = "Average Ride Duration by Ride Type",
subtitle = "Visualizing average ride duration by member type with bike type facet",
x = "Day Type",
y = "Average Ride Duration (minutes)") +
theme_light() +
theme(legend.position = "bottom")
print(p5c)
A stacked area plot by membership type shows when our users ride during the day:
#peak hour summary & user tendencies
ride_hour_summary <- ride_duration_clean %>%
mutate(hour = hour(started_at))
peak_by_user <- ride_hour_summary %>%
group_by(member_casual, hour) %>%
summarise(total_rides = n(), .groups = "drop")
#Plot Peak ride hour
p5d <- ggplot(peak_by_user, aes(x = hour, y = total_rides, fill = member_casual)) +
geom_area(alpha = 0.6) +
scale_fill_manual(name = "Membership Type",
values = c(member = "#1b9e77",casual = "#d95f02")) +
scale_x_continuous(breaks = seq(0, 23, by = 1),
minor_breaks = NULL) +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K"),
expand = expansion(mult = c(0, 0.05))) +
labs(title = "Peak Ride Hours by Membership Type",
subtitle = "Visualizing peak hour member type through out the year",
x = "Hour of Day", y = "Ride Count (thousand)") +
theme_minimal()
print(p5d)
We identified the top 10 starting and ending stations by ride volume. These high-traffic locations, often downtown or near tourist zones, indicate key operational touch-points for fleet allocation and promotional targeting.
# Top 10 start stations by number of rides
top_started_station <- ride_duration_clean %>%
drop_na(start_station_name) %>%
count(start_station_name, name = "total_started", sort = TRUE) %>%
slice_head(n = 10)
knitr::kable(top_started_station,
caption = "Top 10 Start Station")
| start_station_name | total_started |
|---|---|
| Streeter Dr & Grand Ave | 60407 |
| DuSable Lake Shore Dr & Monroe St | 40221 |
| DuSable Lake Shore Dr & North Blvd | 36000 |
| Kingsbury St & Kinzie St | 35506 |
| Michigan Ave & Oak St | 35309 |
| Clark St & Elm St | 31556 |
| Clinton St & Washington Blvd | 30880 |
| Millennium Park | 29481 |
| Clinton St & Madison St | 29443 |
| Wells St & Concord Ln | 27756 |
# Top 10 end stations by number of rides
top_ended_station <- ride_duration_clean %>%
drop_na(end_station_name) %>%
count(end_station_name, name = "total_end", sort = TRUE) %>%
slice_head(n = 10)
knitr::kable(top_ended_station,
caption = "Top 10 End Station")
| end_station_name | total_end |
|---|---|
| Streeter Dr & Grand Ave | 61911 |
| DuSable Lake Shore Dr & North Blvd | 39453 |
| DuSable Lake Shore Dr & Monroe St | 38858 |
| Michigan Ave & Oak St | 35517 |
| Kingsbury St & Kinzie St | 35119 |
| Clinton St & Washington Blvd | 31075 |
| Clark St & Elm St | 30947 |
| Millennium Park | 30114 |
| Clinton St & Madison St | 29906 |
| Wells St & Concord Ln | 28128 |
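The same ten stations appear in both tables, so the two can be joined to estimate each station's net flow (rides ended minus rides started), a rough indicator of where bikes accumulate or drain over the year. The sketch below uses the counts from the two tables above. The column name `net_flow` is introduced here for illustration, and annual totals say nothing about within-day imbalances.

```r
# Counts copied from the two top-10 tables
top_start <- data.frame(
  station = c("Streeter Dr & Grand Ave", "DuSable Lake Shore Dr & Monroe St",
              "DuSable Lake Shore Dr & North Blvd", "Kingsbury St & Kinzie St",
              "Michigan Ave & Oak St", "Clark St & Elm St",
              "Clinton St & Washington Blvd", "Millennium Park",
              "Clinton St & Madison St", "Wells St & Concord Ln"),
  total_started = c(60407, 40221, 36000, 35506, 35309,
                    31556, 30880, 29481, 29443, 27756)
)
top_end <- data.frame(
  station = c("Streeter Dr & Grand Ave", "DuSable Lake Shore Dr & North Blvd",
              "DuSable Lake Shore Dr & Monroe St", "Michigan Ave & Oak St",
              "Kingsbury St & Kinzie St", "Clinton St & Washington Blvd",
              "Clark St & Elm St", "Millennium Park",
              "Clinton St & Madison St", "Wells St & Concord Ln"),
  total_end = c(61911, 39453, 38858, 35517, 35119,
                31075, 30947, 30114, 29906, 28128)
)

# Join on station name; net_flow > 0 means more rides ended than started there
station_flow <- merge(top_start, top_end, by = "station")
station_flow$net_flow <- station_flow$total_end - station_flow$total_started

# Largest imbalances first
station_flow <- station_flow[order(-abs(station_flow$net_flow)), ]
print(station_flow, row.names = FALSE)
```

Stations with a strongly positive net flow tend to fill up and may need bikes removed, while those with a negative net flow tend to run short and may need restocking.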
p3b
p3a2
p5a
p5b
Because casual riders are most active on weekends and in summer, we can run time-focused marketing programs. A few marketing program suggestions and their implications:
- Summer Classic Bike Deals (July - August)
- In-App Notification at 20min
Annual members are the ambassadors of our service: they provide steady revenue and word-of-mouth recommendations. Maintaining the growth and retention of these loyal riders is a must. Don’t push them toward services they don’t need; instead, offer a genuine celebration of their commitment to our service.
Below are targeted recommendations to streamline our day-to-day operations, grounded in the data patterns we observed.
Seasonal Maintenance Scaling
From May to October, total rides increase, driving higher wear and
tear on our bikes. Without proper maintenance, maintenance-related bike
downtime and service interruptions will rise, harming user experience
and revenue.
Recommendation:
Partner with an external bike-service provider to supplement in-house
repairs during peak season.
First-Ride Onboarding & App Stability
Rides under 1 minute number 15,260 for casual riders and 24,301 for
members (about 1.0% and 0.9% of each group's rides). These often
indicate accidental unlocks or confusion with the app interface. Misuse
increases operational costs (unreported rentals, city fines) and damages
the brand if users feel penalized for honest mistakes.
Recommendation:
Implement a mandatory in-app “How-to Ride” tutorial after users scan for their
first ride, segmented by stage of the ride journey (starting, pre-ride
safety, docking, and billing).
Busiest Station Management
The top 10 busiest stations fall into two user personas: downtown commuter
hubs on weekdays and recreational hotspots on weekends. Stock imbalances
at key stations frustrate users, leading to lost rides and diminished
trust in the network’s reliability.
Recommendation:
Deploy re-balancing vans ahead of the morning (6–9 AM) and evening (3–8 PM)
peaks, aiming for 80% bike availability during these periods. Introduce a user
incentive program that rewards trips ending at lower-capacity stations
near busy stations.
Explore the full dashboard on Tableau Public.
This dashboard visualizes seasonal trends, user types, trend line revenue comparison (bike type vs revenue), busiest station heatmap and ride duration distribution map with interactive filters.