Kaplan-Meier生存分析
·
xiebro
Kaplan-Meier生存分析:探索客户的生存概率随时间的变化,并可视化生存曲线,显示客户在不同时间点的生存概率。
数据集下载:Telco-Customer-Churn.csv
数据集概览
library(tidyverse)
dat <-
read.csv("datasets/Telco-Customer-Churn.csv") %>%
mutate(churn = if_else(Churn == "Yes", 1L, 0L))
dat %>% glimpse()
## Rows: 7,043
## Columns: 22
## $ customerID <chr> "7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "7795-CFOCW…
## $ gender <chr> "Female", "Male", "Male", "Male", "Female", "Female",…
## $ SeniorCitizen <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ Partner <chr> "Yes", "No", "No", "No", "No", "No", "No", "No", "Yes…
## $ Dependents <chr> "No", "No", "No", "No", "No", "No", "Yes", "No", "No"…
## $ tenure <int> 1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 58, 49, 2…
## $ PhoneService <chr> "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No", …
## $ MultipleLines <chr> "No phone service", "No", "No", "No phone service", "…
## $ InternetService <chr> "DSL", "DSL", "DSL", "DSL", "Fiber optic", "Fiber opt…
## $ OnlineSecurity <chr> "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "…
## $ OnlineBackup <chr> "Yes", "No", "Yes", "No", "No", "No", "Yes", "No", "N…
## $ DeviceProtection <chr> "No", "Yes", "No", "Yes", "No", "Yes", "No", "No", "Y…
## $ TechSupport <chr> "No", "No", "No", "Yes", "No", "No", "No", "No", "Yes…
## $ StreamingTV <chr> "No", "No", "No", "No", "No", "Yes", "Yes", "No", "Ye…
## $ StreamingMovies <chr> "No", "No", "No", "No", "No", "Yes", "No", "No", "Yes…
## $ Contract <chr> "Month-to-month", "One year", "Month-to-month", "One …
## $ PaperlessBilling <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No", …
## $ PaymentMethod <chr> "Electronic check", "Mailed check", "Mailed check", "…
## $ MonthlyCharges <dbl> 29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 29.7…
## $ TotalCharges <dbl> 29.85, 1889.50, 108.15, 1840.75, 151.65, 820.50, 1949…
## $ Churn <chr> "No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Y…
## $ churn <int> 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
生存分析
1. 初步探查,整体生存趋势
library(survival) # 进行生存分析的函数和工具
library(survminer) # 用于生存分析结果的可视化
library(hrbrthemes) # 用于设置可视化主题
# Kaplan-Meier Analysis
km_fit <- survfit(Surv(tenure, churn) ~ 1, data = dat)
km_fit %>%
ggsurvplot(
data = dat,
linetype = "solid",
conf.int = FALSE,
ggtheme = theme_ipsum(),
xlab = "Time",
ylab = "Surv. Prob"
)

2. 维度探查,不同特征人群的生存趋势
封装函数,方便多次画图使用
pals <- rownames(RColorBrewer::brewer.pal.info)
sc <- . %>%
ggsurvplot(
data = dat,
linetype = "solid",
conf.int = FALSE,
pval = TRUE,
censor = FALSE,
palette = sample(pals, 1),
ggtheme = theme_ipsum(),
break.time.by = 14,
xlab = "Time",
ylab = "Surv. Prob"
)
- by gender:
gender_fit <- survfit(Surv(tenure, churn) ~ gender, data = dat)
sc(gender_fit)

- by PaymentMethod:
payment_method <- survfit(Surv(tenure, churn) ~ PaymentMethod, data = dat)
sc(payment_method)

3. 使用迭代法快速扫描维度
library(gridExtra)
# 需要扫描的维度
cols <-
c(
"PhoneService",
"TechSupport",
"StreamingTV",
"StreamingMovies"
)
plots <-
cols %>%
# 这里引入了meta programming的编程思想,用函数生成函数
map_chr( ~ sprintf("survfit(Surv(tenure, churn) ~ %s, data = dat)", .x)) %>%
map(~ eval(parse(text = .))) %>%
map(sc) %>%
# 从ggsurvplot对象中提取图片
map(pluck, "plot")
do.call(grid.arrange, plots)
