It’s my pleasure to announce the release of scikit-survival 0.24.0.
A highlight of this release is the addition of cumulative_incidence_competing_risks(), which implements a non-parametric estimator of the cumulative incidence function in the presence of competing risks.
In addition, the release adds support for scikit-learn 1.6, including support for missing values in ExtraSurvivalTrees.
Analysis of Competing Risks
In classical survival analysis, the focus is on the time until a specific event occurs. If no event is observed during the study period, the time of the event is considered censored. A common assumption is that censoring is non-informative, meaning that censored subjects have a similar prognosis to those who were not censored.
Competing risks arise when each subject can experience an event due to one of $K$ ($K \geq 2$) mutually exclusive causes, termed competing risks. Thus, the occurrence of one event prevents the occurrence of other events. For example, after a bone marrow transplant, a patient might relapse or die from transplant-related causes (transplant-related mortality). In this case, death from transplant-related mortality precludes relapse.
The bone marrow transplant data from Scrucca et al., Bone Marrow Transplantation (2007) includes data from 35 patients grouped into two cancer types: Acute Lymphoblastic Leukemia (ALL; coded as 0), and Acute Myeloid Leukemia (AML; coded as 1).
from sksurv.datasets import load_bmt
bmt_features, bmt_outcome = load_bmt()
ailments = bmt_features["dis"].cat.rename_categories(
{"0": "ALL", "1": "AML"}
)
ailments.value_counts().to_frame()
During the follow-up period, some patients might experience a relapse of the original leukemia or die while in remission (transplant-related death).
The outcome is defined similarly to standard time-to-event data, except that the event indicator specifies the type of event, where 0 always indicates censoring.
import pandas as pd

status_labels = {
    0: "Censored",
    1: "Transplant-related mortality",
    2: "Relapse",
}
# Map the numeric status codes to readable labels and count each outcome.
risks = pd.DataFrame.from_records(bmt_outcome).assign(
    label=lambda x: x["status"].replace(status_labels)
)
risks["label"].value_counts().to_frame()

| label | count |
|---|---|
| Relapse | 15 |
| Censored | 11 |
| Transplant-related mortality | 9 |
The table above shows the number of observations for each status.
Non-parametric Estimator of the Cumulative Incidence Function
If the goal is to estimate the probability of relapse, transplant-related death is a competing risk event. This means that the occurrence of relapse prevents the occurrence of transplant-related death, and vice versa. We aim to estimate curves that illustrate how the likelihood of these events changes over time.
Let’s begin by estimating the probability of relapse using the complement of the Kaplan-Meier estimator. With this approach, we treat deaths as censored observations. One minus the Kaplan-Meier estimator provides an estimate of the probability of relapse before time $t$.
import matplotlib.pyplot as plt
from sksurv.nonparametric import kaplan_meier_estimator

# Relapse has status code 2 (see status_labels); every other status,
# including transplant-related death, is treated as censored here.
times, km_estimate = kaplan_meier_estimator(
    bmt_outcome["status"] == 2, bmt_outcome["ftime"]
)
plt.step(times, 1 - km_estimate, where="post")
plt.xlabel("time $t$")
plt.ylabel("Probability of relapsing before time $t$")
plt.ylim(0, 1)
plt.grid()
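For reference, the quantity plotted above can be written out explicitly. This is the textbook form of the Kaplan-Meier complement; the symbols $t_i$, $d_i$ and $n_i$ are notation introduced here, not taken from the library:

$$
1 - \hat{S}(t) = 1 - \prod_{i:\, t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)
$$

Here $t_1 < t_2 < \dots$ are the distinct observed times, $d_i$ is the number of relapses at $t_i$, and $n_i$ is the number of patients still at risk just before $t_i$. Patients who die from transplant-related causes simply leave the risk set, exactly like censored patients.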
However, this approach has a significant drawback: considering death as a censoring event violates the assumption that censoring is non-informative. This is because patients who died from transplant-related mortality have a different prognosis than patients who did not experience any event. Therefore, the estimated probability of relapse is often biased.
The cause-specific cumulative incidence function (CIF) addresses this problem by estimating the cause-specific hazard of each event separately. The cumulative incidence function estimates the probability that the event of interest occurs before time $t$, and that it occurs before any of the competing causes of an event. In the bone marrow transplant dataset, the cumulative incidence function of relapse indicates the probability of relapse before time $t$, given that the patient has not died from other causes before time $t$.
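As a sketch of what such an estimator looks like, the standard non-parametric estimator of the cumulative incidence function for cause $k$ (often called the Aalen-Johansen estimator) can be written as shown here. Whether cumulative_incidence_competing_risks() computes exactly this form is an assumption, and the symbols $t_i$, $d_{ki}$ and $n_i$ are notation introduced for illustration:

$$
\hat{F}_k(t) = \sum_{i:\, t_i \leq t} \hat{S}(t_{i-1}) \, \frac{d_{ki}}{n_i},
\qquad
\hat{S}(t) = \prod_{i:\, t_i \leq t} \left(1 - \frac{\sum_{k=1}^{K} d_{ki}}{n_i}\right),
\qquad
\hat{S}(t_0) = 1
$$

Here $t_1 < t_2 < \dots$ are the distinct observed times, $d_{ki}$ is the number of events of cause $k$ at $t_i$, and $n_i$ is the number of subjects still at risk just before $t_i$. Each increment is the probability of being free of any event just before $t_i$ multiplied by the cause-specific hazard at $t_i$. Summing $\hat{F}_k(t)$ over all $K$ causes gives $1 - \hat{S}(t)$, the probability of experiencing any event by time $t$.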
from sksurv.nonparametric import cumulative_incidence_competing_risks

times, cif_estimates = cumulative_incidence_competing_risks(
    bmt_outcome["status"], bmt_outcome["ftime"]
)
# The first row is plotted as the total risk; the remaining rows are the individual risks.
plt.step(times, cif_estimates[0], where="post", label="Total risk")
for i, cif in enumerate(cif_estimates[1:], start=1):
    plt.step(times, cif, where="post", label=status_labels[i])
plt.legend()
plt.xlabel("time $t$")
plt.ylabel("Probability of event before time $t$")
plt.ylim(0, 1)
plt.grid()
The plot shows the estimated probability of experiencing an event before time $t$ for both the individual risks and for the total risk.
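To make the relationship between these curves concrete, here is a minimal pure-Python sketch on made-up toy data. It is not scikit-survival's implementation: the helpers cumulative_incidence and one_minus_km are hypothetical names written for this illustration, and they follow the textbook estimators under the assumption that status 0 means censored.

```python
from collections import Counter


def cumulative_incidence(event, time):
    """Toy cumulative incidence estimator; event code 0 means censored.

    Returns the sorted unique times and a dict that maps each cause to
    its estimated cumulative incidence at those times.
    """
    causes = sorted(set(event) - {0})
    unique_times = sorted(set(time))
    surv = 1.0  # probability of being free of any event just before t
    at_risk = len(time)
    running = {k: 0.0 for k in causes}
    cif = {k: [] for k in causes}
    for t in unique_times:
        at_t = Counter(e for e, ti in zip(event, time) if ti == t)
        n_events = 0
        for k in causes:
            running[k] += surv * at_t[k] / at_risk
            cif[k].append(running[k])
            n_events += at_t[k]
        surv *= 1.0 - n_events / at_risk
        at_risk -= sum(at_t.values())  # events and censored leave the risk set
    return unique_times, cif


def one_minus_km(indicator, time):
    """Complement of the Kaplan-Meier estimator for a boolean indicator."""
    surv, at_risk, out = 1.0, len(time), []
    for t in sorted(set(time)):
        n_events = sum(1 for ind, ti in zip(indicator, time) if ti == t and ind)
        n_leaving = sum(1 for ti in time if ti == t)
        surv *= 1.0 - n_events / at_risk
        out.append(1.0 - surv)
        at_risk -= n_leaving
    return sorted(set(time)), out


# Made-up toy data, NOT the bone marrow transplant dataset.
event = [1, 2, 0, 2, 1, 0, 2, 2]
time = [1, 2, 3, 4, 5, 6, 7, 8]

times, cif = cumulative_incidence(event, time)
# Probability of any event: Kaplan-Meier complement with all causes pooled.
_, any_event = one_minus_km([e != 0 for e in event], time)
# Naive estimate for cause 2: the competing cause 1 is treated as censored.
_, naive_cause2 = one_minus_km([e == 2 for e in event], time)

for t, c2, naive in zip(times, cif[2], naive_cause2):
    print(f"t={t}: CIF={c2:.3f}  1-KM={naive:.3f}")
```

In this toy example the cause-specific curves add up to the pooled Kaplan-Meier complement at every time point, while the naive estimate for cause 2 is never below its cumulative incidence, which illustrates the upward bias of treating competing events as censored.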
Next, we want to estimate the cumulative incidence curves for the two cancer types, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), to examine how the probability of relapse depends on the original disease diagnosis.
_, axs = plt.subplots(2, 2, figsize=(7, 6), sharex=True, sharey=True)
for j, disease in enumerate(ailments.unique()):
    # Restrict the outcome to patients with the current diagnosis.
    mask = ailments == disease
    event = bmt_outcome["status"][mask]
    time = bmt_outcome["ftime"][mask]
    times, cif_estimates, conf_int = cumulative_incidence_competing_risks(
        event,
        time,
        conf_type="log-log",
    )
    # One row of axes per event type, one column per disease.
    for i, (cif, ci, ax) in enumerate(
        zip(cif_estimates[1:], conf_int[1:], axs[:, j]), start=1
    ):
        ax.step(times, cif, where="post")
        ax.fill_between(times, ci[0], ci[1], alpha=0.25, step="post")
        ax.set_title(f"{disease}: {status_labels[i]}", size="small")
        ax.grid()
for ax in axs[-1, :]:
    ax.set_xlabel("time $t$")
for ax in axs[:, 0]:
    ax.set_ylim(0, 1)
    ax.set_ylabel("Probability of event before time $t$")
The left column shows the estimated cumulative incidence curves (solid lines) for patients diagnosed with ALL, while the right column shows the curves for patients diagnosed with AML, along with their 95% pointwise confidence intervals. The plot indicates that the estimated probability of relapse at $t=40$ days is more than three times higher for patients diagnosed with ALL compared to AML.
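A comparison like this one can also be read off numerically rather than from the plot. The sketch below shows one way to evaluate a step curve at a given time; step_value_at is a hypothetical helper, and the two curves are made-up numbers for illustration only, not the estimates from the bone marrow transplant data.

```python
from bisect import bisect_right


def step_value_at(times, values, t):
    """Value of a right-continuous step function at time t.

    Matches how a curve drawn with where="post" is read: the value at
    times[i] holds until the next time point. Before the first time
    point the cumulative incidence is taken to be 0.
    """
    idx = bisect_right(times, t)
    return values[idx - 1] if idx > 0 else 0.0


# Made-up curves for illustration only.
times_a, cif_a = [10, 25, 60], [0.10, 0.30, 0.45]
times_b, cif_b = [15, 50, 70], [0.05, 0.20, 0.25]

at_40_a = step_value_at(times_a, cif_a, 40)
at_40_b = step_value_at(times_b, cif_b, 40)
print(f"ratio at t=40: {at_40_a / at_40_b:.1f}")
```

With the real estimates, the same lookup would be applied to the times and the relapse row of cif_estimates returned for each disease group.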
If you want to run the examples above yourself, you can execute them interactively in your browser using binder.

