Introduction on multi-state models for survival analyses

Charlotte Voinot (PhD CIFRE INRIA / Sanofi) supervised by Dr Julie Josse (INRIA), Dr Bernard Sebastien (Sanofi)

2024-01-23

Multi-state model for survival analysis

Introduction of multi-state model
- Presentation and examples
- Data presentation
- Motivation
- Reminder of context and terminology in survival analysis
Multi-state model
- Assumption for multi-state analysis
- Quantities in multistate model (in non-Homogeneous Markov model)
- Must known estimators (in non-Homogeneous Markov model)
Application to three states data
- Usual survival analysis (summarise information : death/progression as unwanted event)
- Multi-state analysis
  - Nelson-Aalen estimator : Cumulative hazard function
  - Aalen-Johansen estimator : Probability of transition
  - Prediction of probability of transition : Cox model
- Comparison of the results between the two methods
What if the markov assumption is not respected ?
- Semi-markov model
Conclusion

Introduction of multi-state model

Classical survival analysis refers to the analysis of time to event (longitudinal data). The response is often referred as a failure time, survival time, or event time.

Multi-state models referred to multiple events or states for an individual over the time.

It is used to describe the dynamic of transitions between states (different events).
The study of these models consists in analyzing the transition forces (transition intensities) between the different states.

Obviously, the events cannot be seen as independent.

Presentation and examples

Example graphs corresponding to the survival model, the 2-state competing risk model and the illness-death model.

The survival model is the classical case where the individual can only move from the state 1 to the state 2.
The competing riks model is a special case of the multi-state model. Each transition leads to an absorbing state.
One exemple mainly used in epidemology is the Illness-death model. In our example, the state 2 is the diseased state and the state 3 is the death state.

In general multi-state models, the connections can be arbitrary or even cycle.

Presentation and examples

In this presentation, we will focus on the illness-death model :

Data presentation

Exemple of a dataset.

The dataset contains 6 variables : USUBJID (patient ID), TRTAN (treatment), OS (overall survival), OS.status (status indicator for OS), TTP (time to progression), TTP.status (status indicator for TTP). It contains 299 patients.

Initial data
Transition matrix
Data long format

USUBJID	TRTAN	OS	OS.status	TTP	TTP.status
015246-036-0001-00001	2	1430	0	376	0
015246-036-0001-00002	2	535	1	313	1

The value of the matrix corresponds to the name of the transition.

	basal	progression	death
basal	NA	1	2
progression	NA	NA	3
death	NA	NA	NA

Then, we need to transform the data in a long format.

In the long format, each row corresponds to a transition. It exists also some convention for the time and the status.

USUBJID	from	to	trans	Tstart	Tstop	time	status	TRTAN
015246-036-0001-00001	1	2	1	0	376	376	0	2
015246-036-0001-00001	1	3	2	0	376	376	0	2
015246-036-0001-00002	1	2	1	0	313	313	1	2
015246-036-0001-00002	1	3	2	0	313	313	0	2
015246-036-0001-00002	2	3	3	313	535	222	1	2

Data presentation

USUBJID	TRTAN	OS	OS.status	TTP	TTP.status
015246-036-0001-00002	2	535	1	313	1
015246-036-0001-00003	2	1373	0	77	1

Here’s the illustration on how to read the data (by convention of the long format) :

Motivation

Extension of traditional binary outcomes can’t fully describe the dynamic of the disease :
- Traditional analysis consists in summarising information : death/progression as unwanted event
- The resulted analysis is a static analysis : the dynamic of the disease is not taken into account
The multistate model can investigate the dynamic of transitions between states :
- It can introduce more complex model : certain states can be visited multiple times (multiple remissions)
- Introduce a more granular analysis : probability of being in a state at a given time, probability of moving from one state to another, hazard ratio of a treatment for each transition, etc…

Reminder of context and terminology in survival analysis

X is defined as the random variable representing survival time
C is the time to censoring

For each individual, we observe: $T_{i}=X_{i} \bigcap C_{i}$ and $\delta_{i}=1_{X_{i} \leq C_{i}}$ with $\delta_{i}$ the indicator of censoring (where $\delta_{i}=1$ if $T_{i}=X_{i}$ and $\delta_{i}=0$ if $T_{i}=C_{i}$).

We can define function of interest as :

The survival function: $S(t)=P(X>t)$.
The distribution function: $F(t)=P(X \leq t)=1-S(t)$.
The derivative of the distribution function: $f(t)=lim_{h \to 0} \frac{P(t \leq X \leq t+h)}{h}$
The instantaneous hazard function: $\lambda(t)=\lim_{h \to 0} \frac{P(t \leq X \leq t+h | X \geq t)}{h}$
The cumulative hazard function: $\Lambda(t)=\int_{0}^{t} \lambda(u) du = -\ln(S(t))$.

Each of these function remains valid in the multi-state context. We just need to specify the state of interest (from x to y).

Multi-state model

Assumptions for multi-state analysis

The previous notations for multi-state analysis is the basis for all multi-state model.

The estimators can be estimated based on the counting process properties.

It exists some assumptions to simplify the calculations of these estimators :

Quantities in multistate model (in non-Homogeneous Markov model)

Presentation of the quantities of interest in multi-state model (transition dependent quantities):

The instantaneous hazard function ($\alpha$ instead of $\lambda$): $\alpha_{hj}(t)$
The cumulative hazard function (A instead of $\Lambda$): $A_{hj}(t)=\int_{0}^{t} \alpha_{hj}(u)du$
The probability of transition : $P_{hj}(s,t)= P(X_{t}=j | X_{s}=h)$ (probability of being in the state j at time t knowing that the individual was in the state h at time s)

How to find the same quantities than in usual survival ?

Usual survival analysis

Probability of progression free survival time (Time until a progression or death) : $P(X>t)$ (ex : Kaplan meier estimator on summarized information)
Probability of being still alive at time t : $P(X>t)$ (ex : Kaplan meier estimator on overall survival time)

Multi-state analysis

The probability of staying in state 1 : $P_{11}(t)$
The opposite probability of death in being in state 1 : $1- P_{13}(t)$

Must known estimators (in non-Homogeneous Markov model)

The following estimators are a generalization to Markov models estimators.
Reminder : non-homogeneous Markov model is a Markov model where the instantaneous hazard function depends only of the time of follow-up ($\alpha_{hj}(t,d)=\alpha_{hj}(t)$).

Non parametric estimators
Semi parametric estimators

Non parametric estimators can be seen as purely descriptive. Covariates can’t be included in the estimators and prediction is not possible.

Function of interest	Estimator	Simple survival model	Multi-state model (by transition)
Cumulative hazard function	Nelson-Aalen	$\hat{\Lambda}(t)=\sum_{i:t_{j} \leq t} (\frac{d_{i}}{n_{i}})$	$\hat{A}_{hj}(t)=\int_{0}^{t}\frac{J_{h}(u)}{Y_{h}(u)}dN_{hj}(u)$ [1]
Probability of transition	Aalen-Johansen	no equivalent	$\hat{P}(s,t)=\prod_{l:t_{l} \leq t} (Id + \Delta \hat{A}(T_{l}))$ [2]

Note : In the following examples, we will consider that the censoring is right censoring and that the censoring is i.i.d (independent and identically distributed) and not informative.

[1] Details and demonstrations in appendix
[2] Goodman and Johansen (1973)

Focus on some non parametric and semi parametric estimators only (parametric estimators are not presented).

In the slide, we present the estimators in simple survival model and in multi-state model (by transition) to do the parrallel between the two models.

Also all estimators in simple survival model can be expressed in multi-state model (by transition) but not the opposite.

Nelson-aalen : it’s the sum of the hazard function at each death time. Multi-state : it’s the integral of the hazard function at each transition time.

dN is the number of transition from state h to state j at time u. Y is the number of individual at risk at time u. J is the number of individual at risk at time u minus.

Probability of transition : In the simple survival model, the probability of transition is not defined. The parallel can be done with the opposite of Kaplan meier estimator (the probability of having the event before the time t).

In the multi-state model, the probability of transition is defined as the probability of being in the state j at time t knowing that the individual was in the state h at time s. It is the integral product of the matrix identity and the difference of the cumulative hazard function at each transition time.

Some strong assumptions are necessary: $\alpha_{hj,i}(t)=\alpha_{hj,0}(t)exp(\beta^{T}Z_{i})$ (proportional risk model). It is the same assumption as the classical Cox model.

Semi parametric estimators enable to include covariates in the estimators. Prediction is possible.

Function of interest	Estimator	Simple survival model	Multi-state model (by transition)
The baseline cumulative hazard function	Breslow	$\Lambda_{0}(t)=\sum_{i:T_{i} \leq t} \frac{d_{i}}{\sum_{j \in R(T_{i})} exp(\hat{\beta'}*Z_{j})}$	$A_hj0(t)=\int_0^t \frac{J_h(u)}{Y_h(u)exp(\beta^T_{hj}Z_{i})}dN_hj(u)$
Regression coefficient $\beta$	Cox model	maximum partial likelihood	maximum partial likelihood
Probability of transition	Cox model	no equivalent	$\hat{P}(s,t\|Z)=\prod_{l:t_{l} \leq t} (Id + \Delta \hat{A}(T_{l}\|Z))$

WARNING : The semi-parametric assumption have to be checked (Graphical method, Schoenfeld residuals, etc.)

Application to three states data

USUBJID	TRTAN	OS	OS.status	TTP	TTP.status
015246-036-0001-00001	2	1430	0	376	0
015246-036-0001-00002	2	535	1	313	1
015246-036-0001-00003	2	1373	0	77	1

Usual survival analysis (summarise information : death/progression as unwanted event)

In usual survival analysis, the study considers :

Primary endpoint :

Progression-free survival (PFS) time defined by the time from randomization until disease progression or death from any cause.

Secondary endpoint :

Overall survival time from randomization until death from any cause.

In order to compare the multi-state model with the classical survival analysis, we will first analyse the data with the classical survival analysis.

First, we will summarise the data : death and progession as unwanted event.

```{r}
#| code-line-numbers: "5-8"
datable$status <- ifelse(datable$OS.status == 1 | datable$TTP.status == 1, 1, 0)

# if OS < TTP and OS.status =1 and TTP.status =1, then time = OS
# if OS.status = 1 and TTP.status = 0, then time = OS
# if OS.status = 0 and TTP.status = 1, then time = TTP
# if OS.status = 0 and TTP.status = 0, then time = min(OS, TTP)
datable$time <- ifelse(datable$OS.status == 1 & datable$TTP.status == 1, pmin(datable$OS,datable$TTP), ifelse(datable$OS.status == 1 & datable$TTP.status == 0, datable$OS, ifelse(datable$OS.status == 0 & datable$TTP.status == 1, datable$TTP, ifelse(datable$OS.status == 0 & datable$TTP.status == 0, pmin(datable$OS, datable$TTP), NA))))
datable$OS <- datable$OS/30
datable$TTP <- datable$TTP/30
datable$time <- datable$time/30
```

Primary endpoint : progression-free survival (PFS)

Progression free survival : time until a progression or death

```{r}
# Kaplan meier of proportion of surviving without progression 
km <- survfit(Surv(time, status) ~ strata(TRTAN), data = datable)
km 
```

Call: survfit(formula = Surv(time, status) ~ strata(TRTAN), data = datable)

                        n events median 0.95LCL 0.95UCL
strata(TRTAN)=TRTAN=1 122     86   19.3    16.3    25.3
strata(TRTAN)=TRTAN=2 177     98   31.6    25.1    43.0

The median follow-up time for progression-free survival was 19.3 months (95% CI: 16.3-25.3) in the treatment group 1 and 31.6 months (95% CI: 25.1-43) in the treatment group 2.

Primary endpoint : progression-free survival (PFS)

After checking the proportional risk hypothesis, we can calculate the hazard ratio :

```{r}
c1 <- coxph(Surv(time,status)~ TRTAN,data=datable)
c1

# CI for HR 95%
exp(confint(c1,level=0.95))
```

Call:
coxph(formula = Surv(time, status) ~ TRTAN, data = datable)

         coef exp(coef) se(coef)      z        p
TRTAN -0.5427    0.5812   0.1492 -3.637 0.000276

Likelihood ratio test=12.94  on 1 df, p=0.0003216
n= 299, number of events= 184 
          2.5 %    97.5 %
TRTAN 0.4337851 0.7786188

The hazard ratio for the treatment 1 vs treatment 2 is 0.58 (95% CI: 0.43-0.78).
The treatment 2 is associated with a lower risk of progression or death (p-value : 0.0003) : The PFS time decreases by 42% with the treatment 2 compared to the treatment 1.

Secondary endpoint : overall survival

Overall survival : Time from randomization until death from any cause

```{r}
c2 <- coxph(Surv(OS,OS.status)~ TRTAN,data=datable)
c2
```

Call:
coxph(formula = Surv(OS, OS.status) ~ TRTAN, data = datable)

         coef exp(coef) se(coef)      z     p
TRTAN -0.2618    0.7696   0.1855 -1.411 0.158

Likelihood ratio test=1.97  on 1 df, p=0.1602
n= 299, number of events= 117

The overall survival seems to be identical between the two treatments graphically.
It is confirmed by the HR : The effect of the treatment on the overall survival is not significant (p-value : 0.16).

Conclusion on the usual survival analysis

The treatment 2 is associated with a lower risk of progression or death (p-value : 0.0003) : The PFS time decreases by 42% with the treatment 2 compared to the treatment 1.

The effect of the treatment on the overall survival is not significant (p-value : 0.16).

It can be resumed by the following graph :

Multi-state analysis (Non homogeneous Markov model)

For computing the estimators in the context of non-homogeneous Markov model, the packages mstate, msSurv and etm can be used.

Markov assumption

Before analysing the data with the multi-state model, we need to check the assumptions of the markov multi-state model in using the package mstate.

It exists some functions which check the assumptions of the markov multi-state model in applying log-rank test for each transition (recommanded in Titman and Putter (2020) detailed in appendix).

The p-value corresponding to the overall chi-test for transition 1 (from 1 to 2) :

[1] 0

p-value for transition 2 (from 1 to 3) :

[1] 0

p-value for transition 3 (from 2 to 3) :

[1] 0.141

In our case, the p-value for the transition 3 is not significant (p-value : 0.15). Thus, the markov assumption is not respected. It exists alternative models to the markov model such as semi-markov model …

Nelson-Aalen estimator : Cumulative hazard function

```{r}
fit_aj <- coxph(Surv(Tstart,Tstop,status)~strata(trans), id=id, data = datable_long_transformed, method="breslow")
msf0 <- msfit(object=fit_aj, vartype="aalen",trans=tmat)
```

Reminder : Nelson-Aalen is an estimator for the cumulative hazard function and can be expressed as $\hat{A}_{hj}(t)=\int_{0}^{t}\frac{J_{h}(u)}{Y_{h}(u)}dN_{hj}(u)$

Thanks to this estimator, the probability of transition can be computed by Aalen-Johansen estimator [1].

[1] Details and demonstration in appendix

Aalen-Johansen estimator : Probability of transition

Non-parametric estimator for the probability of transition. It can be expressed as $P_{hj}(s,t)= P(X_{t}=j | X_{s}=h)$.

Probability from state 1
Probability from state 2
Probability from state 3

What the plot shows is $P_{1-j}(0,t)$, the probability of being in state 1 at time 0 and in state j at time t :

$p_{11}$ is the blue curve
$p_{12}$ is the green curve
$p_{13}$ is the red curve

What the plot shows is $P_{2-j}(0,t)$, the probability of being in state 2 at time 0 and in state j at time t :

$p_{21}$ is the blue curve
$p_{22}$ is the green curve
$p_{23}$ is the red curve

What the plot shows is $P_{3-j}(0,t)$, the probability of being in state 3 at time 0 and in state j at time t :

$p_{33}$ is the red curve

Prediction of probability of transition : Cox model

Semi-parametric estimator of the probability of transition. It enables prediction.

The treatment is the only covariate included in the model. The covariates have to be in long format also with 1 information per transition : TRTAN2.1 (logical value on treatment 2 in transition 1), TRTAN2.2, TRTAN2.3.

	id	trans	TRTAN	TRTAN2.1	TRTAN2.2	TRTAN2.3
1	015246-036-0001-00001	1	2	1	0	0
2	015246-036-0001-00001	2	2	0	1	0
5	015246-036-0001-00002	3	2	0	0	1

```{r}
fit_cox <- coxph(Surv(Tstart,Tstop,status)~ TRTAN2.1+TRTAN2.2+TRTAN2.3 +strata(trans), id=id, data = datable_long_transformed, method="breslow")
fit_cox
#exp(confint(fit_cox,level=0.95))
```

Call:
coxph(formula = Surv(Tstart, Tstop, status) ~ TRTAN2.1 + TRTAN2.2 + 
    TRTAN2.3 + strata(trans), data = datable_long_transformed, 
    method = "breslow", id = id)

            coef exp(coef) se(coef) robust se      z        p
TRTAN2.1 -0.6139    0.5412   0.1723    0.1712 -3.586 0.000335
TRTAN2.2 -0.3274    0.7208   0.3017    0.3023 -1.083 0.278768
TRTAN2.3  0.1465    1.1577   0.2397    0.2238  0.654 0.512822

Likelihood ratio test=13.99  on 3 df, p=0.002915
n= 735, number of events= 254

The p-value (p) must be less than 0.05 to admit the significance of a coefficient.

Prediction of probability of transition : Cox model

To help our understanding :

Transition	Coefficient regression for Treatment 2	HR (CI 95%)	p-value for HR
1	-0.6139 (0.17)	0.54 (0.38-0.75)	`0.0003`
2	-0.3274 (0.17)	0.72 (0.40-1.30)	0.28
3	0.1465 (0.17)	1.15 (0.75-1.8)	0.52

In using this method, coefficient regression then HR (exp(coef)) are computed for each transition.

For transition 1 (from initial state to progression): The hazard ratio for the treatment 1 vs treatment 2 is 0.54 (95% CI: 0.38-0.75). The treatment 2 is significantly (p-value : 0.0003) associated with a lower risk of progression (reduction of 46% of having a progression).

For transition 2 (from initial state to death): The hazard ratio for the treatment 1 vs treatment 2 is 0.72 (95% CI: 0.40-1.30). The treatment 2 is not significantly (p-value=0.28) associated with a lower risk of death from the initial state.

For transition 3 (from progression to death) : The hazard ratio for the treatment 1 vs treatment 2 is 1.15 (95% CI: 0.75-1.8). The treatment 2 is not significantly (p-value=0.52) associated with a lower risk of death from progression state.

Prediction of probability of transition : Cox model

```{r}
#| code-line-numbers: "3-6"
#traitement 2
newd2 <- data.frame(TRTAN2.1 = c(1,0,0),TRTAN2.2 = c(0,1,0),TRTAN2.3 = c(0,0,1), trans=1:3,strata=1:3)
#traitement 1
newd1 <- data.frame(TRTAN2.1 = c(0,0,0),TRTAN2.2 = c(0,0,0),TRTAN2.3 = c(0,0,0), trans=1:3,strata=1:3)
```

Treatment 1

Treatment 2

The plot shows the prediction probability of transition in having the treatment 1 from the initial state.

The plot shows the prediction probability of transition in having the treatment 2 from the initial state.

It confirms visually the results of the cox model : the treatment 2 gives a higher probability of staying in the initial state (basal) than the treatment 1.

Example of question

Example of question : what is the probability of death for patient having treatment 2 and a progression at time 10 months ?

The answer

Basically, the answer of the question is $P_{2-3}(10,t)$, the probability of being in state 2 (prgression) at time 10 and in state 3 (death) at time t :

$p_{21}$ is the blue curve
$p_{22}$ is the green curve

$p_{23}$ is the red curve

Comparision usual and multi-state analysis

PFS for multi-state analysis : $p_{11}$

Overall survival for multi-state analysis : 1-$p_{13}$

PFS for usual survival analysis :

Overall survival for usual survival analysis :

What if the markov assumption is not respected ?

The markov assumption is very strong.
In some case, the validity of markov assumption can be discussed :

When a patient is in a progression state (state 2), the probability of death could be different if the patient has a progression at time 1 month or at time 20 months.
In a more complex setting (with possible multiple remission, progression, etc.), the markov assumption cannot take into account the number of total progression for the probability of transition to death.

Semi-markov model can be set as an alternative model to the markov model.

Semi-markov model

In the simple case (homogeneous semi-markov model), the quantities to compute are different :

$J_{n}$ is a Markov chain
$X_{n}=S_{n}-S_{n-1}$ is the time of stay in the state $J_{n-1}$

The probability of transition of the Markov chain can be expressed as : $P_{hj}=P(J_{n+1}=j|J_{n}=h)$
The distribution of $X_{n}$ is the same for all $n$ and is called the sojourn time distribution, can be expressed as $F_{hj}(d)=P(X_{n+1} \leq d | J_{n}=h, J_{n+1}=j)$

These two quantities are used to compute the intensity of transition then the probability of transition.

The package 'SemiMarkov' can be used to compute semi-markov model.

Conclusion : What is the advantage in using multi-state analysis ?

The multi-state model enables to analyse the dynamic of the disease :
- probability of being in a state at a given time
- probability of moving from one state to another
- Hazard ratio (for the treatment) for each transition : detailed analysis of the effect of the treatment on the disease
In our case, we have seen :

Instead of

Finally each usual survival analysis can be done with the multi-state model.

Some articles have shown under some constraint that multi-state model enables to have more statistical power than usual survival analysis (see Smith, Nixon, and Sharples (2021) and Cassarly et al. (2017)).

References

Andersen, P. K., Ø. Borgan, R. D. Gill, and N. Keiding. 1996. “Statistical Models Based on Counting Processes.” Journal of the Royal Statistical Society. Series D (The Statistician) 45 (3): 384–84. http://www.jstor.org/stable/2988475.

Cassarly, Christy, Renee’ H. Martin, Marc Chimowitz, Edsel A. Peña, Viswanathan Ramakrishnan, and Yuko Y. Palesch. 2017. “Assessing Type I Error and Power of Multistate Markov Models for Panel Data—A Simulation Study.” Communications in Statistics - Simulation and Computation 46 (9): 7040–61. https://doi.org/10.1080/03610918.2016.1222425.

Fleming, T. R., and D. P. Harrington. 2013. Counting Processes and Survival Analysis. Wiley Series in Probability and Statistics. Wiley. https://books.google.fr/books?id=vam6EAAAQBAJ.

Goodman, Gerald S., and Søren Johansen. 1973. “Kolmogorov’s Differential Equations for Non-Stationary, Countable State Markov Processes with Uniformly Continuous Transition Probabilities.” Mathematical Proceedings of the Cambridge Philosophical Society 73: 119–38. https://api.semanticscholar.org/CorpusID:123282437.

SAINT PIERRE, Philippe. 2021. “Analyse de Survie, Modèles Multi-États Et Processus de Comptage.” https://perso.math.univ-toulouse.fr/psaintpi/files/2021/04/Cours_Survie_2.pdf.

Smith, Isabelle L., Jane E. Nixon, and Linda Sharples. 2021. “Power and Sample Size for Multistate Model Analysis of Longitudinal Discrete Outcomes in Disease Prevention Trials.” Statistics in Medicine 40 (8): 1960–71. https://doi.org/https://doi.org/10.1002/sim.8882.

Titman, Andrew C, and Hein Putter. 2020. “General tests of the Markov property in multi-state models.” Biostatistics 23 (2): 380–96. https://doi.org/10.1093/biostatistics/kxaa030.

Appendix

Definition of the integral product

Whereas the ordinary integral of a function a provides a solution of the equation :

\[ y'(x) = a(x) \]

The product integral helps us to find solutions of the equation : \[ y'(x) = a(x)y(x) \]

The integral product is introduced by Voltera in 1887 :

$\prod_{s \leq u \leq t} (Id + d\hat{A}(u)) = \lim_{max |t_{i}-t_{i-1}| \to 0 } (Id + \hat{A}(T_{i})-\hat{A}(T_{i-1}))$

Probability of transition

The probability of transition of a Markov process verify :

\[ \begin{aligned} & \forall i, j \in S=\{1, \ldots, k\} \text { et } \forall 0<s<u<t, \\ & p_{h j}(s, t)=\sum_{k \in S} p_{h k}(s, u) p_{k j}(u, t), \end{aligned} \]

Thus, the probability of transition can be expressed by the Kolmogorov forward equation :

\[ \frac{\delta P_{hj}(s,t)}{\delta t} = P(s,t)*A(dt) \]

The derivate of the probability of transition is equal to the probability of transition multiplied by the infinitesimal cumulative hazard matrix.

Notations for multi-state

We need to introduce other notations for multi-state analysis :

$N_{hj}(t)=\sum_{i:1}^{n}1_{X_{i} \leq t}$ is the number of transitions between the state h and j that have occurred at time t for all patients. $N_{hj}(t)$ is a right statusored counting process.

Thanks to some counting process properties : \[ N_{hj}(t)=\Delta_{hj}(t)+M_{hj}(t) \]
with \[ \Delta_{hj}(t)=\int_{0}^{t} \lambda_{hj}(u)du = \int_{0}^{t} \alpha_{hj}(u) Y_{h}(u)du \] and $M_{hj}(t)$ a local martingale (random noise with $E(M)=0$) SAINT PIERRE (2021).

$\lambda$ is the intensity of the counting process $N$ (Property)
$Y_{h}(t)=\sum_{i:1}^{n}1_{X_{i} \geq t}$ (the number of patients at risk in the state h at time t)

WARNING : $\Delta_{hj}(t)$ is not the cumulative hazard function.

NOTE : $\alpha_{hj}(t)$ can be seen as the instantaneous hazard function of the transition from the state h to the state j at time t ($\alpha_{hj}(s)=\lim_{\Delta t \to 0} \frac{p_{hj}(s, s+\Delta t )}{\Delta t}$).

Detail for notation

Thus (appendix), thanks to Doob-Meyer decomposition, we can decompose the counting process as a sum of a predictable process ($\Delta$) and a martingale ($M$) : $N_{hj}(t)=\Delta_{hj}(t)+M_{hj}(t)$

with $\Delta_{hj}(t)=\int_{0}^{t} \lambda_{hj}(u)du = \int_{0}^{t} \alpha_{hj}(u) Y_{h}(u)du$,
with $\lambda$ the intensity of the counting process $N$ (Property), $Y_{h}(t)=\sum_{i:1}^{n}1_{X_{i} \geq t}$ (the number of patients at risk in the state h at time t) and $M_{hj}(t)$ a local martingale (random noise with $E(M)=0$).

WARNING : $\Delta_{hj}(t)$ is not the cumulative hazard function.

Note : $\alpha_{hj}(t)$ can be seen as the instantaneous hazard function of the transition from the state h to the state j at time t ($\alpha_{hj}(s)=\lim_{\Delta t \to 0} \frac{p_{hj}(s, s+\Delta t )}{\Delta t}$).

Demonstration for estimators

Cumulative hazard function : Nelson-aalen estimator

Assumption : Non-homogeneous markov model (the instantaneous hazard function depends only of the time of follow-up).

By definition :

1- the counting process is equal to $N(t)=\Delta(t)+M(t)$ with $\Delta(t)=\int_{0}^{t}\alpha(u)Y(u)du$.

2- $M(t)$ is a local martingale thus $E(M(t))=0$ and considered as random noise.

3- the cumulative hazard function is $A(t)=\int_{0}^{t}\alpha(u)du$ with $\alpha(u)$ the instantaneous hazard function.

\[ \begin{aligned} N(t) &=\Delta(t)+M(t) \text{ by (1)} \\ \Rightarrow \frac{dN(t)}{dt} &=\frac{d\Delta(t)}{dt}+\frac{dM(t)}{dt} \\ \Rightarrow dN(t) &=\alpha(t)Y_{h}(t)dt+dM(t) \\ \Rightarrow \alpha(t) &=(dN(t)-dM(t))*\frac{1}{Y(t)} \text{ by (2) as M is a random noise}\\ \Rightarrow A(t) &= \int_{0}^{t}\frac{1}{Y(t)}*dN(t) \text{ by (3)}\\ \Rightarrow A(t) &= \int_{0}^{t}\frac{J(t)}{Y(t)}*dN(t) \text{ with $J(t)=1_{Y(t) \geq 0}$}\\ \end{aligned} \]

Function of interest	Estimator	Simple survival model	Multi-state model (by transition)
Cumulative hazard function	Nelson-Aalen	\(\hat{\Lambda}(t)=\sum_{i:t_{j} \leq t} (\frac{d_{i}}{n_{i}})\)	\(\hat{A}_{hj}(t)=\int_{0}^{t}\frac{J_{h}(u)}{Y_{h}(u)}dN_{hj}(u)\) [1]
Probability of transition	Aalen-Johansen	no equivalent	\(\hat{P}(s,t)=\prod_{l:t_{l} \leq t} (Id + \Delta \hat{A}(T_{l}))\) [2]

Function of interest	Estimator	Simple survival model	Multi-state model (by transition)
The baseline cumulative hazard function	Breslow	\(\Lambda_{0}(t)=\sum_{i:T_{i} \leq t} \frac{d_{i}}{\sum_{j \in R(T_{i})} exp(\hat{\beta'}*Z_{j})}\)	\(A_hj0(t)=\int_0^t \frac{J_h(u)}{Y_h(u)exp(\beta^T_{hj}Z_{i})}dN_hj(u)\)
Regression coefficient \(\beta\)	Cox model	maximum partial likelihood	maximum partial likelihood
Probability of transition	Cox model	no equivalent	\(\hat{P}(s,t\|Z)=\prod_{l:t_{l} \leq t} (Id + \Delta \hat{A}(T_{l}\|Z))\)