This morning, I was working with Julie, a student of mine, coming from Rennes, on mortality tables. Actually, we work on genealogical datasets from a small region in Québec, and we can observe a lot of volatiliy. If I borrow one of her graph, we get something like
Image may be NSFW.
Clik here to view.
Since we have some missing data, we wanted to use some Generalized Nonlinear Models. So let us see how to get a smooth estimator of the mortality surface. We will write some code that we can use on our data later on (the dataset we have has been obtained after signing a lot of official documents, and I guess I cannot upload it here, even partially).
DEATH <- read.table( "http://freakonometrics.free.fr/Deces-France.txt", header=TRUE) EXPO <- read.table( "http://freakonometrics.free.fr/Exposures-France.txt", header=TRUE,skip=2) library(gnm) D=DEATH$Male E=EXPO$Male A=as.numeric(as.character(DEATH$Age)) Y=DEATH$Year I=(A<100) base=data.frame(D=D,E=E,Y=Y,A=A) subbase=base[I,] subbase=subbase[!is.na(subbase$A),]
The first idea can be to use a Poisson model, where the mortality rate is a smooth function of the age and the year, something like
Image may be NSFW.
Clik here to view.that can be estimated using
library(mgcv) regbsp=gam(D~s(A,Y,bs="cr")+offset(log(E)),data=subbase,family=quasipoisson) predmodel=function(a,y) predict(regbsp,newdata=data.frame(A=a,Y=y,E=1)) vX=trunc(seq(0,99,length=41)) vY=trunc(seq(1900,2005,length=41)) vZ=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
The mortality surface is here
Image may be NSFW.
Clik here to view.
It is also possible to extract the average value of the years, which is the interpretation of the Image may be NSFW.
Clik here to view. coefficient in the Lee-Carter model,
predAx=function(a) mean(predict(regbsp,newdata=data.frame(A=a, Y=seq(min(subbase$Y),max(subbase$Y)),E=1))) plot(seq(0,99),Vectorize(predAx)(seq(0,99)),col="red",lwd=3,type="l")
We have the following smoothed mortality rate
Image may be NSFW.
Clik here to view.
Recall that the Lee-Carter model is
Image may be NSFW.
Clik here to view.
where parameter estimates can be obtained using
regnp=gnm(D~factor(A)+Mult(factor(A),factor(Y))+offset(log(E)), data=subbase,family=quasipoisson)
predmodel=function(a,y) predict(regnp,newdata=data.frame(A=a,Y=y,E=1)) vZ=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
The (crude) mortality surface is
Image may be NSFW.
Clik here to view.
with the following Image may be NSFW.
Clik here to view. coefficients.
plot(seq(1,99),coefficients(regnp)[2:100],col="red",lwd=3,type="l")
Image may be NSFW.
Clik here to view.
Here we have a lot of coefficients, and unfortunately, on a smaller dataset, we have much more variability. Can we smooth our Lee-Carter model ? To get something which looks like
Image may be NSFW.
Clik here to view.
Actually, we can, and the code is rather simple
library(splines) knotsA=c(20,40,60,80) knotsY=c(1920,1945,1980,2000) regsp=gnm(D~bs(subbase$A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3)+ Mult(bs(subbase$A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3), bs(subbase$Y,knots=knotsY,Boundary.knots=range(subbase$Y),degre=3))+ offset(log(E)),data=subbase, family=quasipoisson) BpA=bs(seq(0,99),knots=knotsA,Boundary.knots=range(subbase$A),degre=3) BpY=bs(seq(min(subbase$Y),max(subbase$Y)),knots=knotsY,Boundary.knots= range(subbase$Y),degre=3) predmodel=function(a,y) predict(regsp,newdata=data.frame(A=a,Y=y,E=1)) v Z=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
The mortality surface is now
Image may be NSFW.
Clik here to view.
and again, it is possible to extract the average mortality rate, as a function of the age, over the years,
BpA=bs(seq(0,99),knots=knotsA,Boundary.knots=range(subbase$A),degre=3) Ax=BpA%*%coefficients(regsp)[2:8] plot(seq(0,99),Ax,col="red",lwd=3,type="l")
Image may be NSFW.
Clik here to view.
We can then play with the smoothing parameters of the spline functions, and see the impact on the mortality surface
knotsA=seq(5,95,by=5) knotsY=seq(1910,2000,by=10) regsp=gnm(D~bs(A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3)+ Mult(bs(A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3), bs(Y,knots=knotsY,Boundary.knots=range(subbase$Y),degre=3)) +offset(log(E)),data=subbase,family=quasipoisson) predmodel=function(a,y) predict(regsp,newdata=data.frame(A=a,Y=y,E=1)) vZ=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
Image may be NSFW.
Clik here to view.
We now have to use those functions our our small data sample ! That should be fun….