Some plots to go with group based trajectory models in R

On my prior post on estimating group based trajectory models in R using the crimCV package I received a comment asking about how to plot the trajectories. The crimCV model object has a base plot object, but here I show how to extract those model predictions as well as some other functions. Many of these plots are illustrated in my paper for crime trajectories at micro places in Albany (forthcoming in the Journal of Quantitative Criminology). First we are going to load the crimCV and the ggplot2 package, and then I have a set of four helper functions which I will describe in more detail in a minute. So run this R code first.


long_traj <- function(model,data){
  df <- data.frame(data)
  vars <- names(df)
  prob <- model['gwt'] #posterior probabilities
  df$GMax <- apply(prob$gwt,1,which.max) #which group # is the max
  df$PMax <- apply(prob$gwt,1,max)       #probability in max group
  df$Ord <- 1:dim(df)[1]                 #Order of the original data
  prob <- data.frame(prob$gwt)
  names(prob) <- paste0("G",1:dim(prob)[2]) #Group probabilities are G1, G2, etc.
  longD <- reshape(data.frame(df,prob), varying = vars, v.names = "y", 
                   timevar = "x", times = 1:length(vars), 
                   direction = "long") #Reshape to long format, time is x, y is original count data
  return(longD)                        #GMax is the classified group, PMax is the probability in that group

weighted_means <- function(model,long_data){
  G_names <- paste0("G",1:model$ng)
  G <- long_data[,G_names]
  W <- G*long_data$y                                    #Multiple weights by original count var
  Agg <- aggregate(W,by=list(x=long_data$x),FUN="sum")  #then sum those products
  mass <- colSums(model$gwt)                            #to get average divide by total mass of the weight
  for (i in 1:model$ng){
    Agg[,i+1] <- Agg[,i+1]/mass[i]
  long_weight <- reshape(Agg, varying=G_names, v.names="w_mean",
                         timevar = "Group", times = 1:model$ng, 
                         direction = "long")           #reshape to long
pred_means <- function(model){
    prob <- model$prob               #these are the model predicted means
    Xb <- model$X %*% model$beta     #see getAnywhere(plot.dmZIPt), near copy
    lambda <- exp(Xb)                #just returns data frame in long format
    p <- exp(-model$tau * t(Xb))
    p <- t(p)
    p <- p/(1 + p)
    mu <- (1 - p) * lambda
    t <- 1:nrow(mu)
    myDF <- data.frame(x=t,mu)
    long_pred <- reshape(myDF, varying=paste0("X",1:model$ng), v.names="pred_mean",
                         timevar = "Group", times = 1:model$ng, direction = "long")

#Note, if you estimate a ZIP model instead of the ZIP-tau model
#use this function instead of pred_means
pred_means_Nt <- function(model){
    prob <- model$prob               #these are the model predicted means
    Xb <- model$X %*% model$beta     #see getAnywhere(plot.dmZIP), near copy
    lambda <- exp(Xb)                #just returns data frame in long format
	Zg <- model$Z %*% model$gamma
    p <- exp(Zg)
    p <- p/(1 + p)
    mu <- (1 - p) * lambda
    t <- 1:nrow(mu)
    myDF <- data.frame(x=t,mu)
    long_pred <- reshape(myDF, varying=paste0("X",1:model$ng), v.names="pred_mean",
                         timevar = "Group", times = 1:model$ng, direction = "long")

occ <- function(long_data){
 subdata <- subset(long_data,x==1)
 agg <- aggregate(subdata$PMax,by=list(group=subdata$GMax),FUN="mean")
 names(agg)[2] <- "AvePP" #average posterior probabilites
 agg$Freq <-$GMax))[,2]
 n <- agg$AvePP/(1 - agg$AvePP)
 p <- agg$Freq/sum(agg$Freq)
 d <- p/(1-p)
 agg$OCC <- n/d #odds of correct classification
 agg$ClassProp <- p #observed classification proportion
 #predicted classification proportion
 agg$PredProp <- colSums(as.matrix(subdata[,grep("^[G][0-9]", names(subdata), value=TRUE)]))/sum(agg$Freq) 
 #Jeff Ward said I should be using PredProb instead of Class prop for OCC
 agg$occ_pp <- n/ (agg$PredProp/(1-agg$PredProp))

Now we can just use the data in the crimCV package to run through an example of a few different types of plots. First lets load in the TO1adj data, estimate the group based model, and make our base plot.

out1 <-crimCV(TO1adj,4,dpolyp=2,init=5)

Now most effort seems to be spent on using model selection criteria to pick the number of groups, what may be called relative model comparisons. Once you pick the number of groups though, you should still be concerned with how well the model replicates the data at hand, e.g. absolute model comparisons. The graphs that follow help assess this. First we will use our helper functions to make three new objects. The first function, long_traj, takes the original model object, out1, as well as the original matrix data used to estimate the model, TO1adj. The second function, weighted_means, takes the original model object and then the newly created long_data longD. The third function, pred_means, just takes the model output and generates a data frame in wide format for plotting (it is the same underlying code for plotting the model).

longD <- long_traj(model=out1,data=TO1adj)
x <- weighted_means(model=out1,long_data=longD)
pred <- pred_means(model=out1)

We can subsequently use the long data longD to plot the individual trajectories faceted by their assigned groups. I have an answer on cross validated that shows how effective this small multiple design idea can be to help disentangle complicated plots.

#plot of individual trajectories in small multiples by group
p <- ggplot(data=longD, aes(x=x,y=y,group=Ord)) + geom_line(alpha = 0.1) + facet_wrap(~GMax)

Plotting the individual trajectories can show how well they fit the predicted model, as well as if there are any outliers. You could get more fancy with jittering (helpful since there is so much overlap in the low counts) but just plotting with a high transparency helps quite abit. This second graph plots the predicted means along with the weighted means. What the weighted_means function does is use the posterior probabilities of groups, and then calculates the observed group averages per time point using the posterior probabilities as the weights.

#plot of predicted values + weighted means
p2 <- ggplot() + geom_line(data=pred, aes(x=x,y=pred_mean,col=as.factor(Group))) + 
                 geom_line(data=x, aes(x=x,y=w_mean,col=as.factor(Group))) + 
                geom_point(data=x, aes(x=x,y=w_mean,col=as.factor(Group)))

Here you can see that the estimated trajectories are not a very good fit to the data. Pretty much eash series has a peak before the predicted curve, and all of the series except for 2 don’t look like very good candidates for polynomial curves.

It ends up that often the weighted means are very nearly equivalent to the unweighted means (just aggregating means based on the classified group). In this example the predicted values are a colored line, the weighted means are a colored line with superimposed points, and the non-weighted means are just a black line.

#predictions, weighted means, and non-weighted means
nonw_means <- aggregate(longD$y,by=list(Group=longD$GMax,x=longD$x),FUN="mean")
names(nonw_means)[3] <- "y"

p3 <- p2 + geom_line(data=nonw_means, aes(x=x,y=y), col='black') + facet_wrap(~Group)

You can see the non-weighted means are almost exactly the same as the weighted ones. For group 3 you typically need to go to the hundredths to see a difference.

#check out how close
nonw_means[nonw_means$Group==3,'y'] -  x[x$Group==3,'w_mean']

You can subsequently superimpose the predicted group means over the individual trajectories as well.

#superimpose predicted over ind trajectories
pred$GMax <- pred$Group
p4 <- ggplot() + geom_line(data=pred, aes(x=x,y=pred_mean), col='red') + 
                 geom_line(data=longD, aes(x=x,y=y,group=Ord), alpha = 0.1) + facet_wrap(~GMax)

Two types of absolute fit measures I’ve seen advocated in the past are the average maximum posterior probability per group and the odds of correct classification. The occ function calculates these numbers given two vectors (one of the max probabilities and the other of the group classifications). We can get this info from our long data by just selecting a subset from one time period. Here the output at the console shows that we have quite large average posterior probabilities as well as high odds of correct classification. (Also updated to included the observed classified proportions and the predicted proportions based on the posterior probabilities. Again, these all show very good model fit.) Update: Jeff Ward sent me a note saying I should be using the predicted proportion in each group for the occ calculation, not the assigned proportion based on the max. post. prob. So I have updated to include the occ_pp column for this, but left the old occ column in as a paper trail of my mistake.

#  group     AvePP Freq        OCC  ClassProp   PredProp     occ_pp
#1     1 0.9880945   23 1281.00444 0.06084656 0.06298397 1234.71607
#2     2 0.9522450   35  195.41430 0.09259259 0.09005342  201.48650
#3     3 0.9567524   94   66.83877 0.24867725 0.24936266   66.59424
#4     4 0.9844708  226   42.63727 0.59788360 0.59759995   42.68760

A plot to accompany this though is a jittered dot plot showing the maximum posterior probability per group. You can here that groups 3 and 4 are more fuzzy, whereas 1 and 2 mostly have very high probabilities of group assignment.

#plot of maximum posterior probabilities
subD <- longD[x==1,]
p5 <- ggplot(data=subD, aes(x=as.factor(GMax),y=PMax)) + geom_point(position = "jitter", alpha = 0.2)

Remember that these latent class models are fuzzy classifiers. That is each point has a probability of belonging to each group. A scatterplot matrix of the individual probabilities will show how well the groups are separated. Perfect separation into groups will result in points hugging along the border of the graph, and points in the middle suggest ambiguity in the class assignment. You can see here that each group closer in number has more probability swapping between them.

#scatterplot matrix
sm <- ggpairs(data=subD, columns=4:7)

And the last time series plot I have used previously is a stacked area chart.

#stacked area chart
nonw_sum <- aggregate(longD$y,by=list(Group=longD$GMax,x=longD$x),FUN="sum")
names(nonw_sum)[3] <- "y"
p6 <- ggplot(data=nonw_sum, aes(x=x,y=y,fill=as.factor(Group))) + geom_area(position='stack')

I will have to put another post in the queue to talk about the spatial point pattern tests I used in that trajectory paper for the future as well.

Leave a comment


  1. Paul Chang

     /  January 13, 2018

    #plot of maximum posterior probabilities
    p5 <- ggplot(data=subD, aes(x=as.factor(GMax),y=PMax)) + geom_point(position = "jitter", alpha = 0.2)

    #scatterplot matrix
    sm <- ggpairs(data=subD, columns=4:7)

    What's the "subD" meaning in the 2 sets codes? I cannot execute this code in these 2 sets codes.

    • Thank you for the note. “subD” is a new dataset, and I ommitted the line of code to create it. It is is simply

      subD <- longD[x==1,]

      I will update the post to note this. Thank you for pointing out that error.

      • Paul Chang

         /  January 17, 2018

        Thanks for your kind reply.

        Actually, I am interested in this method.
        How does the data (” TO1adj ” ) look like?
        Could I import CVS file into this package to do group-based trajectory modeling?

        Thanks a lot.

    • The TO1adj data is just the example data that comes with the crimCV package. You can import whatever data you want from a csv file and estimate the trajectories, it just needs to be in wide format, e.g.

      ID, Time1Val, Time2Val, Time3Val, ……

  2. mario.s

     /  April 23, 2018

    Dear Prof. Wheeler, is it possible to estimate a basic count model with no inflation or overdispersion using crimCV? Additionally, is there a way to set time at specific values e.g. 0 0.1 0.2 etc.?

    Thank you in advance

    • In crimCV the model is a zero-inflated Poisson. So it does not have over-dispersion (e.g. negative binomial regression), but simply has the zero inflated parameter.

      There is not a way to specify to not estimate a zero-inflated parameter in this program, nor am I aware of an R program that could be a substitute unfortunately. Ditto for specifying non-uniform time periods.

  3. Hi Andrew,

    I am trying to run this simple code to model temperature ( values between 80 and 110 F)

    (This is example but I load similar kind of data from a excel)

    df1 = matrix(runif(120, 80,110), nrow = 20, ncol=6)
    model <- crimCV(df1,3,dpolyp=3)

    However when I run this code I keep getting the following error

    "Error in dmZIPt(Dat, X, ng, rcv, init, Risk) : object 'Frtr' not found"

    I am relatively new to R debugging this I figured that it is running some Fortran code

    Frtr <- .Fortran("r_dmzipt_init_param", as.double(X),
    as.double(Datm), as.double(Risk), pparam = as.double(pparam),
    pllike = as.double(pllike), as.integer(ni), as.integer(no),
    as.integer(npp), as.integer(ng), as.integer(npop))

    I am not sure is it related to the data that I have that is giving me the error? When I run the TO1Traj data it works fine.
    I wanted to check if you have encountered similar error or would you know if I am doing something incorrectly?


    • I have not come across that (and I can replicate that same error on my machine), R version 3.6.2 on windows at least.

      I would email the author of the package to see if they know what the issue is (and maybe try a newer install of R).

      • Thanks for the quick reply! I will email the author! and will also try a newer installation of R.

    • ArthurE

       /  July 23, 2020

      Hi Radumeg,

      Just wondering, but did you ever figure out a way around this error? I am experiencing the same one. The error goes away if I don’t include a Risk matrix, but unfortunately I need to include the time-at-risk correction in my analyses.

      • I reached out to the authors of the crimCV package and here is the reply
        Subject: RE: Reg : error in crimCV package in R

        crimCV is software that fits a model that assumes that the input is count
        data (i.e. integers) with a medium to large proportion of zeros. You are
        trying to fit this model to data that clearly violates the assumptions on
        which the model was constructed. This makes this model totally
        inappropriate for the data you are trying to analyze.

        > The numerical optimization routine is crashing which results in the error.
        > This is indeed happening in the Fortran routines.
        > You are calling the code correctly but I don’t consider this crash unexpected as you are trying to fit ZIP trajectory models to uniformly distributed values between 80 and 110. ZIP models are for count data with a high proportion of zeroes. Even if you were lucky and the routine didn’t crash it wouldn’t be wise to use the results. Conceptually equivalent to using a classic linear regression model with a Gaussian residual error assumption to analyze binary data.

        I hope this help!
        Having said that, I have steered away from this to k-shape trajectory clustering ( as that served my purposes well and I had continuous data and with very few zeros.

  4. Celina

     /  June 29, 2020

    Hello professor Andy,

    I was wondering exactly what does the muhat(time) axis means?

    • I’m not sure what code you are referring to — can you be more specific?

      • Celina

         /  June 29, 2020

        I am trying to figure out how to read the first graph that is created when I run the line plot(out1). I know the x-axis is the time, but what about the y-axis?

      • Ok I see now — those are just the predicted values. So if you are predicting crime counts over time for an example, in that graph at time period 15, the red line has close to a predicted 4 crimes per year.

        Always feel free to send me an email Celina —

  5. Hi Andrew,

    I was wondering how I can get the membership for new data using the predicted model.



    • Sorry I do not know offhand! I imagine there is a procedure to get the highest prob group if you have multiple prior temporal observations.

  6. Desmond

     /  May 10, 2021

    Hi Andrew, Thank you a lot. I was wondering about dpolyp and dpolyl arguments… Why/when change de defoult value of 3?

    • Those have to do with the underlying form of the equation. Sometimes you know a cubic functional form is unlikely. So if you have 5 time periods, while you can technically fit a cubic it will be difficult for real world data to be that curvy to justify a cubic over fewer terms.

      With longer time series, it is easier to fit higher level polynomials. But that does not mean that a linear function is sufficient. In my work on crime trends and micro places,, it was 13 years and the trajectories were clearly linear.

      You can plot the original data as I have shown in this post to give a visual check. But you can also do all of the same fit statistics comparing different polynomial functions as you can for comparing more/less groups.

  1. The spatial clustering of hits-vs-misses in shootings using Ripley’s K | Andrew Wheeler
  2. Group based trajectory models in Stata – some graphs and fit statistics | Andrew Wheeler
  3. Paper – Replicating Group Based Trajectory Models of Crime at Micro-Places in Albany, NY published | Andrew Wheeler
  4. Simulating Group Based Trajectories (in R) | Andrew Wheeler

Leave a Reply to apwheele Cancel reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: