Morpho: massive improvements in performance of semi-landmark sliding

On my visit to the MPI in Leipzig (which was great, thanks a lot!!), Philip Gunz told me about a method (proposed by Demetris Halazonetis) that could improve computation time and memory footprint for the relaxation of semi-landmarks; both are serious restrictions on the maximum number of coordinates usable with this method. So I spent the best part of Wednesday night and the 6-hour train ride home tweaking the underlying subroutines, making use of the excellent R package Matrix and its large variety of classes for all kinds of sparse/triangular/symmetric matrices. In the end I was able to run the examples in my package on my netbook as fast as the untweaked version ran on my desktop workstation.
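To get a feel for why the sparse classes help (a minimal toy sketch, not the actual Morpho internals; sizes and fill pattern are made up for illustration): the matrices involved are mostly zeros, so sparse storage saves both memory and multiplication time.

## toy comparison of dense vs. sparse storage (illustrative only, not Morpho code)
library(Matrix)
n <- 5000
dense <- matrix(0, n, n)
idx <- cbind(sample(n, 20000, replace=TRUE), sample(n, 20000, replace=TRUE))
dense[idx] <- rnorm(nrow(idx)) ## fill only ~20k of the 25 million entries
sparse <- Matrix(dense, sparse=TRUE) ## dgCMatrix: stores non-zero entries only
print(object.size(dense), units="Mb") ## ~190 Mb for the dense matrix
print(object.size(sparse), units="Mb") ## a tiny fraction of that
v <- rnorm(n)
system.time(dense %*% v) ## dense matrix-vector product
system.time(sparse %*% v) ## sparse product is much faster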

So when I arrived at the office this morning, I was eager to compare benchmarks between the two versions on my machine.

Hardware specs:

Software:

### first create a benchmarking function

benchslide <- function(nofl=c(50,100,200,500,1000,2000,3000,4000)) {
    require(Morpho)
    require(Rvcg)
    data(nose)
    tmp <- numeric(length(nofl)) ## elapsed times
    for (i in seq_along(nofl)) {
        cat(paste0("running function with ",nofl[i]," semi-landmarks\n"))
        ## sample semi-landmarks on the mesh and warp them onto the target
        semi <- vcgSample(shortnose.mesh, SampleNum=nofl[i])
        semilong <- tps3d(semi, shortnose.lm, longnose.lm)
        ## slide all landmarks along the surface, recording the elapsed time
        tmp[i] <- system.time(relax <- relaxLM(semi, semilong, mesh=shortnose.mesh,
                                               iterations=1, SMvector=1:nrow(semi),
                                               deselect=FALSE, surp=1:nrow(semi)))[3]
        cat(paste0("finished with ",nofl[i]," in ", tmp[i], " seconds\n"))
    }
    return(list(timing=tmp, nofl=nofl))
}


### now install the old (pre-tweak) version from GitHub:
require(devtools)

install_github("zarquon42b/Morpho", ref="c8f3652bea901f40d93ce5efc645d2fb75b8a6da")

## run the benchmark function

old <- benchslide()
save(old, file="old") ## save the results to disk


### now close the old workspace and open a new one

### get the tweaked version:
install_github("zarquon42b/Morpho", ref="ccb67d3f96d2810176490f32b0cdd0df66d51155")
new <- benchslide()
save(new, file="new") ## save the results to disk

## now load timings from the old version and visualize them:
load("old")
plot(old$nofl,old$timing,col=2,cex=1.5,pch=19, xlab="no. of semi-landmarks", ylab="elapsed time in seconds")
lines(spline(old$nofl,old$timing),lwd=4,col=2)##create nice splines

load("new")#not necessary if you are still in your "new" workspace
points(new$nofl,new$timing,col=3,cex=1.5,pch=19)
lines(spline(new$nofl,new$timing),lwd=4,col=3)##create nice splines
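Optionally, add a legend so the colour coding is visible in the plot itself (the figure caption states it as well):

legend("topleft", legend=c("old version","tweaked version"),
       col=c(2,3), pch=19, lwd=4)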

Here is the resulting graph:

Fig. 1: Benchmark of the sliding routine. Green: improved version; Red: old version.

For 4000 semi-landmarks, the old version needed 507.6 secs, compared to 33.5 secs for the tweaked version. Quite impressive, isn’t it?
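That works out to:

507.6 / 33.5 ## ~15.2, i.e. roughly a 15-fold speedup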

FUN FACTS: using ATLAS and testing 10,000 semi-landmarks with the tweaked function, I managed to max out all 32 GB of RAM (yay, first time on this machine). Using OpenBLAS instead, the run took only 160 seconds (awesome!) and required only 20 GB of RAM (don’t try this on your fancy tablets).
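Since the BLAS backend clearly matters, it is worth checking which one R is linked against before benchmarking; on R >= 3.4.0 this is reported by sessionInfo() (how to switch between ATLAS and OpenBLAS is system-specific and not covered here):

sessionInfo() ## on R >= 3.4.0 the output includes the BLAS/LAPACK libraries in use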

EXTRA FUN:

Now, as a special gimmick, we fit 2nd-order polynomial approximations with the number of semi-landmarks as predictor, in order to predict the timings for 10,000 semi-landmarks.


## first we check whether extrapolating the 2nd-order polynomial is sensible:
## fit the model with the values for 4000 semi-landmarks removed and compare
## the resulting prediction with the actual value

oldmod1 <- lm(old$timing[-8] ~ old$nofl[-8]+I(old$nofl[-8]^2))
newmod1 <- lm(new$timing[-8] ~ new$nofl[-8]+I(new$nofl[-8]^2))

pred4000old <- sum(oldmod1$coefficients*c(1,4000,4000^2))#426.4685 a bit underestimated compared to the actual value of 507.6
pred4000new <- sum(newmod1$coefficients*c(1,4000,4000^2))# 29.31068 reasonably close to the actual value of 33.5

### as the models tend to aim too low, the extrapolation will give us a lower bound

### compute the models from the complete data

oldmod <- lm(old$timing ~ old$nofl+I(old$nofl^2))
newmod <- lm(new$timing ~ new$nofl+I(new$nofl^2))

## extrapolate from data

pred10000old <- sum(oldmod$coefficients*c(1,10000,10000^2))
pred10000new <- sum(newmod$coefficients*c(1,10000,10000^2))
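As an aside, wrapping the data in a data frame lets predict() handle the extrapolation directly; this sketch (newdat and newmod2 are just illustrative names) gives the same value as the coefficient sum above:

newdat <- data.frame(timing=new$timing, nofl=new$nofl)
newmod2 <- lm(timing ~ nofl + I(nofl^2), data=newdat)
predict(newmod2, newdata=data.frame(nofl=10000)) ## same value as pred10000new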

## create plot
oldnoflpred <- c(old$nofl,10000)
oldtimingpred <- c(old$timing,pred10000old)

newnoflpred <- c(new$nofl,10000)
newtimingpred <- c(new$timing,pred10000new)
plot(oldnoflpred,oldtimingpred,type="n", xlab="no. of semi-landmarks", ylab="elapsed time in seconds") ## empty plot, points added below
points(old$nofl,old$timing,col=2,cex=1.5,pch=19)
points(10000,pred10000old,col=2,cex=1.5,lwd=3)
lines(spline(old$nofl,old$timing),lwd=4,col=2)##create nice splines
lines(spline(oldnoflpred,oldtimingpred),lty=2,lwd=4,col=2)##create nice splines

points(new$nofl,new$timing,col=3,cex=1.5,pch=19)
points(10000,pred10000new,col=3,cex=1.5,lwd=3)
lines(spline(new$nofl,new$timing),lwd=4,col=3)##create nice splines
lines(spline(newnoflpred,newtimingpred),lty=2,lwd=4,col=3)##create nice splines


And here are the extrapolated timings for 10,000 semi-landmarks: 213.7 secs with the tweaked function versus 3975.83 secs (more than an hour) with the old one.

Fig. 2: Extrapolation of computation time for 10,000 semi-landmarks. Green: improved version; Red: old version.