Posted on  March 1st 2010 ... 

* Our software will be subject to change at any time *

* New Software * Code for fitting truncated DP mixtures and hierarchical DP mixtures of normals is on GitHub. Check it out!

* New Software * Check out gpustats, a Python package for statistical computation on massive datasets. It is maintained on GitHub.

We've added a MATLAB function that evaluates the multivariate normal pdf at many data points and many parameter settings (means and covariances) at once.
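To make concrete what "many data points and many parameter settings" means, here is a minimal, dependency-free Python sketch. It is not the MATLAB function mentioned above, and the names `cholesky`, `mvn_logpdf_grid`, and `mvn_pdf_grid` are hypothetical. It assumes each parameter setting is a mean vector paired with a symmetric positive-definite covariance matrix, and it returns a table with one row per data point and one column per parameter setting.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(diag)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mvn_logpdf_grid(points, means, covs):
    """Return table[i][j] = log N(points[i] | means[j], covs[j])."""
    dim = len(points[0])
    # Factor each covariance once and reuse it for every data point.
    prepared = []
    for mean, cov in zip(means, covs):
        low = cholesky(cov)
        log_det = 2.0 * sum(math.log(low[d][d]) for d in range(dim))
        log_norm = -0.5 * (dim * math.log(2.0 * math.pi) + log_det)
        prepared.append((mean, low, log_norm))

    table = []
    for x in points:
        row = []
        for mean, low, log_norm in prepared:
            # Forward substitution solves low * z = (x - mean);
            # the sum of squares of z is the squared Mahalanobis distance.
            z = []
            for d in range(dim):
                acc = x[d] - mean[d] - sum(low[d][k] * z[k] for k in range(d))
                z.append(acc / low[d][d])
            row.append(log_norm - 0.5 * sum(v * v for v in z))
        table.append(row)
    return table


def mvn_pdf_grid(points, means, covs):
    """Same table as mvn_logpdf_grid, on the density scale."""
    return [[math.exp(v) for v in row] for row in mvn_logpdf_grid(points, means, covs)]
```

Working on the log scale and exponentiating only at the end keeps the computation stable when densities are tiny, which is common in mixture models with many components. The double loop over points and parameter settings is the part a GPU implementation would parallelize.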

By request, we've added some supporting MATLAB code to relabel MCMC samples from CDP using the method from our new relabeling paper.

Mixture modeling is a powerful approach to analyzing massive datasets, and our GPU-based tools make it feasible on a single computer. For exact model specifications and the code from the paper, please refer to Appendix A of "Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures" and its supplements.

Our code includes an effective strategy for dealing with the "component relabeling" issue in mixtures. Details are in (relabeling paper to be posted in late August 2010).


The current source code is available for download here. To compile the source, type "make cdps", "make cuda", or "make cudamgpu" (the multi-GPU version) in the "codebase" folder. If your compiler can't find the required libraries, adjust the "makefile", "makefile.CUDA_SingleGPU", and "makefile.CUDA" files accordingly.


To run the algorithm, call ./cdps parameters.txt. See this example for running the code directly from the terminal. To avoid the dirty work, we built a MATLAB wrapper and an R wrapper that create the input files and retrieve the results automatically. Note that the MATLAB wrapper requires MATLAB 2009a or later. The GPU executables and source require the Boost libraries.
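For readers who would rather script the build and run steps from Python than from MATLAB or R, a minimal sketch follows. It is not one of our wrappers: the helper names `build_command`, `build`, and `run_cdps` are hypothetical, it assumes `make` is on the path and the executable is called `cdps`, and it does not write `parameters.txt`, whose format is not described on this page.

```python
import subprocess
from pathlib import Path

# Make targets quoted in the build instructions on this page.
MAKE_TARGETS = ("cdps", "cuda", "cudamgpu")


def build_command(target):
    """Return the make invocation for one of the documented targets."""
    if target not in MAKE_TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {MAKE_TARGETS}")
    return ["make", target]


def build(target="cdps", codebase="codebase"):
    """Run make inside the codebase folder; raises if the build fails."""
    return subprocess.run(build_command(target), cwd=codebase, check=True)


def run_cdps(parameter_file, executable="./cdps", workdir="."):
    """Call the executable with a parameter file, as in `./cdps parameters.txt`.

    Assumption: the executable name and location are supplied by the caller;
    the default simply mirrors the command shown on this page.
    """
    exe = (Path(workdir) / executable).resolve()
    if not exe.is_file():
        raise FileNotFoundError(f"executable not found: {exe}")
    return subprocess.run([str(exe), str(parameter_file)], cwd=workdir, check=True)
```

A typical session would call `build("cdps")` once and then `run_cdps("parameters.txt", workdir="codebase")`, provided the parameter file already exists in that folder.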