Big plans for big data at UST

by Peter Beacom

There is a shortage of talent in the labor pool for dealing with very large sets of data, and the University of Saint Thomas Graduate Programs in Software (GPS) is gearing up to do something about that.

The newly formed Center of Excellence for Big Data (CoE4BD) will serve as a teaching and research group that will offer education and support to groups in academia, government, and business in need of help processing big data.

As a field, big data doesn’t exactly have a concrete definition. One of the organizers of the CoE4BD, Dr. Brad Rubin, considers a problem whose data set has a certain amount of “volume, velocity, and variability” to fall into the big data domain. Large sets of marketing, business operations, and scientific data are the usual suspects. Dr. Rubin is an information retrieval specialist in the GPS program, but the idea for the center sprang from a collaboration with a neuroscientist at the University of Saint Thomas.

The neuroscience researcher had terabytes of data collected from the hippocampus in the brain of a rat that was trying to find its way through a maze (of course). The GPS department is in possession of a Hadoop cluster with 43 compute cores and 40TB of storage. Dr. Rubin offered to provision the cluster to perform what was essentially digital signal processing of the neuron data; once set up, the cluster crunched the numbers and provided results to the neuroscientist in relatively short order using 15 of the compute cores.

Attempting a similar analysis on a workstation is extremely difficult: the researcher had to downsample the data just to fit it in his workstation’s memory, whereas the Hadoop cluster could absorb all of the data available.

Working with big data requires the ability to conceptualize abstract sets of what is generally unstructured data. In addition, the mathematical, statistical, and software skills needed to manipulate the data must be present. Even with those in place, the success of a big data project ultimately rests on asking the right questions about the data and coherently communicating the findings.

This all makes big data problems some of the most interesting multidisciplinary problems in the marketplace today. The market for big data solutions has been validated by the recent IPO of Splunk, a big data software and service provider with a market capitalization of over $3 billion.

The CoE4BD was soft-launched last month by six University of Saint Thomas faculty members with a vision to become a broadly recognized local resource, in addition to the modern curriculum offerings that begin this fall. The center aims to identify big data challenges facing industry and work on them as class projects and independent research projects for grad students.

“It’s easy in academia to make up problems and then solve them. It’s much more difficult and a much richer experience to find real-world problems to tackle, because that really forces you to be creative and break bounds,” says Dr. Rubin of his desire for industry engagement with the center.