International Center for Naturalistic Driving Data Analysis at Virginia Tech

VTTI’s data center infrastructure is a large, complex, and continually evolving environment that supports the institute’s mission. It provides the foundation for VTTI’s data-intensive scientific research programs, including petascale studies such as the National Academies of Sciences Transportation Research Board’s Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study.

VTTI’s Naturalistic Driving Studies collect immense volumes of data; current storage requirements for archived and online data are approaching 10 petabytes (PB). Data collected at remote locations are often staged on an off-site server before being transferred to VTTI. After the data arrive at VTTI, they are processed by an enterprise-class workflow system running on a 48-node High Performance Computing (HPC) cluster. The workflow system archives the raw files to a Hierarchical Storage System, then unpacks and processes the individual files for ingestion. Composite video files are split into their discrete views, re-encoded for analysis, and loaded into a 2.4-PB scale-out Network Attached Storage (NAS) clustered file system. Sensor-derived data are extracted, transformed, and loaded (ETL) into a 1.2-PB Massively Parallel Processing (MPP) enterprise database platform. Once the data are in VTTI’s facilities, processing and analysis run over a 10-gigabit HPC network so that the data can be manipulated quickly.
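The ingest path described above can be pictured as a small routing function. This is a minimal sketch for illustration only, assuming each incoming file is already tagged as either composite video or sensor data; the class names, the `ingest` function, and the output naming scheme are hypothetical and do not describe VTTI’s actual workflow system. Each raw file is archived first; composite video is then fanned out into one output per view bound for the NAS, while sensor files are routed to the MPP database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RawFile:
    """A raw file arriving from the field (hypothetical model)."""
    name: str
    kind: str  # assumed tag: "video" or "sensor"
    views: List[str] = field(default_factory=list)  # views inside a composite video


@dataclass
class Destinations:
    """Stand-ins for the three storage tiers named in the text."""
    archive: List[str] = field(default_factory=list)  # Hierarchical Storage System
    nas: List[str] = field(default_factory=list)      # scale-out NAS file system
    mpp: List[str] = field(default_factory=list)      # MPP enterprise database


def ingest(files: List[RawFile], dest: Destinations) -> Destinations:
    for f in files:
        # Step 1: archive the untouched raw file before any processing.
        dest.archive.append(f.name)
        if f.kind == "video":
            # Step 2a: split the composite video into discrete views;
            # re-encoding is simulated here by a renamed output per view.
            for view in f.views:
                dest.nas.append(f"{f.name}.{view}.analysis")
        elif f.kind == "sensor":
            # Step 2b: extract/transform/load; simulated as one load record.
            dest.mpp.append(f"{f.name}.rows")
        else:
            raise ValueError(f"unknown file kind: {f.kind}")
    return dest


# Usage with two hypothetical files from one trip.
result = ingest(
    [RawFile("trip001.vid", "video", ["front", "face"]),
     RawFile("trip001.can", "sensor")],
    Destinations(),
)
print(result.archive)  # ['trip001.vid', 'trip001.can']
print(result.nas)      # ['trip001.vid.front.analysis', 'trip001.vid.face.analysis']
print(result.mpp)      # ['trip001.can.rows']
```

Archiving before transforming means a failed re-encoding or ETL step can be rerun from the preserved raw file rather than requiring the data to be collected or transferred again.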

VTTI’s primary objective is to conduct scientific research and analysis on the collected data. To support that work, VTTI provides its researchers with tools such as MATLAB, R, SAS, and other statistical packages, which run on a variety of computing resources: the 48-node HPC cluster described above, a dedicated MATLAB cluster, and various analysis servers.

VTTI’s Information Technology (IT) team works closely with Virginia Tech’s central IT organization and other large-scale research institutes at Virginia Tech to leverage strategic university investments in research computing and archival resources.