Data Engineering

We build scalable platforms for the collection, management, and analysis of data.

Systems and Tools

Probabilistic Record Linkage

Our team is developing scalable, Apache Spark-based systems to disambiguate hundreds of millions of records that reference the same entities across disparate internal and external data sources. By combining large-scale partitioning with machine learning algorithms trained to identify relations between records from different systems, we aim to unify many datasets into a single queryable analytics resource that can return data describing individuals within MassMutual's information systems and beyond.
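As a rough illustration of the general idea, and not a description of our production system, the sketch below shows the two stages on a toy scale: records are first partitioned into blocks so that only plausible candidates are compared, and each candidate pair is then scored. Everything here is an assumption made for the example: the record fields, the `block_key`, `similarity`, and `link` helpers, the ZIP-code blocking rule, and the fixed 0.9 threshold. A plain string-similarity ratio from Python's standard library stands in for a trained model, and an in-memory dictionary stands in for Spark partitioning.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records from two source systems; all fields are illustrative.
records = [
    {"id": "crm-1", "name": "Jonathan Smith", "zip": "01111"},
    {"id": "pol-7", "name": "Jonathon Smith", "zip": "01111"},
    {"id": "crm-2", "name": "Maria Garcia", "zip": "02110"},
    {"id": "pol-9", "name": "Mario Garcetti", "zip": "02110"},
    {"id": "crm-3", "name": "Jonathan Smith", "zip": "94105"},
]


def block_key(record):
    """Partitioning rule (assumed): only records sharing a ZIP code are compared."""
    return record["zip"]


def similarity(a, b):
    """Stand-in for a trained model: case-insensitive name similarity in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def link(records, threshold=0.9):
    """Return (id, id, score) for pairs within a block scoring at or above threshold."""
    # Stage 1: group records into blocks to avoid comparing every pair.
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)

    # Stage 2: score candidate pairs inside each block.
    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 3)))
    return matches


matches = link(records)
for left, right, score in matches:
    print(left, right, score)
```

In this toy data, the two near-identical names that share a ZIP code are linked, while the identically named record in a different ZIP code is never compared. That is the basic trade-off of partitioning: far fewer comparisons, at the risk of missing matches that fall into different blocks.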

Data Ingestion Pipelines

Our analytics platform ingests data from sources ranging from relational databases and mainframe extracts to log files, images, and tweets. We build and deploy systems based on Apache Spark and other tools from the Hadoop ecosystem. We use Jenkins for job scheduling and continuous integration/delivery, Git for source control, Ansible for configuration management, virtualenv for Python environments, and Docker for deployment and testing.

Data Engineering Development Program

Our Data Engineering Development Program launches in Boston in Summer 2018. If you have a background in data engineering and are looking to round out your professional experience with hands-on training, high-impact project work, and the potential for long-term career opportunities with MassMutual, we want to hear from you! This two-year program offers full-time employment with the option of tuition sponsorship for supplemental coursework. If you are interested in learning more, email us. Please include your resume and information about your background, academic pursuits, and career interests.

Data Engineers

Nicholas Chammas

Nabil Hachem

Owen Galvin

Ryan Griffin

Thom Neale

Randall Schwager