Data and Core Values at Ethos

Ethos Life
Feb 1, 2022 · 4 min read

Michael Mata, Engineering Manager

Intellectual Rigor is a core value at Ethos. Making data-informed decisions is one way we embody this value every day. As a fast-moving startup, we often have to make decisions in the face of uncertainty. We deal with incomplete, unavailable, or conflicting data, make sense of multiple data sources, and even predict our best estimate of future risk. This makes data, and our ability to extract meaning from it, crucial to everything we do at Ethos. It’s a foundational pillar of our virtuous cycle.

The data team’s charter is to reliably gather the rich data we get from our products and distribution, intelligently make sense of it, and make it available to its consumers, empowering them to produce better decisions, insights, and predictions.

Our data platform serves and connects Ethos teams with our Carriers, Reinsurers, and Agencies & Partners. We have an important responsibility to be the source of truth for these ecosystems. Internally, our platform serves the whole of Ethos, including our front-facing experiences, growth and engagement teams, and our platform initiatives in underwriting, machine learning, and policy administration.

Our data engineering system can be broken into several parts. For our first installment, we’re going to give you a whirlwind tour of our stack.

  • Platform
  • Ingestion
  • Extraction & Transformation
  • Consumption

Platform

Our platform is composed of several managed and open-source systems. We adopt best-of-breed solutions for our problems, using off-the-shelf software whenever possible, but we won’t hesitate to build our own solution where we can add differentiated value.

Our platform team is responsible for managing these systems, making sure they’re healthy and available. Operational excellence is the name of the game, and these are the tools we employ:

  • Terraform to configure all systems, whether they are managed by a provider or by our infrastructure team. This provides a consistent source of truth when it comes to the setup and configuration of our data stack.
  • GitHub Actions and Atlantis to push changes to our development, staging, and production environments.
  • Datadog to monitor, notify on, and triage both the data flowing through our systems and the operational health of the systems themselves (see the sketch after this list).
  • Slack and PagerDuty for alerting.
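To make that concrete, here’s a minimal sketch of how a monitor like this can be defined with the Datadog Python client. The metric query, tags, and notification handles are hypothetical stand-ins, not our actual monitors.

```python
# A minimal sketch of programmatic monitor creation with the Datadog
# Python client. Metric, tags, and handles below are hypothetical.
from datadog import api, initialize

initialize(api_key="...", app_key="...")

# Alert when any Airflow task owned by the data team fails, routing
# the notification to Slack and PagerDuty.
api.Monitor.create(
    type="metric alert",
    query="sum(last_5m):sum:airflow.ti_failures{team:data}.as_count() > 0",
    name="Data pipeline task failure",
    message="A data pipeline task failed. @slack-data-alerts @pagerduty",
    tags=["team:data"],
)
```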

Ingestion

For ingestion, we adopt a mix of push and pull, depending on what makes sense.

  • We typically pull data from external APIs, like payment providers, or from object stores in the cloud. We use Airflow to orchestrate jobs that determine when and how data is pulled into Ethos (see the sketch after this list). We store all raw data in S3, making use of its out-of-the-box access control and encryption-at-rest features.
  • Data gets pushed to us from either Segment (an event queue) or Fivetran (a data warehouse sync).
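Here’s a minimal sketch of what a pull-style ingestion DAG can look like. The API endpoint, bucket name, and connection ID are hypothetical, not our actual configuration.

```python
# A minimal sketch of a pull-style ingestion DAG: fetch a day's records
# from an external API and land the raw payload in S3.
import json
from datetime import datetime

import requests
from airflow.decorators import dag, task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


@dag(schedule_interval="@daily", start_date=datetime(2022, 1, 1), catchup=False)
def pull_payments():
    @task
    def fetch_and_land(ds=None):
        # Pull one day's worth of charges from the payment provider
        # (hypothetical endpoint).
        resp = requests.get(
            "https://api.payments.example.com/v1/charges",
            params={"date": ds},
            timeout=30,
        )
        resp.raise_for_status()

        # Land the raw JSON in S3; access control and encryption at
        # rest come from the bucket's configuration.
        S3Hook(aws_conn_id="aws_default").load_string(
            string_data=json.dumps(resp.json()),
            key=f"raw/payments/{ds}.json",
            bucket_name="raw-data-example",
            replace=True,
        )

    fetch_and_land()


pull_payments()
```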

Regardless of whether data is pulled or pushed, it will end up in our data warehouse. We use Snowflake because it makes data governance possible, provides a simple interface, and separates management of storage and compute resources.

Data governance is critical for us and is challenging to do right. Having the low level ability to mask columns and hide rows by role gives us the flexibility to tailor our solution as our needs change.
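As an illustration, here’s a minimal sketch of column-level masking in Snowflake, issued through the Python connector. The account, role, table, and column names are hypothetical.

```python
# A minimal sketch of column-level masking in Snowflake via the Python
# connector. Account, role, table, and column names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="...",
    warehouse="transforming",
    database="analytics",
    schema="public",
)

# Only privileged roles see raw values; everyone else gets a redaction.
conn.cursor().execute("""
    CREATE MASKING POLICY IF NOT EXISTS mask_email AS (val STRING)
    RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val
           ELSE '***MASKED***'
      END
""")

# Attach the policy to a sensitive column.
conn.cursor().execute("""
    ALTER TABLE users MODIFY COLUMN email SET MASKING POLICY mask_email
""")
```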

Extraction & Transformation

After entering our system, raw data needs to go through a couple more steps before it can be readily consumed.

  1. Extraction is the initial step needed to make raw data usable. This step runs the gamut from parsing XML documents to deduping data we’ve already seen. The use cases in this area tend to be the most diverse, so our team leans heavily on Airflow, Python, and Pandas to customize our solutions (see the sketch after this list).
  2. Transformation is the next step, marrying disparate data sets and transforming them into something more consumable. We use dbt for transformations, and we are huge fans. It pushes transformations down to the data warehouse and lowers the level of effort needed to bring engineering discipline to the SQL space. You’ll find everything from low-level data cleanup to high-level aggregations at this stage.
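Here’s a minimal sketch of an extraction step in Pandas: parse an XML document into rows, then drop records we’ve already ingested. The file layout and field names are hypothetical.

```python
# A minimal sketch of extraction: parse an XML document into rows,
# dedupe within the file, then drop previously ingested records.
# File layout and field names are hypothetical.
import xml.etree.ElementTree as ET

import pandas as pd


def extract_policies(xml_path: str, seen_ids: set) -> pd.DataFrame:
    # Flatten each <policy> element into a record.
    root = ET.parse(xml_path).getroot()
    records = [
        {"policy_id": el.findtext("id"), "status": el.findtext("status")}
        for el in root.iter("policy")
    ]
    df = pd.DataFrame.from_records(records)

    # Dedupe within this file, then against IDs we've already seen.
    df = df.drop_duplicates(subset="policy_id")
    return df[~df["policy_id"].isin(seen_ids)]
```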

Consumption

Much like ingestion, we adopt a mix of push and pull for consumption. Our data governance efforts are honored regardless of the approach.

  • We push data using Airflow to destinations like SFTP servers, APIs, object stores, or even back to Segment (see the sketch after this list).
  • Our data can be pulled through data marts in our data warehouse, objects in S3, or dashboards in Mode.
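Here’s a minimal sketch of a push-style consumer: an Airflow task that exports a data mart and delivers it to a partner’s SFTP server. The connection IDs, table, and paths are hypothetical.

```python
# A minimal sketch of a push-style consumer: export a data mart and
# deliver it over SFTP. Connection IDs, table, and paths are hypothetical.
from airflow.decorators import task
from airflow.providers.sftp.hooks.sftp import SFTPHook
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook


@task
def push_mart_to_sftp(ds=None):
    # Export the run date's slice of the mart to a local CSV...
    df = SnowflakeHook(snowflake_conn_id="snowflake_default").get_pandas_df(
        "SELECT * FROM marts.partner_report WHERE report_date = %(ds)s",
        parameters={"ds": ds},
    )
    local_path = f"/tmp/partner_report_{ds}.csv"
    df.to_csv(local_path, index=False)

    # ...then deliver it to the partner's drop folder.
    SFTPHook(ssh_conn_id="partner_sftp").store_file(
        remote_full_path=f"/inbound/partner_report_{ds}.csv",
        local_full_path=local_path,
    )
```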

Hope you enjoyed the tour

It was fast-paced and high-level, but you should now have a good sense of the technologies we use. It’s been quite a ride to get to this point. We expect the next few years to be just as exciting. If you’re interested in helping us on the next leg of our journey, we’d love to hear from you!

Michael Mata, Engineering Manager

Michael joined Ethos in March 2019 and leads the Data Engineering team. When Michael isn’t building data systems, he enjoys reading sci-fi/fantasy, designing and playing games, and spending time with family. Interested in joining Michael’s team? Learn more about our career opportunities here.
