Building a real-time data pipeline with Serverless and Kinesis

Big Data, Terraform, Serverless, Kinesis, Snowplow, Lambda, real time: lots of buzzy terms, and adding them all to a single project can be daunting. So when we decided to re-architect our data processing and event systems around a single pipeline, we chose to try some of the newer tools available to find out whether they would help us deliver faster or hinder us. We built a brand new data pipeline that handles all the user events and tracking information on our platform in real time with minimal latency, so that our business teams can understand user behaviour and our data science models can make predictions in real time. In this talk, I’ll cover why we wanted to take on such a challenge, how we achieved it, and what it’s like to run 16k events per minute through a (mostly) serverless architecture.

