When was the last time you booked accommodation without checking its photos? Most probably never! Images make our decision-making much easier and faster. However, picking the best possible images of a hotel to show to the user is an interesting problem, because the solution can range from a naive random selection to a sophisticated machine learning model that predicts what the user truly wants at that moment.
This is why we at trivago have a dedicated Extract-Transform-Load (ETL) pipeline to create the image galleries for accommodations. This pipeline generates a sorted image gallery for a given accommodation, taking into account several parameters such as the content of the image (represented by the image tag) and the quality of the image.
We initially ran this pipeline on Amazon Web Services (AWS) and recently migrated it to Google Cloud Platform (GCP). We redesigned the architecture completely as part of the migration, not just because the two cloud platforms differ, but also because we wanted to add more features and make the pipeline more stable.
However, no software project is a smooth ride, and we’re going to talk about the problems we faced and how we tried to fix most of them.