2.5 quintillion bytes of data are generated daily, so it’s safe to say that data is at the heart of many businesses’ success! Data in its raw state is messy and usually difficult to use. My role as a Data Engineer at Kubrick Group is to handle this huge volume of data. Kubrick train graduates in data skills and deploy them at corporations. I have been deployed at Sainsbury’s Digital, Tech and Data. I create data pipelines that manipulate, move and store data so it can be used by an analyst or stakeholder to gain business insights.
Practically, this looks like using software and writing code (such as SQL) to ‘query’ a database. A query is just a way of requesting particular information from a database, for example ‘the number of 5th years who bought a Fab ticket in May 2022’. Currently, I am querying Sainsbury’s central database to extract the datasets we need. Basic analytics are then applied to the data, such as comparing how a product’s in-store location affects its sales volume. This all happens in an automated pipeline, and the results are communicated to store managers via an app.
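To make the idea of a query a little more concrete, here is a minimal sketch of the Fab ticket example, written in Python with its built-in sqlite3 module. The table name, column names and rows are all made up for illustration; a real database would look different.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket_sales "
    "(student TEXT, year_group INTEGER, event TEXT, sold_on TEXT)"
)
conn.executemany(
    "INSERT INTO ticket_sales VALUES (?, ?, ?, ?)",
    [
        ("A", 5, "Fab", "2022-05-03"),
        ("B", 5, "Fab", "2022-05-20"),
        ("C", 4, "Fab", "2022-05-11"),    # wrong year group
        ("D", 5, "Fab", "2022-06-01"),    # bought in June, not May
        ("E", 5, "Other", "2022-05-09"),  # different event
    ],
)

# The query: count 5th years who bought a Fab ticket in May 2022.
query = """
    SELECT COUNT(DISTINCT student)
    FROM ticket_sales
    WHERE year_group = 5
      AND event = 'Fab'
      AND sold_on >= '2022-05-01'
      AND sold_on < '2022-06-01'
"""
fifth_year_fab_buyers = conn.execute(query).fetchone()[0]
print(fifth_year_fab_buyers)
```

Each condition after WHERE narrows the rows down, so only the students who match all of them are counted.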
What’s the best thing about what you’re doing now?
Seeing the result of a huge, automated data pipeline is quite satisfying, as it saves people so much time. There are many interacting parts in this ‘digital system’ and it’s amazing what code can do. When things break down, it can be interesting to play detective, find the root of the problem and fix it.