Recently I attended a talk on personalization (yes, there are a lot of them at the moment) and made myself somewhat unpopular there. One slide claimed: “The more data, the better.” I stepped forward and pointed out that in the personalization use case there are limits. If you have too little data, you can hardly say anything about the customer. If you have too much, you first have to work out which data is relevant. Data only gains relevance if you know in advance which data is needed for which purpose; that way you sharpen your focus and are not distracted by additional information.
Let me make this more tangible with an example: if a customer buys only a white T-shirt, that single piece of information tells us very little, even if we also consult their history in the shop.
If a customer has a purchase history with many products (more than 50, say), this does not automatically tell us much more. We don’t necessarily know which of the purchased products are really relevant and representative; with the large amount of data we have also collected a lot of noise.
So what should be done?
I recommend that you do not start by thinking about how much data you can get or where to get it from, but instead invest time in thinking about what you want to learn from the data and which data you need for that. Then you can separate the relevant data from the less relevant, reduce the space you have to deal with, and focus on the most important things.
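To illustrate the idea of purpose-driven data selection, here is a minimal sketch. All field names and example records are hypothetical; the point is only that defining the question first (here: recommending clothing) lets you discard everything that is noise for that specific question.

```python
# Hypothetical purchase history; the fields and records are invented
# for illustration only.
purchase_history = [
    {"product": "white t-shirt", "category": "clothing", "price": 9.99},
    {"product": "phone charger", "category": "electronics", "price": 14.99},
    {"product": "black jeans", "category": "clothing", "price": 49.99},
    {"product": "notebook", "category": "stationery", "price": 3.49},
]

# Purpose defined in advance: recommend clothing.
# Only clothing purchases are relevant; the rest is noise for this question.
relevant = [p for p in purchase_history if p["category"] == "clothing"]

print([p["product"] for p in relevant])
```

The same history would be filtered entirely differently for a different purpose, which is exactly why the purpose has to come first.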
Of course you can argue that you would rather collect and store everything: then all the data is there when you need it, and you can run several experiments in parallel. That is not wrong. But the statement “more data is better” is about using more data, not merely about having more data available.
Big Data was followed by the next buzzword, Smart Data. Next comes something like Intelligent Data, or is that already yesterday’s news?