At the last picalike workshop, data analysts and scientists Bendix Sältz and Dr. Christoph Ölschläger took us into the world of data, analysis and statistics. The topic, “Predictive Analytics with Python”, was not only discussed in theory: we also got hands-on right away, analyzing data in Python with Jupyter Lab.
A good “prediction” follows these four steps:
- First, formulate a specific question: what do I want to answer with my analysis? Once the goal is clear, I need to collect the right data. Everyone talks about big data, but I often don’t need the bulk of the data to answer my question. So I have to “collect” or request exactly the data the analysis actually needs.
- The second step is to clean up the data. In the workshop we worked with a CSV file. We read in the data and removed all information that did not seem relevant to our question. We then processed the data so that every field of the CSV held an actual value we could work with. For example, we “encoded” text fields so that they could be used in the analysis.
- In the third step, depending on the question, we chose a model and a framework to answer it. In the workshop we discussed random forests and various regression models, such as linear regression.
- In the fourth step, the model must be interpreted to derive a prediction. This prediction should then be visualized and put into a presentation that is easy to understand, not only for statisticians.
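
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the workshop: the CSV content, the column names (`region`, `visits`, `purchases`) and the helper functions `load_rows`, `clean` and `fit_line` are all made up for this example. To stay self-contained it uses only the standard library and a hand-written simple linear regression; a random forest is not shown here and would normally come from a library such as scikit-learn.

```python
import csv
import io
from statistics import mean

# Step 1: the question (hypothetical): how many purchases can we
# expect from a customer with a given number of visits?

# Hypothetical CSV data, invented purely for illustration.
RAW_CSV = """customer_id,region,visits,purchases
1,north,2,1
2,south,4,2
3,north,6,3
4,west,8,4
5,south,,5
"""


def load_rows(text):
    """Step 2a: read the CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, keep, text_fields):
    """Step 2b: keep relevant columns, drop incomplete rows, encode text."""
    codes = {field: {} for field in text_fields}
    cleaned = []
    for row in rows:
        if any(row[col] == "" for col in keep):
            continue  # skip rows with a missing value in a kept column
        record = {}
        for col in keep:
            if col in text_fields:
                # Map each distinct text value to the next free integer.
                mapping = codes[col]
                record[col] = mapping.setdefault(row[col], len(mapping))
            else:
                record[col] = float(row[col])
        cleaned.append(record)
    return cleaned, codes


def fit_line(xs, ys):
    """Step 3: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


rows = load_rows(RAW_CSV)
# customer_id is dropped because it carries no information for the question.
cleaned, codes = clean(
    rows, keep=["region", "visits", "purchases"], text_fields=["region"]
)

slope, intercept = fit_line(
    [r["visits"] for r in cleaned], [r["purchases"] for r in cleaned]
)

# Step 4: interpret the model and derive a prediction.
prediction = slope * 10 + intercept
print(f"Each extra visit adds about {slope:.2f} purchases.")
print(f"Predicted purchases for 10 visits: {prediction:.1f}")
```

In this sketch the slope is the part that gets interpreted: it states how the expected number of purchases changes per additional visit, which is the kind of statement that can go straight into a chart or a slide for a non-statistical audience.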