Simple prediction
A simple way
Is it possible to get a prediction close to the one computed by Daniele’s algorithm (or by the resulting formula), and to do so in a simple way? Here is an attempt.
I computed a linear regression of the predicted score (from Daniele’s algorithm) on the following value, which I will call the “scoring ratio”: [current_score / (age – 18)].
The fitted line is:
predicted_score = (9.9951 * scoring_ratio) + 11.857
In other words, rounding the coefficients to 10 and 12:
predicted_score ≈ [current_score * 10 / (age – 18)] + 12
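As a sketch, both the exact fitted line and its rounded version can be written as small Python functions. The function names are hypothetical, and the guard against ages of 18 or below is my own assumption: the formula divides by zero at 18 and turns negative below it, and the post does not say how such players were handled.

```python
def simple_prediction(current_score: float, age: float) -> float:
    """Exact fitted line: 9.9951 * current_score / (age - 18) + 11.857."""
    if age <= 18:
        # Assumption: the scoring ratio is undefined at or below age 18.
        raise ValueError("scoring ratio is undefined for age <= 18")
    scoring_ratio = current_score / (age - 18)
    return 9.9951 * scoring_ratio + 11.857


def simple_prediction_rounded(current_score: float, age: float) -> float:
    """Rounded version: current_score * 10 / (age - 18) + 12."""
    if age <= 18:
        raise ValueError("scoring ratio is undefined for age <= 18")
    return current_score * 10 / (age - 18) + 12


print(simple_prediction(0, 25))           # uncapped player: just the intercept
print(simple_prediction_rounded(30, 28))  # 30 * 10 / 10 + 12
```

An uncapped player (current_score = 0) gets the intercept, 11.857, at any age above 18.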
Let’s call it “simple prediction”.
The R-squared is 0.8084, so it is not a bad fit.
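The post does not say which tool produced the fit. A minimal ordinary least-squares sketch in plain Python is below, with xs as the scoring ratios and ys as the real predictions; the function name is hypothetical and the numbers in the usage line are made up, not the players’ data.

```python
def fit_line(xs, ys):
    """Ordinary least squares y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x) (the 1/n factors cancel).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


print(fit_line([1, 2, 3], [1, 3, 2]))
```

On this toy input the fit returns slope 0.5, intercept 1.0 and R-squared 0.25. For a single-predictor fit like this one, R-squared also equals the squared correlation between x and y.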
Here is the graph:
Players that the simple prediction underestimates most:
Player | Real prediction | Simple prediction | Difference (real - simple) |
---|---|---|---|
Valbuena | 78.52 | 53.36 | 25.16 |
Griezmann | 67.78 | 46.59 | 21.09 |
Matuidi | 70.16 | 49.86 | 20.30 |
Players that the simple prediction overestimates most:
Player | Real prediction | Simple prediction | Difference (real - simple) |
---|---|---|---|
Neymar | 157.35 | 207.33 | -49.98 |
Lucas Moura | 48.55 | 87.19 | -38.64 |
Goetze | 106.12 | 127.54 | -21.42 |
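Since the difference is real prediction minus simple prediction, a positive value means the simple formula underestimates and a negative one means it overestimates. A sketch of how the two rankings could be produced is below; the function name and its top_n parameter are hypothetical, and the input rows are the three rows of the table above.

```python
def rank_by_difference(rows, top_n=3):
    """Return (most underestimated, most overestimated) players.

    rows: iterable of (player, real_prediction, simple_prediction).
    difference = real - simple; positive means the simple formula underestimates.
    """
    scored = [(player, real, simple, real - simple) for player, real, simple in rows]
    scored.sort(key=lambda row: row[3], reverse=True)  # largest difference first
    underestimated = scored[:top_n]
    overestimated = list(reversed(scored[-top_n:]))    # most negative first
    return underestimated, overestimated


rows = [
    ("Goetze", 106.12, 127.54),
    ("Neymar", 157.35, 207.33),
    ("Lucas Moura", 48.55, 87.19),
]
under, over = rank_by_difference(rows)
for player, real, simple, diff in over:
    print(f"{player} | {real:.2f} | {simple:.2f} | {diff:.2f} |")
```

The loop prints the three rows in the same order, and with the same values, as the table above.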
If you look at the (real_prediction / simple_prediction) quotient, the extreme values all belong to uncapped players. For an uncapped player (current_score = 0) the scoring ratio is 0, so the simple prediction is just the intercept, 11.86, whatever the player’s age. A young uncapped player like Gabriel therefore gets a real prediction more than twice the simple one (24.63 vs. 11.86), while an older uncapped player like Arteta gets 11.86 points from the simple prediction but 0.00 from the real one (so the quotient is 0.00).
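The same effect in code: with current_score = 0 the simple prediction collapses to the intercept, so the quotient depends only on the real prediction. In this sketch the function names are hypothetical, and the ages passed in are placeholders (the post does not give them); they do not affect the result.

```python
def simple_prediction(current_score, age):
    # Fitted line from the post: 9.9951 * current_score / (age - 18) + 11.857
    return 9.9951 * current_score / (age - 18) + 11.857


def prediction_quotient(real_prediction, current_score, age):
    # real_prediction / simple_prediction
    return real_prediction / simple_prediction(current_score, age)


# Uncapped players (current_score = 0); the ages are hypothetical placeholders.
gabriel = prediction_quotient(24.63, 0, 20)
arteta = prediction_quotient(0.00, 0, 33)
print(round(gabriel, 2), arteta)
```

Any uncapped player with a real prediction above about 23.7 (twice the intercept) lands above a quotient of 2, and any uncapped player with a real prediction of 0 lands at exactly 0.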
A simpler way?
But can it get even simpler? Well, one can consider a simpler scoring_ratio:
current_score / age
(i.e. without subtracting 18 from the age).
The R-squared doesn’t drop much: 0.7916.
Here is the graph:
Let’s review the same data as above. The fitted formula is:
predicted_score = (33.597 * scoring_ratio) + 11.757
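The simpler formula as a function, in the same hypothetical style. Unlike the simple version, it no longer breaks down at age 18, which is a practical side effect of dropping the subtraction. The positive-age check is my own assumption.

```python
def simpler_prediction(current_score: float, age: float) -> float:
    """Fitted line for the simpler scoring ratio current_score / age."""
    if age <= 0:
        raise ValueError("age must be positive")
    scoring_ratio = current_score / age
    return 33.597 * scoring_ratio + 11.757


print(simpler_prediction(0, 30))   # uncapped player: just the intercept
print(simpler_prediction(30, 30))  # 33.597 * 1 + 11.757
```

An uncapped player gets 11.757 (about 11.76) regardless of age, which is why uncapped players again dominate the extreme quotients.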
Players that the simpler prediction underestimates most:
Player | Real prediction | Simpler prediction | Difference (real - simpler) |
---|---|---|---|
Sterling | 63.50 | 36.55 | 26.95 |
Griezmann | 67.78 | 41.10 | 26.58 |
Depay | 65.19 | 38.77 | 26.42 |
Players that the simpler prediction overestimates most:
Player | Real prediction | Simpler prediction | Difference (real - simpler) |
---|---|---|---|
Elia | 27.00 | 47.57 | -20.57 |
Montolivo | 59.00 | 78.47 | -19.47 |
Llorente | 28.00 | 46.35 | -18.35 |
This ranking is essentially a list of players who had more than a few caps (hence a high scoring ratio, i.e. a high simpler prediction) but have fallen from grace (hence a low real prediction, since it is based on points from the last 12 months): after the three players above come Young, Navas, Alexandre Pato, Diaby, Gomis, Lucas Leiva, Criscito, Negredo, Rami…
How did the simple (not simpler!) prediction rate these players?
Player | Real prediction | Simpler prediction | Simple prediction |
---|---|---|---|
Elia | 27.00 | 47.57 | 41.41 |
Montolivo | 59.00 | 78.47 | 60.95 |
Llorente | 28.00 | 46.35 | 37.44 |
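One way to read this table is to compare how many points each formula adds per point of current_score. Ignoring the 0.1 difference between the intercepts, the two slopes are equal when 9.9951 / (age - 18) = 33.597 / age, i.e. at age = 18 * 33.597 / (33.597 - 9.9951), roughly 25.6. Above that age the simpler formula rewards accumulated points more heavily than the simple one. Whether that explains these three rows depends on the players’ ages, which the post does not list, so treat it as a hypothesis. A sketch (names hypothetical):

```python
SIMPLE_SLOPE = 9.9951   # coefficient on current_score / (age - 18)
SIMPLER_SLOPE = 33.597  # coefficient on current_score / age


def points_per_score_point(age):
    """Points each formula adds per extra point of current_score at a given age."""
    return SIMPLE_SLOPE / (age - 18), SIMPLER_SLOPE / age


def crossover_age():
    # Solve SIMPLE_SLOPE / (a - 18) == SIMPLER_SLOPE / a for a.
    return 18 * SIMPLER_SLOPE / (SIMPLER_SLOPE - SIMPLE_SLOPE)


print(round(crossover_age(), 1))  # about 25.6
for age in (22, 30):
    simple, simpler = points_per_score_point(age)
    print(age, round(simple, 3), round(simpler, 3))
```

At 22 the simple formula gives more per point of current_score; at 30 the simpler one does.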
Again, the highest and lowest (real_prediction / simpler_prediction) quotients are dominated by uncapped players.
Conclusion
Well, I had fun running this experiment, but it is really useless.