Fantasy Scout Blog

Month: May, 2016

King Gonzalo and the Round Ball

If you break the all-time record for most goals scored in a Serie A season, you make history.

If you do it with this goal, you are legend.

France squad, Euro 2016

Fantasy Scout players:

  • Lloris (Michael H. 2007-10)
  • Mandanda (Benny 2007-10)
  • Varane (Andrea V. 2010-12)
  • Mangala (Gianmarco 2010-12)
  • Digne (Michael H. 2012-14)
  • Koscielny (Riccardo 2010-12)
  • Cabaye (Mark 2007-10)
  • Payet (Michael H. 2010-12)
  • Kante (Daniele 2014-16)
  • Matuidi (Generoso 2007-10)
  • Sissoko (Cristian 2007-10)
  • Pogba (Daniele 2012-14)
  • Griezmann (Sid Debgupta 2010-12)
  • Giroud (Michael H. 2010-12)
  • Gignac (Benny 2007-10)
  • Martial (Nigel 2012-14)
  • Coman (Nigel 2012-14)

Player that can still be picked (from Wikipedia):

  • Benoit Costil – GK – 3 July 1987 (age 28) – 0 caps – Rennes

Anatomy of a good pick – Where are we?

I don’t know when (or even whether!) I’ll be able to write the next post in the “Anatomy of a good pick” series, so it may be the right time to take stock.

What have I done so far?

  1. First, I asked the question: can we tell a good pick from a bad one as soon as they are made?
  2. Then I realized that this question can be answered by using some machine learning methods, namely decision trees and random forests.
  3. Finally, I built six decision trees: 1+2, 3+4, 5+6.

What questions do I want to answer next? (and how?)

  1. Is it possible to “visualize” in some way the random forest that tells you if a pick is good or bad? (probabilities prediction for “dummy” players; feature importances; plotting pairs of features)
  2. Is at least one of the six decision trees similar enough to the random forest that a scout can use the tree instead of the forest? (applying the methods listed in the previous point to the six decision trees)
  3. How good are the trees and the forest at predicting if a pick is good? (validation/test)
  4. How can we get a better tree and a better forest? (hypotheses about new data to obtain)
  5. Do other classification algorithms do a better job than the decision tree and the random forest?
  6. Assuming that “has a >50% chance of scoring fewer than 8 points” is the same as “is a bad pick” is naive. What we really want to know is: what is the weighted average of a given player’s possible final scores?
  7. So far we have considered the following scenario: you have already decided to pick a player, and you consult an algorithm that tells you whether it is a good idea. But can we find a way to have the computer tell you “pick that player”?
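To make question 6 concrete, here is a minimal sketch of the "weighted average of possible final scores" idea. The two players and their score distributions are invented for illustration; they are not taken from the Fantasy Scout data. Both would count as bad picks under the naive ">50% chance of fewer than 8 points" rule, yet their weighted averages differ a lot.

```python
def expected_score(outcomes):
    """Weighted average of final scores.

    `outcomes` is a list of (score, probability) pairs whose
    probabilities must sum to 1.
    """
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(score * p for score, p in outcomes)


# Hypothetical players: each has a 75% chance of finishing below 8 points.
steady = [(6, 0.75), (10, 0.25)]
boom_or_bust = [(0, 0.75), (40, 0.25)]

print(expected_score(steady))        # 7.0
print(expected_score(boom_or_bust))  # 10.0
```

The second player fails three times out of four but still has the higher weighted average, which is the kind of case the binary good/bad label cannot see.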

Dyslexia

“FantasySCOTUS is the leading Supreme Court Fantasy League.”

(I hoped it was about the Subtle Doctor…)

The tree of the knowledge of good and evil

As a break between trees and forests, I ran a totally silly experiment: let’s ask scikit-learn to build a tree that tells my 76 players from Daniele’s 65 players.

Here is the tree (max depth 3, min samples per leaf 5):

In short, I pick players who are valued more and are younger… i.e. who are better! (so why do I perform worse than Daniele?!)

Also (I knew this!) I pick more Dutch players, and more defensive players.

Jokes aside, this tree made me doubt my belief that Daniele uses some kind of formula to choose the players to pick…

Anatomy of a good pick – Trees #5 and #6

(see the previous posts in this series)

Tree #5

As planned, I asked scikit-learn (AKA sklearn) to add a fifth level. Here is the resulting tree:



(full resolution)

As expected, there aren’t any great insights. Let me highlight some curious things, though (references to the numbers I used to annotate the tree):

  1. High market value, not young, non-defensive position: it can be good if he already has two caps under his belt.
  2. High market value, not young, defensive position: if he is Italian, it’s a sure bet!
  3. Counterintuitive: among players who are >5M, <24yo, non-forward, the ones valued more (>10M) are worse than the cheaper ones.
  4. Five times out of five, players valued 100-800Th and aged less than 17.5 are a good pick! 🙂

It’s official: four levels are enough.

Tree #6

Minimum 30 samples per leaf (previously it was 5); no depth limits.



(full resolution)

Not so different from tree #4. It’s just that nations can’t be used anymore (too few players per nation), so they are replaced by age.

The best chance to get a good pick is: value >5M, age 22 or 23 (younger players sometimes lose their path to success?).

Enough with trees, time to move to forests.
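As a rough idea of what the forest step could look like, here is a sketch using scikit-learn's RandomForestClassifier. Everything about the data is an assumption: the features are random stand-ins, and the "good pick" label comes from a made-up rule with some noise, not from the real picks. It only shows the two tools mentioned in the "Where are we?" post: feature importances and a probability prediction for a "dummy" player.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_players = 300

# Synthetic stand-ins for the real features (NOT the blog's data).
value = rng.uniform(0.1, 25.0, n_players)   # market value, millions of euros
age = rng.uniform(16.0, 30.0, n_players)    # age when picked
caps = rng.integers(0, 10, n_players)       # national team caps
position = rng.choice([0, 5, 10, 15, 20, 25, 27, 30], n_players)

X = np.column_stack([value, age, caps, position])
feature_names = ["value", "age", "caps", "position"]

# Made-up labelling rule: valuable and young means "good", with 10% noise.
good = ((value > 5) & (age < 24)).astype(int)
flip = rng.random(n_players) < 0.1
y = np.where(flip, 1 - good, good)

forest = RandomForestClassifier(
    n_estimators=100, min_samples_leaf=5, random_state=0
)
forest.fit(X, y)

# Feature importances: one non-negative number per feature, summing to 1.
for name, importance in zip(feature_names, forest.feature_importances_):
    print(f"{name:>8}: {importance:.3f}")

# Probability prediction for a "dummy" player:
# 8M, 21 years old, 0 caps, forward (position code 30).
dummy = np.array([[8.0, 21.0, 0, 30]])
p_bad, p_good = forest.predict_proba(dummy)[0]
print(f"P(good pick) = {p_good:.2f}")
```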

Anatomy of a good pick – Trees #3 and #4

(I guess the only way to follow it is to be familiar with the previous posts in this series.)

Tree #3

New tree, with the following changes to the features:

  • Translated position from categorical to numerical, from 0 goalkeeper to 30 forward (see below): I guess this will make position more significant in the tree.
  • Removed “Goals when picked”: I guess it’s better to remove an almost useless feature.

The tree is… the same as in the previous post. So changing the type of the “Position” feature had no effect.

Tree #4

Changed max depth to 4.



(full resolution)

The top three levels are the same as in the previous tree, so we’ll focus on the fourth one. But before doing that, I have to tell you how position is coded. I used the same values Daniele uses for the prediction formula:

  • 0 = Keeper
  • 5 = Right/Left-Back
  • 10 = Centre-Back
  • 15 = Defensive Midfield
  • 20 = Right/Central/Left Midfield
  • 25 = Attacking Midfield
  • 27 = Right/Left Wing
  • 30 = Secondary Striker or Centre Forward
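The coding above can be written as a simple lookup table. This is only a sketch: the numbers are the ones listed above, but the exact spelling of each position name, the helper names and the 12.5 threshold in the example are my assumptions.

```python
# Numeric position codes, copied from the list above.
# The position name spellings are assumed, not taken from the real data.
POSITION_CODES = {
    "Keeper": 0,
    "Right-Back": 5,
    "Left-Back": 5,
    "Centre-Back": 10,
    "Defensive Midfield": 15,
    "Right Midfield": 20,
    "Central Midfield": 20,
    "Left Midfield": 20,
    "Attacking Midfield": 25,
    "Right Wing": 27,
    "Left Wing": 27,
    "Secondary Striker": 30,
    "Centre Forward": 30,
}


def encode_position(position):
    """Return the numeric code for a position name; raise on unknown names."""
    try:
        return POSITION_CODES[position]
    except KeyError:
        raise ValueError(f"unknown position: {position!r}") from None


def is_defensive(position, threshold=12.5):
    """Hypothetical split: codes at or below the threshold are keepers/defenders."""
    return encode_position(position) <= threshold


print(encode_position("Left Wing"))    # 27
print(is_defensive("Centre-Back"))     # True
```

Because the codes are ordered from the back of the pitch to the front, a single threshold such as "position <= 12.5" is enough for a tree to separate defensive from attacking roles.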

So let’s read the fourth level, starting from the right (i.e., generally speaking, going from better to worse):

  • Value >5M, Age >24: this was a “coin toss” category. The tree tells us that if he is not a defensive player, it’s a bad idea. In other words: defensive players blossom later.
  • Value >5M, Age <24: this was the best category. You know what makes it even better? If he is a forward: 10 out of 11 forwards in this category were good picks. I guess this means that forwards are generally better picks, since they score more goals.
  • Value 3-5M, Argentinian: it was a bad idea 22 times out of 23, so there’s no way to make it more meaningful (the tree notices that the lone good pick was a 23.19-year-old player, Alvarez).
  • Value 3-5M, non-Argentinian: this was a coin toss. But – the new level tells you – you should be bolder if the player has already played for his national team!
  • Value 850Th-3M, German: we already knew there were some gems here. Now we know that most of them weren’t older than 21 and a half.
  • Value 850Th-3M, non-German: we already knew this was a bad idea. But if they are very young (less than 19) it becomes a coin toss. So generally, it looks like players in this market value tier are good picks only if young; quite low value can be good, if age is very low.
  • Value <850Th, Age >19: the worst category. Now the tree tells us that, well, a player younger than 21.5 could be good (but probably is not). In other words: if he has a very low market value, he has to be as young as possible.
  • Value <850Th, Age <19: generally a bad idea. Now the tree adds that if the player is priced at less than 100K, it is a very bad idea.

One little thing that I like is that all features are used: Age, Value, Nation, Caps, Position.

Future work on decision trees: I don’t expect a fifth level to add anything meaningful, but why shouldn’t I try anyway? Also, what happens if we set a higher value (e.g. 30) for the minimum number of samples in leaf nodes?
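For reference, the settings discussed in this series (max depth 3, 4 or 5 with at least 5 samples per leaf, and no depth limit with at least 30 samples per leaf) map onto two parameters of scikit-learn's DecisionTreeClassifier. The sketch below fits all four variants on synthetic data: the features and the labelling rule are invented, so the printed depths and leaf counts say nothing about the real trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_players = 600

# Synthetic stand-ins for the real features (NOT the blog's data).
value = rng.uniform(0.1, 25.0, n_players)   # market value, millions of euros
age = rng.uniform(16.0, 30.0, n_players)    # age when picked
caps = rng.integers(0, 10, n_players)       # national team caps
X = np.column_stack([value, age, caps])

# Made-up labelling rule with 15% noise.
good = ((value > 5) & (age < 24)).astype(int)
flip = rng.random(n_players) < 0.15
y = np.where(flip, 1 - good, good)

settings = [
    {"max_depth": 3, "min_samples_leaf": 5},
    {"max_depth": 4, "min_samples_leaf": 5},
    {"max_depth": 5, "min_samples_leaf": 5},
    {"max_depth": None, "min_samples_leaf": 30},  # no depth limit
]

trees = []
for params in settings:
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X, y)
    trees.append(tree)
    print(params, "-> depth", tree.get_depth(), "leaves", tree.get_n_leaves())
```

With `max_depth=None` the only thing that stops the tree from growing is the leaf size, so a larger `min_samples_leaf` is what keeps it from memorising single players.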

Brazil squad, Copa America 2016

Fantasy Scout players:

  • Alisson (Jesus 2014-16)
  • Filipe (Sid at n. 2007-10)
  • Marquinhos Aoas (Daniele 2012-14)
  • Fabinho (Daniele 2012-14)
  • Rodrigo Caio (Mark 2012-14)
  • Douglas Santos (Saintjust 2014-16)
  • Luiz Gustavo (Binder 2010-12)
  • Willian (Jacopo 2007-10)
  • Douglas Costa (Andrea V. 2007-10)
  • Philippe Coutinho (Andrea B. 2007-10)
  • Casemiro (Daniele 2010-12)
  • Renato Augusto (Michael H. 2007-10)
  • Rafinha Alcantara (Andrea V. 2014-16)
  • Gabriel Barbosa (Nigel 2012-14)

Player that can still be picked (from Wikipedia):

  • Ederson – GK – August 17, 1993 (age 22) – 0 caps – Benfica

Anatomy of a good pick – Decision trees

I did what I had announced in my previous post. More precisely, for the time being I built (well, Sklearn built…) the decision tree.

The only difference between what I actually did and what I had posted is that in the end I only used the one-hot encoded data, because I realized that the decision tree was doing things like “if Position < 1”.
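In case "one-hot encoded" is unclear: each category becomes its own 0/1 column, so the tree can only ask "is it this category or not?". Here is a tiny pure-Python sketch with a made-up list of nations; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job, and I am not claiming this is the code I actually ran.

```python
def one_hot(values):
    """Turn a list of category labels into one 0/1 column per distinct label."""
    categories = sorted(set(values))
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }


nations = ["Germany", "Argentina", "Italy", "Germany"]  # made-up sample
columns = one_hot(nations)

print(columns["Germany"])    # [1, 0, 0, 1]
print(columns["Argentina"])  # [0, 1, 0, 0]
```

With this layout a split such as "Germany < 0.5" has a clear meaning, whereas a threshold on an arbitrary integer code for position or nation does not.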

Here is the decision tree (full resolution):

Even before explaining what the text and the colours mean, it’s clear that the tree is way too detailed: this is overfitting, for sure. So I told Sklearn to limit the tree to 3 levels. Then, since the resulting tree had a leaf with just one player, I told Sklearn that I didn’t want any leaf with fewer than 5 players. Here is the result (full resolution):

OK, now I can explain the text and the colours (the meaning is quite intuitive, but writing it down doesn’t hurt).

Text in each node, first line to last:

  1. Feature (omitted in leaf nodes, i.e. in the bottom nodes) = the question you should ask. If the answer is “yes”, go left; if the answer is “no”, go right. Something like “Germany < 0.5” means “not German” (since any German has Germany=1, and any non-German has Germany=0).
  2. “gini” = measure of the impurity of the node. I.e. the higher this value, the more balanced the bad vs. good picks are.
  3. “samples” = number of players that fall in this “bin”, i.e. that have the features that have been selected up to this node.
  4. “value” = how many bad (first value) and good (second value) picks are in this bin.
  5. “class” = whether most picks in this bin are bad or good.
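The "gini" number can be recomputed by hand from the "value" line: it is 1 minus the sum of the squared class shares. A minimal sketch (the function name is mine; the example counts are illustrative, with 22 bad and 1 good matching the Argentinian 3-5M bin mentioned elsewhere on this blog):

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty node")
    return 1.0 - sum((count / total) ** 2 for count in counts)


print(gini([10, 10]))  # 0.5, a perfectly balanced node (the maximum for two classes)
print(gini([20, 0]))   # 0.0, a pure node
print(gini([22, 1]))   # roughly 0.083: 22 bad picks, 1 good pick
```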

Colour is (if I’m not wrong) a function of two values:

  • Class: bad = red, good = blue.
  • Gini: less balance = darker, more balance = lighter.

So, what does this tree tell you?

  • In short, pick a player only if he’s worth more than 5 million Euros according to Transfermarkt and he is 23 or younger. If 24 or older, think twice. High market value and young… I know, we didn’t need machine learning to know it. Maybe the exact values (5 million, 24 years old) are somehow interesting, though.
  • Players worth 3 to 5 million: it’s a coin toss… unless they are Argentinian. Avoid Argentinian players worth less than 5 million at all costs! (indeed this could be a non-trivial insight)
  • Players worth 850,000 to 3 million: probably a bad pick… unless German. There are unknown German gems, but be cautious anyway.
  • Players worth less than 850,000: bad picks, even more likely if they are 19 or older.

Market value at the time of pick

I checked the transfer value in Euro (according to Transfermarkt) of every player the day he was picked.

Average value: 3,623,311.

Top 10 highest values:

Player           Scout       Picked      Value (EUR)
Callejon         Daniele     2014-11-10  25,000,000
Diego Costa      Michael H.  2013-09-26  25,000,000
Gago             Andrea V.   2007-01-15  20,000,000
Arteta           Pietro      2010-05-20  18,000,000
Amauri           Daniele     2008-05-29  16,000,000
Leno             Andrea V.   2015-04-05  16,000,000
Frey             Mattia      2009-10-13  15,000,000
Garcia           Abubakr     2014-09-04  15,000,000
Roberto Firmino  Andrea V.   2014-03-27  15,000,000
Wendell          Andrea V.   2015-11-04  15,000,000

Considering I also have players #11, #12 and #13… I think I have found out why I suck.

Players who had no value when picked: Alena, Allione, Ariaudo, Banega, Bartley, Batalla, Bell S., Beretta, Boga, Bolatti, Caldirola, Carrera, Cerri, Ciano, Ciro, Comi, Criscito, Crisetig, Danilinho, Danilo Barbosa, Danti, Destro, Driussi, Dumitru, Edouard, Fossati, Gabbiadini, Gallinetta, Geferson, Gnabry, Gomez, Guglielmi, Hewson, Immobile, Jean, Kakuta, Lamela, Laribi, Leandro Joaquim Ribeiro, Leandro Lima, Lucas Piazon, Macheda, Mammana, Mannini, Marrone, Mastour, Maurides, Mayoral, Miquel, Morosini, Muniesa, Murphy, Nahuel, Neymar, Ozyakup, Pereira G., Pezzella, Rashford, Robinson, Rodrigo Dourado, Rodriguez Jese, Rodriguez L., Salvio, Sinclair J., Sneijder, Sterling, Suso, Vadala, Valdivia, Wilson, Yildirim.

Top 10 scouts by number of non-valued players picked:

Scout        Total
Giovanni B.  11
Jesus        8
Abubakr      5
Saintjust    5
Alberto      4
Nigel        4
Daniele      3
Mattia       3
Pietro       3
William      3