Monday, March 16, 2020

What's Cool on Campus - Data Analytics and Moneyball MLB and NCAA research



Part One: Open Letter to UNF Athletics: What's Cool on Campus - Data Analytics (UPDATE)

JACKSONVILLE, FL - After my initial research on the topic in September 2019, I continued researching data analytics adoption at the NCAA team level, adding six teams to the initial survey presented to the administration. 

The increase of approximately 4 to 6 wins over the trailing eight-year average of wins, or wins expected, remained in place.

Growing number of programs finding or renewing success: (2019 record versus 2011-18 record)


Next, I looked at the final 2019 College Baseball poll from USA Today. I added a Y/N to sort for teams that had publicly announced their use of data analytics. An earlier USA Today article had mentioned Louisville and Texas Tech as College World Series participants that did not have an existing analytics program in place.  

from USA Today:
https://www.usatoday.com/sports/ncaa-baseball/polls/coaches-poll/2019/

2019 Final USA Today College baseball Poll - by adoption of data analytics
  • Seventeen of the Top 25 teams had adopted data analytics (68% rate).
  • Adopters' median win total was 4 greater than non-adopters'. Their W% was 0.031 greater, which implies approximately +2 wins.
  • Adopters tended to play later into the post-season than non-adopters.
  • In other words, adding 2 - 4 wins while playing amongst the most competitive sub-set of teams you could select is meaningful. 
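As a rough check on the "+2 wins" figure, a W% gap can be converted to wins by multiplying it by games played. The sketch below does that for several season lengths. The 0.031 gap is the figure from the poll comparison; the season lengths are assumptions chosen for illustration, since the poll data itself is not reproduced here.

```python
# Convert a winning-percentage gap into an approximate win gap.
# The 0.031 gap comes from the poll comparison; the season lengths
# are illustrative assumptions, not values taken from the data.

def implied_wins(win_pct_gap: float, games_played: int) -> float:
    """Wins implied by a W% gap over a season of the given length."""
    return win_pct_gap * games_played

W_PCT_GAP = 0.031

implied_by_length = {
    games: round(implied_wins(W_PCT_GAP, games), 1)
    for games in (56, 60, 65, 70)
}

for games, wins in implied_by_length.items():
    print(f"{games} games -> about {wins:+.1f} wins")
```

Across these assumed season lengths the gap stays close to two wins, which is consistent with the lower end of the 2 - 4 win range.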
Next, I looked at adoption rate by conference. Here is where I feel the story began to expand and crystallize. 

I examined the Power-5 conferences (P-5) and selected Non P-5 conferences like the ASUN. This revealed the following:



  • The Power Five conferences (P-5) adopted at a 55% rate versus 14% for Non P-5 conferences. 
  • The SEC had adopted at an 86% rate and the ACC at a 64% rate, the highest rates observed.
  • The SEC is on track to hit 100% by 2021. 
  • The Power 5 Conferences are on track to hit 100% by 2022.


Non P-5 schools, by contrast, are adopting at an estimated rate that mimics MLB's rate of increase (as shown below). This puts them on a path to hit 100% by approximately 2032.
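To make the gap concrete, the two projections can be read as implied average yearly gains in adoption. This is a minimal sketch that assumes straight-line growth from a 2019 baseline. Both the linear path and the baseline year are simplifying assumptions for illustration, and the function name is hypothetical.

```python
# Average percentage points of adoption needed per year to reach 100%
# by a target year, assuming straight-line growth (an assumption).

def points_per_year(current_rate: float, base_year: int, target_year: int) -> float:
    """Average yearly gain (in percentage points) needed to reach 100%."""
    years = target_year - base_year
    if years <= 0:
        raise ValueError("target_year must be after base_year")
    return (100.0 - current_rate) / years

BASE_YEAR = 2019  # assumed baseline: the season the survey rates describe

p5_pace = points_per_year(55, BASE_YEAR, 2022)      # Power 5 conferences
non_p5_pace = points_per_year(14, BASE_YEAR, 2032)  # Non P-5 conferences

print(f"P-5:     {p5_pace:.1f} points per year")
print(f"Non P-5: {non_p5_pace:.1f} points per year")
```

Under these assumptions the P-5 pace is more than twice the Non P-5 pace, on top of a much higher starting point.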

That 10-year gap, if Non P-5 status has not already done so, will relegate these schools to long-term, if not permanent, 'have-not' status in collegiate baseball.


Year 1 = 1997 for MLB and 2017 for college baseball. 

Elitzur's MLB Timeline of Adoption (SABR teams over time)

These conclusions, if they hold true, are somewhat ironic. One pillar of the Moneyball Theory, and of Dr. Elitzur's study, is that in MLB poorer teams used data analytics to gain an advantage against richer teams in what was inherently an unfair contest between unequals.

The analogy to college baseball flips that framework on its head in that richer teams are adopting it to further cement their advantage over less well-endowed competitors. 

 
Cum Wins v. Cum Payroll (1998-2016) - Avg. Payroll v. Diff. Wins vs. Expected Wins

Here is where things get interesting. I examined cumulative wins (from baseball-reference.com) versus cumulative payroll (from Lahman database) and applied conditional formatting to identify the top ten teams (green) and the bottom ten teams (red) by each category. The (white) cells are the middle teams per category. 

In the first category, cumulative wins and cumulative payroll, the Red Sox and Yankees (coded green-green) bludgeoned the field as top ten in payroll and top ten in wins. Not a huge surprise. 

Pittsburgh and the Rays (coded red-red) were bottom ten in both categories, and that is not a huge surprise either. A 0.77 correlation between wins and payroll should come as no surprise either. It's the basis for the so-called "competitive balance tax" or de-facto salary cap.
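For readers who want to reproduce a correlation like this from their own download of the data, a minimal sketch of the standard Pearson calculation follows. The five payroll and win totals are made-up numbers, not the Lahman or baseball-reference.com figures, so the result will not match the value quoted above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (2+ items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical cumulative payroll ($ millions) and cumulative wins for
# five made-up teams; NOT the actual figures used in the post.
payroll = [900, 1200, 1500, 2100, 3000]
wins = [1400, 1450, 1520, 1500, 1700]

r = pearson(payroll, wins)
print(f"correlation = {r:.2f}")
```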

Cum. Wins v. Cum. Payroll (1998-2016) grouped by 10's

If you expand the conditional formatting here to show top ten, middle ten and bottom ten by category, you get some unusual pairings. The first color is the wins bucket and the second is the payroll bucket.

Green-Green = BOS, LAA, LAD, NYY, SFG, STL
Green-White = ATL, CLE, TEX
Green-Red = OAK
White-Green = CHC, NYM, PHI
White-White = CHW, CIN, HOU, SEA, TOR
White-Red = ARI, MIN
Red-Green = DET
Red-White = BAL, COL,
Red-Red = KCR, MIA, MIL, PIT, SDP, TBR
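The same top/middle/bottom grouping can be sketched outside a spreadsheet. The example below uses a made-up six-team league with buckets of two; for the 30 MLB teams the bucket size would be 10. It assumes the first color refers to wins and the second to payroll, which is how the Oakland and Detroit pairings read. The function and variable names are hypothetical.

```python
def color_buckets(values, bucket_size):
    """Label the top `bucket_size` keys Green, the bottom ones Red, the rest White."""
    ranked = sorted(values, key=values.get, reverse=True)
    colors = {}
    for position, team in enumerate(ranked):
        if position < bucket_size:
            colors[team] = "Green"
        elif position >= len(ranked) - bucket_size:
            colors[team] = "Red"
        else:
            colors[team] = "White"
    return colors

# Hypothetical six-team league (made-up names and numbers).
wins = {"A": 1700, "B": 1650, "C": 1500, "D": 1480, "E": 1400, "F": 1350}
payroll = {"A": 3000, "B": 900, "C": 1500, "D": 1600, "E": 2800, "F": 1000}

win_colors = color_buckets(wins, bucket_size=2)
pay_colors = color_buckets(payroll, bucket_size=2)

# First color is the wins bucket, second is the payroll bucket.
pairings = {team: f"{win_colors[team]}-{pay_colors[team]}" for team in wins}
for team, pairing in pairings.items():
    print(team, pairing)
```

In this toy data, team B plays the Oakland role (Green-Red) and team E plays the Detroit role (Red-Green).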

This analysis looks at the absolute level of spending and the absolute level of wins. The top and bottom teams are segregated on that basis, with only Oakland delivering on the upside in wins relative to payroll in aggregate. That makes sense. They are the crown princes of Moneyball.

Detroit, finishing in the lower third in wins while spending in the upper third on payroll, is the anti-Moneyball franchise so far.

But is that the fairest way to grade a franchise when the premise is to spend money efficiently rather than wastefully or recklessly?



Sum of Wins Over or Under Expected Wins and Avg Payroll Level


From the baseball-reference.com historical wins by team/year data, I created an "expected wins" field by weighting the prior three years' win totals, as is commonly done with projections.

2019EW for example would be ((2018 W * 3) + (2017 W * 2) + (2016 W * 1)) / 6. 

OUW would then be the Over / Under of Actual Wins versus Expected Wins. 
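The two definitions above can be sketched in a few lines. The win totals below are for a made-up team, not taken from baseball-reference.com, and the function names are hypothetical.

```python
def expected_wins(prior_1, prior_2, prior_3):
    """Weighted 3-2-1 average of the prior three seasons' wins.

    prior_1 is the most recent prior season, prior_3 the oldest.
    """
    return (prior_1 * 3 + prior_2 * 2 + prior_3 * 1) / 6

def over_under_wins(actual, expected):
    """OUW: actual wins minus expected wins (positive means over)."""
    return actual - expected

# Hypothetical team: wins by season (made-up numbers).
wins_by_year = {2016: 84, 2017: 90, 2018: 96, 2019: 99}

ew_2019 = expected_wins(wins_by_year[2018], wins_by_year[2017], wins_by_year[2016])
ouw_2019 = over_under_wins(wins_by_year[2019], ew_2019)

print(f"2019EW = {ew_2019:.1f}, OUW = {ouw_2019:+.1f}")
```

For this made-up team, 2019EW works out to (288 + 180 + 84) / 6 = 92 wins, so 99 actual wins is an OUW of +7.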

I applied the same conditional formatting rules to sort in top-middle-low ten team buckets and calculated the following matrix of Wins Expected versus Average Payroll. 


Elitzur Wins -  Payroll Matrix

In my opinion, this gives a reasonable snapshot of which teams are more successfully employing the Moneyball concept of "doing more with less." 

High Wins over Expected with Low Payroll:
Oakland, Tampa Bay and Minnesota have had good success doing more with less. Arizona and Washington are also consistently on top of their divisions.

Low Wins over Expected with High Payroll:
Detroit and the Los Angeles Angels have had a bad run while spending at some of the highest levels. A bad combination recently.  

High Wins and High Payroll:
Boston, the Chicago Cubs and Philadelphia fall into this category. High spending, but high wins over expectations to match.

High Wins with Middle range Payroll:  
The Cleveland Indians, with some consistently good years, and the Houston Astros, a relatively new adopter of data analytics (2012) with some feast or famine years, scored well on Over/Under Wins, especially recently.

This introduces another interesting observation which may appear to run counter to one of the earlier observations, that "if you were not in early, you were left behind."



               

Some teams that were late adopters of an analytics-driven approach have had recent success in terms of wins above expectations (OUW):

Houston and Chicago Cubs (2012): +8.63 / +3.06 OUW
Minnesota (2015): +8.06 OUW
Philadelphia (2016) and Arizona (2017): +5.08 / +7.44 OUW

Houston and the Cubs have had good success but will likely regress somewhat as they add more years under adoption. Minnesota, Philadelphia and Arizona are more recent, small-sample-size successes with only three to four years under adoption.

Those teams' recent successes, added to the previous success of the A's and the Rays, suggest that the teams that "need" a Moneyball approach to succeed take on the task with a "have to have this work" mentality rather than a "nice to have it work, and if not, throw some money at the problem" mentality. Failure is an ever-looming, existential threat. This leads to a deeper commitment to the task and greater buy-in from everyone in those organizations.

For the high payroll, high to middle success teams, data analytics is "nice to have" but not really "have to have". There is a safety net in the owner's checkbook, an "in case of fire, break glass" option that less well-endowed teams do not have, and it blurs how much credit data analytics should get for success in the W column. The low payroll teams are overcoming the 0.76 correlation between payroll and wins; the high payroll teams are surfing it to success.

Each successive CBA defines the rules of engagement teams operate under and they naturally lend themselves to this type of stratification and perhaps always will. Organizations and staff at all levels are constrained by these rules and work with them and in some cases around them to the best of their ability in order to succeed.

On the collegiate side of the ball, teams that lag in adoption could see some glimmer of hope in MLB's late adopters. However, given the greater disparity in the distribution of talent and the different rules of engagement that the NCAA and its member conferences set up between teams, that glimmer of hope quickly morphs into a chasm of despair.

Going forward, I would like to make more of a flow versus stock comparison, i.e., the change in Win Expectation versus year-over-year changes in payroll historically, and see what that reveals.


References:
Elitzur, Ramy. “Data analytics effects in major league baseball.” (2020).

