Friday, February 27, 2015

The Statistical Problem of Sample Sizes both Large and Small




So let me see if I have this straight: a small sample size can be a problem, according to the FanGraphs article below, but a large sample size can also present problems, according to Relevant Insights? What's a stat geek to do?

From FanGraphs: "Also, a quote worth remembering: 'In small sample sizes, a good scout is ALWAYS better than stats.'"

Maybe with both small and large samples, it's good to have the eyes, ears and guts of a seasoned scout to guide an organization's fortunes.

From Relevant Insights:
http://www.relevantinsights.com/representative-sample
I often get asked "What sample size do I need to get a representative sample?" The problem is that this question is not formulated correctly.
Sample size and representativeness are two related, but different issues. The sheer size of a sample is not a guarantee of its ability to accurately represent a target population. Large unrepresentative samples can perform as badly as small unrepresentative samples.
A survey sample's ability to represent a population has to do with the sampling frame, that is, the list from which the sample is selected. When some parts of the target population are not included in the sampled population, we are faced with selection bias, which prevents us from claiming that the sample is representative of the target population. Selection bias can occur in different ways:
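To see why size alone doesn't buy representativeness, here is a small Python simulation. Everything in it is hypothetical: a made-up population of 100,000 people in which 70,000 are reachable through the sampling frame (60% of them answer "yes") and 30,000 are missing from it (20% answer "yes"), so the true rate is 0.48. A 20,000-person sample drawn only from the frame lands near 0.60, while a 200-person random sample from the whole population should land within a few points of 0.48: noisier, but with no built-in bias.

```python
import random


def build_population():
    """Hypothetical population: 1 = answers "yes", 0 = answers "no"."""
    in_frame = [1] * 42_000 + [0] * 28_000   # 70,000 reachable people, 60% yes
    left_out = [1] * 6_000 + [0] * 24_000    # 30,000 missing from the frame, 20% yes
    return in_frame, left_out


def mean(values):
    return sum(values) / len(values)


rng = random.Random(2015)  # fixed seed so the run is repeatable
in_frame, left_out = build_population()
population = in_frame + left_out
true_rate = mean(population)  # 48,000 / 100,000 = 0.48

# Large sample, but drawn only from the sampling frame (selection bias).
large_biased = mean(rng.sample(in_frame, 20_000))
# Small sample, but drawn at random from the entire population.
small_random = mean(rng.sample(population, 200))

print(f"true rate:        {true_rate:.3f}")
print(f"20,000 from frame: {large_biased:.3f}")
print(f"200 at random:     {small_random:.3f}")
```

Adding more people from the same frame only tightens the estimate around the wrong answer; the bias comes from who can be sampled, not from how many.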


From FanGraphs:
Sample Size | FanGraphs Sabermetrics Library:

Sample Size

So we have all of these statistics, but when can we use them?  Suppose a player goes three for three in their first game in the big leagues.  Should we expect this player to continue batting 1.000 for the rest of the season?  Of course not, that’d be silly.  Three at-bats is way too small a sample to draw conclusions about a player, but then we’re left with the question: at what point do statistics become reliable?
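One way to put numbers on "way too small a sample" is a confidence interval for a batting average. The sketch below is not Carleton's stabilization method; it's the standard Wilson score interval for a binomial proportion, and it assumes (a simplification) that every at-bat is an independent trial with the same true hit probability.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = p + z**2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (center - margin) / denom, (center + margin) / denom


for hits, at_bats in [(3, 3), (180, 600)]:
    lo, hi = wilson_interval(hits, at_bats)
    print(f"{hits} for {at_bats}: {lo:.3f} to {hi:.3f}")
```

Under those assumptions, going 3 for 3 is consistent with a true average anywhere from about .438 to 1.000, while 180 for 600 (a .300 average) narrows the range to roughly .265 to .338.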
There has been a lot of research done in this area by Russell Carleton (AKA: the artist formerly known as Pizza Cutter). For his most recent work, you can find his full research at Baseball Prospectus. We’ve included links and a summary below:

Stabilization Points for Offense Statistics:
  • 60 PA: Strikeout rate
  • 120 PA: Walk rate
  • 240 PA: HBP rate
  • 290 PA: Single rate
  • 1610 PA: XBH rate
  • 170 PA: HR rate
  • 910 AB: AVG
  • 460 PA: OBP
  • 320 AB: SLG
  • 160 AB: ISO
  • 80 BIP: GB rate
  • 80 BIP: FB rate
  • 600 BIP: LD rate
  • 50 FBs: HR per FB
  • 820 BIP: BABIP
Stabilization Points for Pitching Statistics:
  • 70 BF: Strikeout rate
  • 170 BF: Walk rate
  • 640 BF: HBP rate
  • 670 BF: Single rate
  • 1450 BF: XBH rate
  • 1320 BF: HR rate
  • 630 BF: AVG
  • 540 BF: OBP
  • 550 AB: SLG
  • 630 AB: ISO
  • 70 BIP: GB rate
  • 70 BIP: FB rate
  • 650 BIP: LD rate
  • 400 FB: HR per FB
  • 2000 BIP: BABIP
In case it’s not obvious, you can tell a lot more about a hitter from one year of data than you can about a pitcher. If a statistic is not included, that means it did not stabilize over the intervals that Russell Carleton tested.
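The hitter/pitcher gap can be checked directly against the lists above. This sketch keeps only the stats whose stabilization points are given in PA (hitters) or BF (pitchers), so the units match, and asks which ones stabilize within a single hypothetical workload. The 600 figure is an assumption for illustration, applied to both hitters and pitchers so the comparison is like for like; the stat labels are shorthand, not FanGraphs identifiers.

```python
# Stabilization points copied from the lists above, limited to stats
# measured in plate appearances (hitters) or batters faced (pitchers).
HITTER_PA = {
    "K rate": 60, "BB rate": 120, "HBP rate": 240, "1B rate": 290,
    "XBH rate": 1610, "HR rate": 170, "OBP": 460,
}
PITCHER_BF = {
    "K rate": 70, "BB rate": 170, "HBP rate": 640, "1B rate": 670,
    "XBH rate": 1450, "HR rate": 1320, "AVG": 630, "OBP": 540,
}

SEASON_WORKLOAD = 600  # hypothetical PA (hitter) or BF (pitcher) in one season


def stabilized(points, workload):
    """Return, alphabetically, the stats whose stabilization point is at or below `workload`."""
    return sorted(stat for stat, needed in points.items() if needed <= workload)


hitter = stabilized(HITTER_PA, SEASON_WORKLOAD)
pitcher = stabilized(PITCHER_BF, SEASON_WORKLOAD)
print(f"hitter:  {len(hitter)}/{len(HITTER_PA)} {hitter}")
print(f"pitcher: {len(pitcher)}/{len(PITCHER_BF)} {pitcher}")
```

At 600, six of the seven PA-based hitter stats clear their thresholds (all but XBH rate), versus three of the eight BF-based pitcher stats (strikeout rate, walk rate and OBP). Changing `SEASON_WORKLOAD` shows how the picture shifts for heavier or lighter workloads.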

Also, a quote worth remembering: “In small sample sizes, a good scout is ALWAYS better than stats.”
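The scout-versus-stats idea can also be framed as a simple shrinkage (regression-to-the-mean) estimate: treat the scout's projection as a prior worth k "phantom" at-bats and blend it with the player's actual results. This is a generic sketch, not Carleton's or FanGraphs' method. The .270 projection is hypothetical, and using the 910 AB AVG stabilization point as k is an assumption: under the common model where reliability after n trials is n / (n + k), reliability equals 0.5 exactly when n = k, so a stabilization point defined at 0.5 reliability doubles as k. The library page doesn't state which threshold its figures use, so treat this as an approximation.

```python
def shrunk_rate(successes, trials, prior_rate, prior_weight):
    """Blend observed results with a prior by adding `prior_weight`
    phantom trials at `prior_rate` (simple regression to the mean)."""
    return (successes + prior_rate * prior_weight) / (trials + prior_weight)


def data_weight(trials, prior_weight):
    """Share of the blended estimate that comes from the observed data."""
    return trials / (trials + prior_weight)


SCOUT_PROJECTION = 0.270  # hypothetical scout's projected batting average
K_AVG = 910               # AVG stabilization point (AB) from the list, assumed to act as k

debut = shrunk_rate(3, 3, SCOUT_PROJECTION, K_AVG)       # went 3 for 3
season = shrunk_rate(192, 600, SCOUT_PROJECTION, K_AVG)  # hit .320 over 600 AB

print(f"after 3 for 3:    {debut:.3f} (data weight {data_weight(3, K_AVG):.1%})")
print(f"after .320 / 600: {season:.3f} (data weight {data_weight(600, K_AVG):.1%})")
```

With these numbers, the 3-for-3 debut barely moves the estimate (about .272, with the data carrying roughly 0.3% of the weight), while a .320 season over 600 AB pulls it to about .290, with the data carrying roughly 40% of the weight. In small samples the prior, here the scout's eye, dominates.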


Giants Top Minor League Prospects

  • 1. Joey Bart 6-2, 215 C Power arm and a power bat, playing a premium defensive position. Good catch and throw skills.
  • 2. Heliot Ramos 6-2, 185 OF Potential high-ceiling player the Giants have been looking for. Great bat speed, early returns were impressive.
  • 3. Chris Shaw 6-3, 230 1B Lefty power bat, limited defensively to 1B, Matt Adams comp?
  • 4. Tyler Beede 6-4, 215 RHP from Vanderbilt, projects as a top-of-the-rotation starter when he works out his command/control issues. When he misses, he misses by a bunch.
  • 5. Steven Duggar 6-1, 170 CF Another toolsy, under-achieving OF in the Gary Brown mold, hoping for better results.
  • 6. Sandro Fabian 6-0, 180 OF Dominican signee from 2014, shows some pop in his bat. Below average arm and lack of speed should push him towards LF.
  • 7. Aramis Garcia 6-2, 220 C from Florida International, projects as a good bat behind the dish with enough defensive skill to play there long-term.
  • 8. Heath Quinn 6-2, 190 OF Strong hitter, makes contact with improving approach at the plate. Returns from hamate bone injury.
  • 9. Garrett Williams 6-1, 205 LHP Former Oklahoma standout, Giants prototype, low-ceiling, high-floor prospect.
  • 10. Shaun Anderson 6-4, 225 RHP Large frame, 3.36 K/BB rate. Can start or relieve.
  • 11. Jacob Gonzalez 6-3, 190 3B Good pedigree, impressive bat for an HS prospect.
  • 12. Seth Corry 6-2, 195 LHP Highly regarded HS pick. Was mentioned as a possible chip in high-profile trades.
  • 13. C.J. Hinojosa 5-10, 175 SS Scrappy IF prospect in the mold of Kelby Tomlinson, just gets it done.
  • 14. Garrett Cave 6-4, 200 RHP He misses a lot of bats and, at times, the plate. 13 K/9 and 5 BB/9. Wild thing.

2019 MLB Draft - Top HS Draft Prospects

  • 1. Bobby Witt, Jr. 6-1, 185 SS Colleyville Heritage HS (TX) Oklahoma commit. Outstanding defensive SS who can hit. Runs a 6.4 60-yard dash. Touched 97 on the mound. Son of a former major leaguer. Five-tool potential.
  • 2. Riley Greene 6-2, 190 OF Hagerty HS (FL) Florida commit. Best HS hitting prospect. LH bat with a good eye, plate discipline and developing power.
  • 3. C.J. Abrams 6-2, 180 SS Blessed Trinity HS (GA) High-ceiling athlete. 70 speed with plus arm. Hitting needs to develop as he matures. Alabama commit.
  • 4. Reece Hinds 6-4, 210 SS Niceville HS (FL) Power bat, committed to LSU. Plus arm, solid enough bat to move to 3B down the road. 98 mph arm.
  • 5. Daniel Espino 6-3, 200 RHP Georgia Premier Academy (GA) LSU commit. Touches 98 on the FB with a wipeout SL.

2019 MLB Draft - Top College Draft Prospects

  • 1. Adley Rutschman C Oregon State Plus defender with great arm. Excellent receiver plus a switch hitter with some pop in the bat.
  • 2. Shea Langeliers C Baylor Excellent throw and catch skills with good pop time. Quick bat, uses an all-fields approach with some pop.
  • 3. Zack Thompson 6-2 LHP Kentucky Missed time with an elbow issue. FB up to 95 with plenty of secondary stuff.
  • 4. Matt Wallner 6-5 OF Southern Miss Run-producing bat, plus a mid- to upper-90s FB as a closer. Power bat from the left side, athletic for his size.
  • 5. Nick Lodolo LHP TCU Tall LHP, 95 mph FB and solid breaking stuff.