In honor of The Baseball Show with Rany and Joe ending, I wanted to evaluate one of the comments Rany Jazayerli made. Since I was listening while I was cooking, I forget the exact quote, but he said something to the effect of (about Andrelton Simmons), “his contact rates are high enough to have a .280 batting average.” He could have said high average instead of .280, or maybe it was some other number; I could have made it up, so I apologize if I misquoted him.
However, the idea got me thinking about contact rates. I wanted to see how much bearing contact rates have on a hitter’s batting average. Specifically, do high contact rates correlate with higher batting averages?
To get a better understanding of regression and what R squared means, feel free to check out a primer I wrote about regression. I looked at hitters with at least 200 plate appearances (by year) from 2008-13 via FanGraphs. The sample size was 2,095 hitter seasons. The y-axis is batting average and the x-axis is contact%.
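For anyone who wants to try this kind of regression themselves, here is a minimal sketch of a one-variable least-squares fit and its R squared in plain Python. The function name `fit_line` is my own, and the numbers are made-up placeholders, not the FanGraphs sample described above; you would swap in your own contact% and batting average columns.

```python
def fit_line(xs, ys):
    """Least-squares line of ys on xs. Returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R squared is the squared correlation
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up placeholder values for illustration only, NOT real FanGraphs data.
contact_pct = [72.0, 75.5, 78.0, 80.5, 83.0, 86.5, 89.0]
batting_avg = [0.262, 0.241, 0.275, 0.250, 0.284, 0.255, 0.279]

slope, intercept, r2 = fit_line(contact_pct, batting_avg)
print(f"slope={slope:.5f} intercept={intercept:.3f} R^2={r2:.2f}")
```

An R squared near 0 means the line explains almost none of the spread in batting average, and a value near 1 means it explains almost all of it.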
An R squared of 0.09 suggests there is very little (positive) relationship between contact rates and batting average: contact rate accounts for only about 9% of the variation in batting average in this sample. So, basically, a player’s ability to make contact does not necessarily mean that player will have a higher batting average. I find this type of information really cool because there are times when I’ll write about a player’s contact rates without ever questioning whether contact rates have any impact on batting average.
For fun I looked at how well BABIP correlated with batting average (with the same sample) and this is what I found (image below). An R squared of 0.64 suggests there is a positive relationship between a high BABIP and a high batting average. That said, it is not the end-all-be-all, so do not assume a higher BABIP will automatically result in a higher batting average.
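One likely reason BABIP tracks batting average more closely than contact rate does is that the two are built from mostly the same hits. Here is a small sketch using the standard formulas, AVG = H / AB and BABIP = (H - HR) / (AB - K - HR + SF). The stat line is hypothetical, chosen only to show the arithmetic.

```python
def batting_average(hits, at_bats):
    """AVG = H / AB."""
    return hits / at_bats


def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """BABIP = (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical season line for illustration, not a real player.
hits, home_runs, at_bats, strikeouts, sac_flies = 150, 20, 550, 100, 5

avg = batting_average(hits, at_bats)
bip = babip(hits, home_runs, at_bats, strikeouts, sac_flies)
print(f"AVG={avg:.3f} BABIP={bip:.3f}")
```

Strikeouts and home runs are the pieces that separate the two numbers, which is part of why a high BABIP does not guarantee a high average.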