On Wednesday, Aaron Judge hit a killer line drive at 107.5 mph. Statcast gave the ball a 73% chance of falling in for a hit, but Manuel Margot caught it without too much difficulty. At first glance, that’s just baseball. But it sure seems to happen a lot to the Yankees. I think there’s more going on than just bad luck.
Last week, I wrote about a weird anomaly where the Yankees seem to be underperforming their batted ball outcomes. The short version is that the 2021-2023 Yankees had the biggest gap between estimated outcomes given exit velocity and actual on-field results in baseball, and that gap is probably not just bad luck. The size of the gap is huge; in 2023, it’s 0.018 wOBA, or the difference between an awful offense and a pretty good one.
So what gives? The Statcast data is a little bit annoying to work with, but here’s my quick-and-dirty statistical analysis based on what they let me download easily.
Below are the results of a very basic OLS regression analysis of all 406 qualified hitter-seasons from 2021 through 2023 (Rizzo 2021, Rizzo 2022, and Rizzo 2023 are separate observations). Regression is one of the simplest tools in the statistician’s toolbox, but it’s also pretty powerful. The model’s outcome variable is the xwOBA differential (wOBA - xwOBA). A negative differential implies that a player’s wOBA was worse than their xwOBA. A positive differential implies that their wOBA was better.
Okay, so negative differential bad, positive differential good. Now you need to twist your brain for a second. The figure below shows the direction of the relationship between the differential and each listed variable. A positive coefficient suggests that as one goes up, the other goes up as well. A negative coefficient suggests that as one goes up, the other goes down.
Furthermore, the figure shows the confidence interval (the blue lines) of each estimate. This represents the uncertainty of the estimate. Basically, big confidence interval = big uncertainty. An estimate with a confidence interval that crosses the black dashed line probably doesn’t have much impact on xwOBA differential.
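For anyone who wants the "does it cross the dashed line" rule spelled out, here is a minimal sketch. It assumes the usual 95% interval of roughly the estimate plus or minus 1.96 standard errors; the two variables and their numbers are made up purely for illustration and are not values from the model.

```python
def confidence_interval(estimate, std_err, z=1.96):
    """Return the (low, high) bounds of an approximate 95% interval."""
    margin = z * std_err
    return estimate - margin, estimate + margin


def crosses_zero(estimate, std_err, z=1.96):
    """True when the interval includes zero, i.e. the effect may be nothing."""
    low, high = confidence_interval(estimate, std_err, z)
    return low <= 0.0 <= high


# Made-up (estimate, standard error) pairs for illustration only.
examples = {
    "hypothetical_var_a": (0.004, 0.001),  # tight interval, clear of zero
    "hypothetical_var_b": (0.001, 0.002),  # wide interval, straddles zero
}

for name, (estimate, std_err) in examples.items():
    low, high = confidence_interval(estimate, std_err)
    verdict = crosses_zero(estimate, std_err)
    print(f"{name}: [{low:+.4f}, {high:+.4f}] crosses zero: {verdict}")
```

The first made-up variable would sit entirely to one side of the dashed line; the second would straddle it.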
Okay, all of that said, let’s look at the results:
Figure 1: OLS Regression Estimates of xwOBA Differential
Right off the bat, four variables probably don’t matter. It doesn’t matter if a batter pulls or goes oppo. It might matter *a tiny bit* if the batter goes up the middle a lot, but it won’t explain the Yankee struggles. Players who swing at more meatballs (a suggestion from Andy Singer on the Bronx Beat Podcast) might have slightly worse outcomes, but if so the impact is really small.
Sprint speed is significant and positively associated with a player overperforming their xwOBA. This makes total sense. Fast players get more infield hits, turn more singles into doubles and turn a few doubles into triples. If we didn’t find a significant and positive relationship between Sprint Speed and xwOBA differential, I’d question the model.
Average launch angle doesn’t have any impact on xwOBA differential either. That makes sense because xwOBA is calibrated based on launch angle and exit velocity. If it’s calibrated correctly, there should be no relationship between launch angle and xwOBA differential.
That’s why it’s really freaking weird that exit velocity has a strong and super-duper statistically significant relationship with xwOBA differential (t value over 6.00 for the nerds). Each additional mph of average exit velocity is associated with an xwOBA differential of -0.0025. The Yankees have an average exit velocity about 1.5 mph higher than league average, so they should lose about 0.004 of wOBA. That’s not nothing, but it only turns a .320 xwOBA player into a .316 wOBA player. Since 2021, the Yankees’ wOBA has trailed their xwOBA by 0.011. In 2023, the gap is 0.018. Something else is going on here to hurt the Yankees.
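Here is that back-of-the-envelope math as a quick sketch. The inputs are the numbers quoted above; the one assumption is that I treat the t value as exactly 6 to get a rough ceiling on the standard error, since all I reported is that it is over 6.

```python
COEF_PER_MPH = -0.0025     # regression coefficient on average exit velocity
YANKEES_EV_EDGE_MPH = 1.5  # Yankees' average exit velocity above league average
T_VALUE_FLOOR = 6.0        # assumption: "t value over 6.00" treated as exactly 6

# Differential that exit velocity alone would predict for the Yankees.
expected_diff = COEF_PER_MPH * YANKEES_EV_EDGE_MPH

# Share of each observed gap that the exit velocity effect can account for.
observed_gaps = {"2021-2023": 0.011, "2023": 0.018}
shares = {label: abs(expected_diff) / gap for label, gap in observed_gaps.items()}

# t = coefficient / standard error, so a t of at least 6 caps the standard error.
max_std_err = abs(COEF_PER_MPH) / T_VALUE_FLOOR

print(f"expected differential: {expected_diff:+.5f}")
for label, share in shares.items():
    print(f"{label}: exit velocity explains about {share:.0%} of the gap")
print(f"standard error is at most about {max_std_err:.5f}")
```

Under those inputs, exit velocity covers roughly a third of the three-year gap and about a fifth of the 2023 gap, which leaves most of it unexplained.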
The problem isn’t high exit velocity. It can’t be; xwOBA is estimated using exit velocity. The problem is something that correlates with exit velocity. An xwOBA differential is caused by balls in play that should be hits ending up as outs. We need to search for that factor.
Here is my theory: I think that the Yankees are too easy to defend. How can a team be too easy to defend? We’ve already established that the answer probably isn’t related to pull/oppo rates. Maybe because the Yankees hit the ball hard all the time, or at least have a reputation for doing so, opposing outfielders cheat a little bit and play farther back.
It’s a bigger job to test this theory, but as an early plausibility test, I present to you a Statcast query. Here are all balls in play between 95 and 105 mph, balls that are hit hard but not inhumanely hard, that are line drives or fly balls. To mostly control for ballpark, I’ve included only away balls in play:
The Yankees are getting absolutely destroyed by balls that should be incredibly valuable to them. .090 wOBA is a massive effect size. Other than the 2021/2023 Red Sox, no other team is repeated on top of the leaderboard, and the 2021/2023 Yankees are *way worse* than even the Red Sox. Only this year’s Royals can approach their level of bad outcomes on hard hit balls in play.
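If you want to poke at this yourself, the filter described above could be sketched roughly like this on a pitch-level Statcast export. The column names (`launch_speed`, `bb_type`, `away_team`, `inning_topbot`, `woba_value`, `estimated_woba_using_speedangle`) are my understanding of the Baseball Savant CSV, so treat them as assumptions, and the three rows are invented placeholders rather than real batted balls. This is not the exact query behind the leaderboard, just the same idea in code.

```python
HARD_HIT_MIN_MPH = 95.0
HARD_HIT_MAX_MPH = 105.0
AIR_BALL_TYPES = {"line_drive", "fly_ball"}


def is_target_ball(row, team):
    """True for the team's away batted balls that match the filter above."""
    batting_away = row["away_team"] == team and row["inning_topbot"] == "Top"
    hard_hit = HARD_HIT_MIN_MPH <= row["launch_speed"] <= HARD_HIT_MAX_MPH
    in_the_air = row["bb_type"] in AIR_BALL_TYPES
    return batting_away and hard_hit and in_the_air


def woba_gap(rows, team):
    """Average (actual wOBA value - expected wOBA value) over matching balls."""
    gaps = [
        row["woba_value"] - row["estimated_woba_using_speedangle"]
        for row in rows
        if is_target_ball(row, team)
    ]
    return sum(gaps) / len(gaps) if gaps else None


# Invented placeholder rows, only to show the shape of the data.
rows = [
    {"away_team": "NYY", "inning_topbot": "Top", "launch_speed": 101.2,
     "bb_type": "line_drive", "woba_value": 0.0,
     "estimated_woba_using_speedangle": 0.7},
    {"away_team": "NYY", "inning_topbot": "Top", "launch_speed": 98.0,
     "bb_type": "fly_ball", "woba_value": 0.9,
     "estimated_woba_using_speedangle": 0.5},
    {"away_team": "NYY", "inning_topbot": "Top", "launch_speed": 110.0,
     "bb_type": "line_drive", "woba_value": 0.9,
     "estimated_woba_using_speedangle": 0.8},  # too hard, filtered out
]

print(woba_gap(rows, "NYY"))
```

A negative number means the team got less out of those balls than Statcast expected.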
More to come.
So if that’s true, then the reverse should also be true: bloop hits should have a higher actual success rate than expected. If the Yankees are easier to defend, then they should have a higher success rate on “broken” hits than other teams because their opponents are positioning for Yankee hard hits. Another thing to look at is at-bat strike counts. Perhaps there is something different about the Yankees’ cadence?