With only one minute left in the fourth quarter, the visiting team's offense is dangerously close to the end zone. The stakes have never been higher, and the home team needs its best defense. But the coach knows exactly which formation to use to counter the offense, because he has spent thousands of hours each season reviewing video and learning which plays work best in which situations.
Manual analysis of sports footage is time consuming. Sports analysts and coaches have used tools to split videos into short clips of individual plays, then manually scrubbed through each clip to identify the offensive team's formation, watched it multiple times to observe the outcome, and recorded the important information. NFL games feature an average of 153 plays, so this process must be repeated many times for each game.
BYU student Benjamin Orr and IMMERSE-X student Ephraim Pan worked with Dr. Dah-Jye Lee to apply a multi-person event detection method to identifying the snap in offensive plays. The approach uses a long short-term memory (LSTM) network to detect the frame with the lowest-magnitude motion: just before the snap, the players are set in formation, so the near-still frame marks the start of the play. This automated snap detection reduces the need to scrub through games to find the beginning of each play, when the players are in formation.
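The core idea, finding the frame where on-screen motion bottoms out, can be illustrated with a short sketch. The team's actual method uses an LSTM network over the video; the simple frame-differencing below is only a stand-in motion measure, and the function name and synthetic frames are illustrative, not from the research.

```python
import numpy as np

def snap_frame_index(frames):
    """Return the index of the frame with the lowest-magnitude motion.

    `frames` is a list of grayscale frames (2-D NumPy arrays). Motion at
    frame i is approximated as the mean absolute pixel difference from
    frame i-1; an LSTM, as used in the research, would instead learn this
    decision from a sequence of motion features.
    """
    motion = [np.abs(frames[i] - frames[i - 1]).mean()
              for i in range(1, len(frames))]
    # +1 because motion[0] compares frames 0 and 1
    return int(np.argmin(motion)) + 1

# Synthetic example: large frame-to-frame changes while players move,
# then almost none once they are set in formation.
rng = np.random.default_rng(0)
base = rng.random((64, 64))
frames = [base,
          base + 0.5,    # players moving: large change
          base + 1.0,    # players moving: large change
          base + 1.001,  # players set: almost no motion
          base + 1.002]
print(snap_frame_index(frames))  # → 3
```

On real footage the motion signal is far noisier (camera pans, crowd movement), which is why a learned model rather than a single threshold is needed.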
The algorithm outputs the frame where the players are in formation so that coaches can analyze it. Object-detection models locate the players and their jersey numbers, field lines are detected with classical computer vision techniques, and this information is combined to transform the player locations into a bird's-eye view on an image of a virtual football field. By adopting YOLOv8x, an object-detection model, the team saw a large improvement in precision over the models it had used previously.
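The mapping from camera view to bird's-eye view is a perspective (homography) transform, which can be estimated once four points on the field lines are matched to their known positions on a virtual field. The sketch below shows the standard direct linear transform in plain NumPy; the point coordinates are hypothetical, and the team's actual pipeline detects these correspondences automatically from the field lines.

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for the 3x3 homography H mapping src -> dst from exactly
    four point pairs, with the bottom-right entry fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def to_birds_eye(H, points):
    """Project image-space coordinates into field coordinates."""
    pts = np.hstack([np.asarray(points, float), np.ones((len(points), 1))])
    mapped = pts @ H.T                      # homogeneous coordinates
    return mapped[:, :2] / mapped[:, 2:]    # divide out the scale

# Hypothetical example: four field-line intersections seen in the image
# (pixel coords) matched to their positions on a virtual field (yards).
image_pts = [(100, 400), (540, 380), (120, 220), (520, 210)]
field_pts = [(10, 0), (30, 0), (10, 20), (30, 20)]
H = homography_from_points(image_pts, field_pts)
print(to_birds_eye(H, image_pts))  # recovers field_pts
```

Once H is known, every detected player position can be pushed through the same transform to place the player on the virtual field.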
This project came with many challenges. It is easier for humans to recognize players and their positions on the field than it is for an algorithm. Algorithms rely on the positioning of cameras and the quality of the footage, which is inconsistent between stadiums, games, and plays.
In early testing, the football video game Madden 2020 was used because its consistent lighting and footage quality removed some of the challenges of analyzing a real game. This early research had limited utility, however: player locations were determined, but the low angle at which the frames were captured made them inaccurate. In other earlier work, the team used a traditional line detection technique to place the players on a virtual football field seen from a bird's-eye view. The team's most recent research builds on these methods, combining the line detection techniques with the object-detection models.
Future research will focus on improving accuracy and fully automating the identification of players and their positions. The end goal is the automatic generation of a statistical report that summarizes analyses of the game footage.
More details about their research can be read here.