As you can see, the mask of the player on the left has a lot of white pixels, while the mask of the player on the right has almost none. In essence, I’m taking the ratio of the mask’s white pixels to its total pixels. The darker the original image, the more white pixels the mask will have and thus the higher that ratio. Using these ratios, I’m able to identify the teams. But if a player on the white team is in front of a black backdrop, or vice versa, this method would mislabel the player. To compensate for this, I added player tracking.
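A minimal sketch of that ratio might look like the following. The function names and the brightness threshold of 60 are illustrative assumptions, and the mask here simply marks every pixel darker than the threshold as white; the real mask could be built differently.

```python
import numpy as np


def dark_pixel_mask(gray_crop: np.ndarray, threshold: int = 60) -> np.ndarray:
    """Return a mask that is white (255) wherever the crop is darker than threshold.

    `threshold` is an assumed value for illustration, not a tuned one.
    """
    return np.where(gray_crop < threshold, 255, 0).astype(np.uint8)


def white_pixel_ratio(mask: np.ndarray) -> float:
    """Fraction of white (nonzero) pixels in the mask."""
    if mask.size == 0:
        return 0.0  # an empty crop carries no evidence either way
    return float(np.count_nonzero(mask)) / mask.size


# Toy grayscale crops standing in for two player bounding boxes.
dark_player = np.full((4, 4), 10, dtype=np.uint8)    # mostly dark pixels
light_player = np.full((4, 4), 200, dtype=np.uint8)  # mostly bright pixels

print(white_pixel_ratio(dark_pixel_mask(dark_player)))   # 1.0
print(white_pixel_ratio(dark_pixel_mask(light_player)))  # 0.0
```

A high ratio suggests the dark team and a low ratio suggests the light team, which is also why a dark backdrop behind a light-team player can push the ratio the wrong way.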
I need to give another big shoutout here. I built my player tracker off of this repo. Building on this setup saved me substantial time, and the framework really helped me understand the tracking workflow.
There is an important distinction to be made between object detection and object tracking. Up until this point, I had been using object detection. With object detection, each frame is processed independently of the frames before or after it. The detector looks at the frame, identifies objects, and its job is done. Object tracking, on the other hand, creates “objects” out of these detections and attempts to “connect” them across frames. This allows me to follow each individual player.
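To make the difference concrete, here is a toy tracker that connects detections across frames by bounding-box overlap. It is not the tracker from the repo I built on; the class name, the greedy matching, and the 0.3 overlap cutoff are all assumptions chosen to keep the sketch short.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Hypothetical minimal tracker: match each detection to the best-overlapping
    box from the previous frame, otherwise start a new id."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou  # assumed cutoff, not a tuned value
        self.tracks = {}        # track id -> box seen in the previous frame
        self._ids = count(1)

    def update(self, detections):
        """Take this frame's boxes and return a list of (track_id, box)."""
        unmatched = dict(self.tracks)
        assigned = []
        for box in detections:
            best_id, best_score = None, self.min_iou
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next(self._ids)  # nothing overlaps enough: new player
            else:
                del unmatched[best_id]     # each old track is used at most once
            assigned.append((best_id, box))
        # Tracks with no match this frame are dropped immediately.
        self.tracks = dict(assigned)
        return assigned


tracker = GreedyIouTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)])
print([track_id for track_id, _ in frame1])  # [1, 2]
print([track_id for track_id, _ in frame2])  # [2, 1]
```

The ids follow the players even though the detector returned them in a different order in the second frame. A tracker this simple loses a player as soon as a detection is missed for one frame, which is one reason real trackers add motion models and keep unmatched tracks alive for a while.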
As you can see above, I’m able to attach an “id” to each player and follow them throughout the clip. With that, I could attach individual features to each player. In particular, I could keep a running average of their pixel ratio. In any given frame the model may mislabel a player’s team, but in aggregate the running average helps sort players onto their respective teams.
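The running average per player id can be sketched in a few lines. The class name, the "dark"/"light" labels, and the 0.5 cutoff are hypothetical placeholders rather than the values used in the project.

```python
from collections import defaultdict


class TeamVoter:
    """Keep a running average of the white-pixel ratio for each track id."""

    def __init__(self, cutoff=0.5):
        self.cutoff = cutoff  # assumed boundary between the two teams
        self._total = defaultdict(float)
        self._frames = defaultdict(int)

    def update(self, track_id, ratio):
        """Record one frame's ratio and return the player's running average."""
        self._total[track_id] += ratio
        self._frames[track_id] += 1
        return self._total[track_id] / self._frames[track_id]

    def team(self, track_id):
        """Label a player from the average over every frame seen so far."""
        frames = self._frames.get(track_id, 0)
        if frames == 0:
            return None  # this id has not been observed yet
        average = self._total[track_id] / frames
        return "dark" if average >= self.cutoff else "light"


voter = TeamVoter()
# Player 7 reads as dark in two frames, then passes a bright backdrop.
for ratio in (0.8, 0.9, 0.1):
    voter.update(7, ratio)
print(voter.team(7))  # dark
```

The single low reading would have flipped the label if each frame were judged alone, but the average over three frames stays above the cutoff.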
Putting everything together, I had myself a model that was beginning to understand basketball. I could feed it raw footage and it would return the number of made baskets, report which team made them, and output a highlight reel of those baskets. Below you’ll see what my model output when I fed in three minutes of a basketball game it had never seen.
This model is a work in progress. Maintaining the same tracker on one player as they run in front of, behind, and around other players is not simple. In future work, finding a way to use each player’s jersey number could improve my player tracking. Additionally, my model falsely identified a made basket due to heavy commotion behind the net. Training my model on more images from more games should help it better distinguish between true made baskets and this noise.
With all that said, this model is a great proof of concept. The video illustrates that a computer model can start to understand the same game you and I watch. More than that, the fact that this was created on a “simple” laptop (one that harnessed the power of GPUs on a remote machine) is a testament to how far machine learning has come and how powerful it has become.