As a tennis player, I think this is super impressive.
Their models actually do a decent job of replicating true tennis strategy and, as they pointed out, even account for quirks like Nadal's left-handedness.
However, it's still a bit unrealistic due to the lack of full data.
There are three things that make a tennis shot what it is: placement (covered in the video), pace (speed of the ball), and spin (rpm and direction of spin). In this method, they only use placement, probably because pace and spin data don't exist at this scale.
But there's a big difference between a slice, a flat shot, and a topspin shot to the same placement on the court, and it directly affects the return shot. For example, it's a very common and 'safe' play to return a slice with a slice.
Hmm, from the look of it, it's not clear Hawk-Eye does everything.
Certainly it does pace. But from the article it's hard to say whether it would track spin. It's not clear to me that the natural markings on the ball (seam lines + ballmaker logo), under the lighting that exists in the stadium, would give enough fidelity for some algorithm to calculate the spin.
I think Hawk-Eye could easily be updated to do it though, perhaps if they were willing to draw one or two black dot markers on the ball in addition to the natural markings.
I'd think that pace and spin could be extrapolated relatively easily if you know placement and time; particularly if you know (or can extrapolate) the height of the ball.
edit: Well, perhaps not entirely. Trajectory is a function of a number of factors including spin. However, the ball itself is a factor. More fuzz = more drag = spin has a greater effect on trajectory. That would vary from ball to ball and even over the lifetime of an individual ball.
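To make the "placement and time" idea concrete, here is a minimal sketch of the part that does work: average pace from two court positions and the time between them. The function name, coordinates, and the 1.2 s flight time are made up for illustration. As the edit above says, this recovers only average ground speed and nothing about spin.

```python
import math

def average_pace(start, end, seconds):
    """Average horizontal speed in m/s between two court positions given in metres."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    distance = math.hypot(end[0] - start[0], end[1] - start[1])
    return distance / seconds

# Hypothetical shot: baseline to baseline (a court is 23.77 m long), 1.2 s in flight.
pace = average_pace((0.0, 0.0), (0.0, 23.77), 1.2)
print(f"{pace:.1f} m/s ({pace * 3.6:.0f} km/h)")
```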
--
For those not familiar with tennis:
Primarily because of aerodynamic drag created by the fuzz on tennis balls, the ball's spin greatly affects its trajectory.
Topspin makes the ball dive down more sharply. This is how players can hit the ball extremely hard and fast and still have it land in bounds instead of flying out.
This "trick" of topspin is also why tennis is easier than it may first seem if you've ever tried it. It's not easy to learn topspin but once you do, it increases your margin for error.
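For the curious, the "dive" can be illustrated with a toy 2D flight simulation. This is only a sketch: the constants are rough illustrative values, not measurements, and spin is modelled as a Magnus-style lift term with a fixed coefficient alongside drag, which is a modelling assumption of this example rather than something stated above. A positive `lift_coeff` stands for topspin, zero for a flat shot, negative for backspin.

```python
import math

# Rough illustrative constants (assumptions, not measured values).
RHO = 1.2          # air density, kg/m^3
MASS = 0.057       # ball mass, kg
RADIUS = 0.033     # ball radius, m
DRAG_COEFF = 0.55  # assumed constant drag coefficient
G = 9.81           # gravity, m/s^2

K = RHO * math.pi * RADIUS ** 2 / (2 * MASS)

def landing_distance(speed, angle_deg, lift_coeff, height=1.0, dt=0.001):
    """Horizontal distance (m) travelled before the ball reaches the ground."""
    angle = math.radians(angle_deg)
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    x, y, t = 0.0, height, 0.0
    while y > 0.0 and t < 10.0:
        v = math.hypot(vx, vy)
        # Drag opposes velocity; the spin term acts perpendicular to it
        # (downward for topspin when the ball is moving forward).
        ax = -K * DRAG_COEFF * v * vx + K * lift_coeff * v * vy
        ay = -G - K * DRAG_COEFF * v * vy - K * lift_coeff * v * vx
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
        t += dt
    return x

for label, coeff in [("topspin", 0.2), ("flat", 0.0), ("backspin", -0.2)]:
    print(f"{label:9s} lands after {landing_distance(30.0, 5.0, coeff):.1f} m")
```

With the same launch speed and angle, the topspin shot lands shorter than the flat one, which is the extra margin for error described above.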
I'm also a tennis player and fan - agreed that it's super impressive, but a bit unrealistic.
Certain smart racquets (I think Babolat's?) can track the rpm and spin direction based on the head movement. I think using this type of data too could make the difference in terms of realism.
Interesting work that demonstrates the benefits of the use of domain knowledge [1] and of trying to understand a dataset, rather than throwing a bunch of data at an end-to-end black box and hoping for the best. In particular, data in the tennis point play domain is too sparse for approaches that rely heavily on large amounts of dense data, like a neural network. This is good, old-fashioned AI work and I mean that 100% as a compliment.
_______________
[1] Quoting from Section 10:
Finally, our work makes extensive use of domain knowledge of tennis to generate realistic results. This includes the shot cycle state machine to structure point synthesis, the choice of shot selection and player court positioning outputs of player behavior models, and the choice of input features provided to these behavior models.
A successful behavioural model of that kind is a contribution in and of itself, useful beyond the task of simulation presented in the paper.
Your comment does not warrant downvotes. The results do resemble the video capture (rotoscoping?) technique used in the 1992 videogame Mortal Kombat, and it's worth pointing that out:
https://youtu.be/Tj3_0AmiJbg?t=5
But of course Mortal Kombat's sprites were not being generated in real-time through a flexible machine learning model.
The indicator of reliability has always been the chain of custody. Video must be authenticated by an appropriate witness before it can be introduced as evidence.
I don't think it's too unlikely that the cost of producing the CGI for a film like "The Avengers" could go from ~1e7 USD to ~1e3 USD over the next ~10 years. Seems like that could have some creative uses.
From the abstract: "Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences such as the creation of matchups between players that have not competed in real life, or interactive control of players in the Wimbledon final."
I don't believe academic researchers need to justify their work by providing real-life applications. But if that were the extent of deepfakes' utility, I'd be underwhelmed.
Yes, simulating real people for entertainment value seems to be the primary use case at this point. I'm just unsure that the value tradeoff between "simulating real people for entertainment" and "simulating real people for fraud, propaganda, misinformation, etc." is in our favor. It's inevitable, we'll just need to adapt.
I recently lost my dog. The idea of interacting with a virtual model of her to help ease the grief is interesting and scary, and ripped straight from a Black Mirror episode.
A mother meeting her dead daughter again through VR. It's a video that fills me with all sorts of emotions. It feels so wrong but at the same time it is very touching and beautiful.
This is, I feel, one of the truly beneficial uses of blockchain technology: storing media hashes as they are recorded so that they can be authenticated later.
Cameras that digitally sign pictures have been around for a while. They have also been cracked already (back in 2010: https://www.elcomsoft.com/news/428.html )
I haven't seen any cameras that digitally sign videos as well. Sounds doable.
However the secret signing key will have to reside inside the camera in a way that a determined attacker cannot extract it. Sounds hard.
If we're being serious about cryptography, you could have cameras signing videos in real time with a secret key, in such a way that each camera has an individual key stored in a specialized chip that does the signing, with no way to access the key without breaking the chip's seal.
So you could still fake videos, but any accusation of tampering could be verified by checking the seal is intact. (of course you need non-forgeable seals too, but that's comparatively easier)
If (honestly I've no idea) we can produce cameras that have a higher FPS than the display's refresh rate (or resolution), then we'd be able to detect that.
A simpler detection mechanism would be to use stereoscopic (or rather real depth detection) cameras though.
I wonder what cryptography can contribute here. Could sensor manufacturers integrate a private key in the video stream which would authenticate an unbroken chain of frames?
No need to think that far. Just calculate the hash of the video file stored on the camera (or of segments of it, e.g. during a live stream) and authenticate it using a private key stored in a secure element on the camera. But do you trust this secure element enough? (See the breach of Intel's CPU private keys via SpecEx; maybe the same could be done by loading custom firmware as well.)
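A minimal sketch of the hash-and-authenticate idea, for segments of a stream. Big caveat: it uses an HMAC with a shared secret from the Python standard library as a stand-in for the secure element's private-key signature, so it is not the asymmetric scheme described above. The function names and the chaining of each tag into the next are my own assumptions.

```python
import hashlib
import hmac

def authenticate_segments(segments, key):
    """Return one tag per segment; each tag also covers the previous tag."""
    tags = []
    previous = b"\x00" * 32  # fixed starting value for the chain
    for segment in segments:
        digest = hashlib.sha256(segment).digest()
        # HMAC is a symmetric stand-in for a real signature (assumption).
        tag = hmac.new(key, previous + digest, hashlib.sha256).digest()
        tags.append(tag)
        previous = tag
    return tags

def verify_segments(segments, tags, key):
    """Recompute the chain and compare it to the recorded tags."""
    expected = authenticate_segments(segments, key)
    return len(expected) == len(tags) and all(
        hmac.compare_digest(a, b) for a, b in zip(expected, tags)
    )

key = b"hypothetical-device-secret"
stream = [b"segment-0", b"segment-1", b"segment-2"]
tags = authenticate_segments(stream, key)
print(verify_segments(stream, tags, key))                                      # untouched stream
print(verify_segments([b"segment-0", b"edited", b"segment-2"], tags, key))     # tampered stream
```

Because each tag covers the previous one, editing, dropping, or reordering a segment invalidates every later tag. It does nothing about the trust questions raised here: whoever holds the key can still authenticate anything.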
Assuming the secure element is in fact secure, the issue then, as with any public/private key scheme, lies in establishing trust in the keypair. Do you trust that the manufacturer will not be breached?
Even more pressing: how do you prevent someone from modifying their camera's internal video stream so that they can send arbitrary data to the authentication chip/mechanism?
And if all this is implemented, the attack can be done at an even lower level: just point the camera at your screen (I assume tools would then crop up to increase the fidelity of such an attack).
I am not saying such a solution would not provide ANY benefits, I am just pointing out that these issues prevent it from becoming a silver bullet.
I can’t get the paper to load, but I’m curious if they address the lack of player shadows. Having shadows would have made the video much more realistic.
I think the player modeling is way more interesting than the visual representation and could probably be competitively useful.
I wonder if you could use some of the recent advances in pose estimation to rig a 3D model of each player rather than the rotoscope look of clipped frames in the demo.
100% this. The modelling data and predictive analysis are where the true value in this lies. If they make that available to the players' trainers, it's going to assist greatly in helping players improve against tricky opponents.
I'm excited to see this applied to more popular video games such as FIFA, Madden, and NBA 2k. The behavior modeling also likely has huge applications in NBA film analysis and figuring out how a traded player might "gel" in a new team.
What about its implications for things like ballet, in a world where people can't go to live venues and there is no money to support real dancers? Or music videos that already have auto-tuned singers: why not generated backup dancers?
I wonder if players will be able to use this to prepare for opponents. For example, knowing where to hit, "seeing how they typically react", and then predicting where they're most likely to return the ball.
If so, this could be expanded to other sports, maybe even team sports, where you can test set plays against the simulated defense.
I could see this potentially having some value.
The challenge is that in sports, the opponent you face on a given day isn't "the statistical average of their past performances" - they are facing you with a game plan tailored specifically for them versus you, and their game plan will evolve over the course of the contest depending on what's working and what isn't working.
For example, "Nadal likes to hit the ball to Federer's backhand" is, statistically, true. It's basic tennis strategy. But the on-court reality is more nuanced. Nadal is going to vary that approach on the fly based on his opponent and how well that strategy is working on a given day.
Modeling this for a simulation would have to be similarly nuanced, with the simulation not just replicating Nadal's overall statistical tendencies, but how those tendencies evolve over the course of a match based on various conditions and his success or lack of it.
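As a toy illustration of tendencies that evolve during a match, here is a sketch that starts from a player's overall shot distribution and nudges it after every point. The multiplicative update, the `rate` value, and the target names are assumptions made up for this example. This is not how the paper's behavior models work.

```python
import random

def update_tendencies(tendencies, target, won, rate=0.2):
    """Shift probability toward a target that just won a point, away from one that lost."""
    weights = dict(tendencies)
    weights[target] *= (1 + rate) if won else (1 - rate)
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

def choose_target(tendencies, rng=random):
    """Sample the next shot target from the current tendencies."""
    names = list(tendencies)
    return rng.choices(names, weights=[tendencies[n] for n in names])[0]

# Hypothetical starting point: the career-average tendency.
tendencies = {"backhand": 0.7, "forehand": 0.3}
# Suppose going to the backhand loses three points in a row today.
for _ in range(3):
    tendencies = update_tendencies(tendencies, "backhand", won=False)
print(tendencies)
print(choose_target(tendencies))
```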
Of course, some aspects of Nadal's game are more easily modeled than others. If an opponent was training to face Nadal on a clay surface, I could simulate that with a single line of code: "Game Over." =)
This is what I imagined video games would look like in the future... looking from the 80s. Hi-res background, but static. Awesome lifelike sprites, but basically the same video games we had on the C64.
It was somehow very soothing to watch that video. It felt like someone was telling me a bedtime story about a different, brighter future that never came to be. Very "Back to the Future".
This is amazing. I played this arcade game like 30 years ago that projected 3d video that you could control. But instead of generative, it was a bunch of little clips spliced together so you tap shoot and it plays the shoot clip. Had lots of awkward seams but it was still really impressive. Now we get the real thing.
My head gravitates towards how visualization makes the behavior modeling more palatable. The incredible technical feat on display is the video rendering, not so much the behavioral modelling, but it seems conceptually straightforward to add increasingly better "sabermetric" analysis to control the player's choices.
The data might be just as “productive” as a spreadsheet or formula to inform play, but it requires someone with a more specialized skill set to translate its meaning. The HCI design, for lack of a better word, in rendering the data visually makes it not only more entertaining but easier to "see"—for mainstream users, pro players, or almost anyone. Design makes things visible.
I had been thinking about exactly this idea for over 2 years.
I even started with Python & OpenCV for basic background extraction, and as I expected, the edges for the players are imperfect. But still, the result is very very promising. I'm so glad someone did it.
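For anyone wondering what "basic background extraction" can look like, here is a minimal pure-Python sketch. It is not OpenCV and not my original code: it is a per-pixel median over frames from a fixed camera, followed by a threshold. The frame layout (2D lists of grayscale values) and the threshold of 30 are assumptions for illustration.

```python
from statistics import median

def estimate_background(frames):
    """Per-pixel median across frames; assumes a fixed camera and a moving player."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]

def foreground_mask(frame, background, threshold=30):
    """True where a pixel differs from the background by more than the threshold."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# Tiny hypothetical 2x3 frames: a bright "player" pixel moves across a dark court.
frames = [
    [[200, 10, 10], [10, 10, 10]],
    [[10, 200, 10], [10, 10, 10]],
    [[10, 10, 200], [10, 10, 10]],
]
background = estimate_background(frames)
print(foreground_mask(frames[1], background))
```

A hard threshold like this is exactly where the imperfect edges come from: pixels on the player's outline are a blend of player and court, so they land on either side of the cutoff.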
I was thinking about it because the end goal would be to apply frame matching and transitioning to football. Tennis is the easier task, since the camera angle is almost fixed.
But from this result, we aren't far from football. Even an approach based on AI + some human intervention would suffice.
Part 2:
Even though FIFA / PES are doing great things, no video game can match the player personalities and add specific / individual animations for everyone.
By using the already recorded matches, we would have something very authentic in terms of how players behave on the field.
I’m sure Novak Djokovic would like to simulate an alternate reality where that went slightly to the side, he didn’t get disqualified as a result, and then he continued his undefeated 2020 run.
Watching those realistic video sprites play reminds me a lot of the old tennis game on the NES. It really makes you wonder how far and how bizarre this technology can become.
Would like to see the full extension one day