Controllable Video Sprites That Appear Like Professional Tennis Players (stanford.edu)
340 points by tosh on Sept 8, 2020 | 75 comments


As a tennis player, I think this is super impressive.

Their models actually do a decent job of replicating true tennis strategy, and as they pointed out, even account for quirks like Nadal's left-handedness.

However, it's still a bit unrealistic due to the lack of full data.

There are 3 things that make a tennis shot what it is: placement (covered in the video), pace (speed of ball), and spin (rpm and direction of spin). In this method, they only use placement, probably because pace and spin data don't exist at this scale.

But there's a big difference between a slice, flat, and topspin shot to the same placement on the court, and it directly affects the return shot. For example, it's a very common and 'safe' play to return a slice with a slice.

Would like to see the full extension one day


The lack of full data is because tournaments keep Hawk-Eye (https://en.wikipedia.org/wiki/Hawk-Eye) data to themselves right?


Hmm, from the look of it, it's not clear Hawk-Eye does everything.

Certainly it does pace. But from the article it's hard to say whether it tracks spin. It's not clear to me that the natural markings on the ball (seam lines + ballmaker logo), under the lighting that exists in the stadium, would give enough fidelity for an algorithm to calculate the spin.

I think Hawk-Eye could easily be updated to do it, though, perhaps if they were willing to draw 1 or 2 black dot markers on the ball in addition to the natural markings.


It does track spin, they often show stats measuring rpm.

Additionally spin is a big part of the trajectory model - the Hawkeye cameras are only 60fps, so the trajectory is interpolated.

However, I doubt it’s the cameras measuring spin. It’s more likely spin is a free variable when they fit the trajectory.
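To make "spin as a free variable" concrete, here is a minimal sketch of fitting a single spin-like parameter to sampled ball positions. Everything in it is an assumption for illustration: the 2D drag-plus-lift model, the constants, the known launch state, and the grid search. It is not a description of how Hawk-Eye actually works.

```python
import math

G = 9.81                                        # gravity, m/s^2
K = 0.5 * 1.2 * math.pi * 0.033 ** 2 / 0.057    # 0.5 * air density * ball area / ball mass
DRAG_COEFF = 0.55                               # assumed drag coefficient


def simulate(lift_coeff, n_frames, fps=60, substeps=10):
    """Return (x, y) ball positions at each camera frame.

    lift_coeff stands in for spin: positive = topspin, negative = backspin.
    The launch state is assumed known; a real fit would estimate it too.
    """
    x, y = 0.0, 1.0
    vx = 30.0 * math.cos(math.radians(8.0))
    vy = 30.0 * math.sin(math.radians(8.0))
    dt = 1.0 / (fps * substeps)
    frames = []
    for _frame in range(n_frames):
        frames.append((x, y))
        for _step in range(substeps):
            speed = math.hypot(vx, vy)
            # Drag opposes velocity; lift acts perpendicular to it.
            ax = K * speed * (-DRAG_COEFF * vx + lift_coeff * vy)
            ay = -G + K * speed * (-DRAG_COEFF * vy - lift_coeff * vx)
            x, y = x + vx * dt, y + vy * dt
            vx, vy = vx + ax * dt, vy + ay * dt
    return frames


def fit_lift_coeff(observed, candidates):
    """Pick the candidate whose simulated path best matches the observations."""
    def squared_error(candidate):
        predicted = simulate(candidate, len(observed))
        return sum((px - ox) ** 2 + (py - oy) ** 2
                   for (px, py), (ox, oy) in zip(predicted, observed))
    return min(candidates, key=squared_error)


# Synthetic "camera" samples at 60 fps, generated with a known spin value.
observed = simulate(0.15, n_frames=30)
candidates = [i / 100 for i in range(-30, 31)]
best = fit_lift_coeff(observed, candidates)
```

The point is only that spin never has to be seen directly: it falls out as whichever value makes the modelled path agree with the observed positions.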


Found this: https://commons.nmu.edu/cgi/viewcontent.cgi?article=1746&con...

Very interesting read, but I'm not knowledgeable enough to really give you an accurate tldr, sorry


There is some moonwalking to be seen. I wonder how easy it is to get a neural network to remove that.


I'd think that pace and spin could be extrapolated relatively easily if you know placement and time; particularly if you know (or can extrapolate) the height of the ball.

edit: Well, perhaps not entirely. Trajectory is a function of a number of factors including spin. However, the ball itself is a factor. More fuzz = more drag = spin has a greater effect on trajectory. That would vary from ball to ball and even over the lifetime of an individual ball.

--

For those not familiar with tennis:

Primarily because of aerodynamic drag created by the fuzz on tennis balls, the ball's spin greatly affects its trajectory.

A topspin makes the ball dive down more sharply. This is how players can hit the ball extremely hard and fast, yet still land in bounds, as opposed to flying out of bounds.

This "trick" of topspin is also why tennis is easier than it may first seem if you've ever tried it. It's not easy to learn topspin but once you do, it increases your margin for error.
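A rough way to see the topspin effect numerically is to simulate the same shot with and without spin. This is only a sketch: the 2D model, the drag coefficient, and the use of a signed lift coefficient as a stand-in for spin are all assumptions, not measured values.

```python
import math

G = 9.81                                        # gravity, m/s^2
K = 0.5 * 1.2 * math.pi * 0.033 ** 2 / 0.057    # 0.5 * air density * ball area / ball mass
DRAG_COEFF = 0.55                               # assumed drag coefficient


def landing_distance(lift_coeff, speed=30.0, angle_deg=8.0, height=1.0, dt=0.001):
    """Horizontal distance travelled before the ball reaches the ground.

    lift_coeff is a stand-in for spin: positive = topspin, negative = backspin.
    """
    x, y = 0.0, height
    vx = speed * math.cos(math.radians(angle_deg))
    vy = speed * math.sin(math.radians(angle_deg))
    elapsed = 0.0
    while y > 0.0 and elapsed < 10.0:   # time cap guards against odd inputs
        v = math.hypot(vx, vy)
        # Drag opposes velocity; lift acts perpendicular to it.
        ax = K * v * (-DRAG_COEFF * vx + lift_coeff * vy)
        ay = -G + K * v * (-DRAG_COEFF * vy - lift_coeff * vx)
        x, y = x + vx * dt, y + vy * dt
        vx, vy = vx + ax * dt, vy + ay * dt
        elapsed += dt
    return x


flat = landing_distance(0.0)
topspin = landing_distance(0.2)
backspin = landing_distance(-0.2)
```

With identical launch speed and angle, the topspin shot lands shortest and the backspin shot floats longest, which is the extra margin for error described above.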


I'm also a tennis player and fan - agreed that it's super impressive, but a bit unrealistic.

Certain smart racquets (I think Babolat's?) can track the rpm and spin direction based on the head movement. I think using this type of data too could make the difference in terms of realism.


As a non-tennis player, I'd agree, this is impressive and fun.


Interesting work that demonstrates the benefits of the use of domain knowledge [1] and of trying to understand a dataset, rather than throwing a bunch of data at an end-to-end black box and hoping for the best. In particular, data in the tennis point play domain is too sparse for approaches that rely heavily on large amounts of dense data, like a neural network. This is good, old-fashioned AI work and I mean that 100% as a compliment.

_______________

[1] Quoting from Section 10:

Finally, our work makes extensive use of domain knowledge of tennis to generate realistic results. This includes the shot cycle state machine to structure point synthesis, the choice of shot selection and player court positioning outputs of player behavior models, and the choice of input features provided to these behavior models.

A successful behavioural model of that kind is a contribution in and of itself, useful beyond the task of simulation presented in the paper.


I watched it and I felt like I was playing Mortal Kombat 3

technology really repeats itself


Your comment does not warrant downvotes. The results do resemble the video capture (rotoscoping?) technique used in the 1992 videogame Mortal Kombat, and it's worth pointing that out: https://youtu.be/Tj3_0AmiJbg?t=5

But of course Mortal Kombat's sprites were not being generated in real-time through a flexible machine learning model.


But in Mortal Kombat the players were casting shadows. The lack of shadows below the tennis players' feet makes it look like a green screen was used.


The tech is different but the end result is largely the same. I immediately thought of Mortal Kombat as well when I saw the video.


tekken is more impressive


Mortal Kombat 3 (and previous ones) was just digitized sprites with no logic in between animations, certainly nothing comparable to that.


I can very clearly see the sudden swap between (some) animations in this example as well.

The MK analogy seems spot on.


It’s becoming increasingly clear that video by itself will soon no longer be reliable evidence.

How long before it becomes a plausible criminal defence to say “the CCTV must have been deepfaked, that’s not me”?

The indicator of reliability will be the chain of custody of the video data.


The indicator of reliability has always been the chain of custody. Video must be authenticated by an appropriate witness before it can be introduced as evidence.


This is one of those technologies where I struggle to think of beneficial use cases, but my mind fills with ways this could be abused.

That's not to say it shouldn't be researched, or people more creative than me won't think of beneficial use cases.


Think about stuntmen who risk their lives.


I don't think it's too unlikely that the cost of producing the CGI for a film like "The Avengers" could go from ~1e7 USD to ~1e3 USD over the next ~10 years. Seems like that could have some creative uses.


This whole post is a beneficial use case of deepfake-ish tech.


And that is what, exactly?

From the abstract: "Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences such as the creation of matchups between players that have not competed in real life, or interactive control of players in the Wimbledon final."

I don't believe academic researchers need to justify their work by providing real life applications. But if that were the extent of deepfake's utility, I'd be underwhelmed.


Interactive deep faking of real people in video games seems insanely impressive at least to me. What more could you expect?


Not sure if the deepfaking here is a necessary step, but I think it could be useful towards creating humanoid robot assisted tennis training.


Education and entertainment use cases abound.


Yes, simulating real people for entertainment value seems to be the primary use case at this point. I'm just unsure that the value tradeoff between "simulating real people for entertainment" and "simulating real people for fraud, propaganda, misinformation, etc." is in our favor. It's inevitable, we'll just need to adapt.

I recently lost my dog. The idea of interacting with a virtual model of her to help ease the grief is interesting and scary, and ripped straight from a Black Mirror episode.


Your comment about your dog reminded me of this: https://www.youtube.com/watch?v=uflTK8c4w0c

A mother meeting her dead daughter again through VR. It's a video that fills me with all sorts of emotions. It feels so wrong but at the same time it is very touching and beautiful.


Well, there was a criminal justice system before CCTV. I imagine we’ll find a way.


There was a criminal justice system before fingerprint and DNA analysis too. It was just a worse, less reliable system.


I believe this was the plot of Michael Crichton's _Rising Sun_ (1992).


You should watch The Capture :)

https://www.imdb.com/title/tt8201186/


This is, I feel, one of the truly beneficial uses of blockchain technology: storing media hashes as they are recorded so that they can be authenticated later.
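A minimal sketch of the idea, assuming an append-only log in which every entry commits to the previous one. The field names and in-memory list are hypothetical; a real blockchain adds distribution and consensus, which are not modelled here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry


def entry_digest(prev_digest, media_hash, timestamp):
    """Digest that binds a media hash to its timestamp and to the prior entry."""
    payload = json.dumps(
        {"prev": prev_digest, "media": media_hash, "ts": timestamp},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_media(log, media_bytes, timestamp):
    """Record the hash of a recording at the time it is made."""
    prev = log[-1]["digest"] if log else GENESIS
    media_hash = hashlib.sha256(media_bytes).hexdigest()
    log.append({
        "prev": prev,
        "media": media_hash,
        "ts": timestamp,
        "digest": entry_digest(prev, media_hash, timestamp),
    })


def log_is_intact(log):
    """Check that no entry was altered, removed from the middle, or reordered."""
    prev = GENESIS
    for entry in log:
        expected = entry_digest(entry["prev"], entry["media"], entry["ts"])
        if entry["prev"] != prev or entry["digest"] != expected:
            return False
        prev = entry["digest"]
    return True


def was_recorded(log, media_bytes):
    """Later authentication: does this exact file appear in the log?"""
    media_hash = hashlib.sha256(media_bytes).hexdigest()
    return any(entry["media"] == media_hash for entry in log)


log = []
append_media(log, b"clip-one", 1599550000)
append_media(log, b"clip-two", 1599553600)
```

This only shows that a given file existed, unmodified, when it was logged. It says nothing about whether the footage was genuine in the first place.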


Can real video include some signature that can't be faked?


Cameras that digitally sign pictures have been around for a while. They have also been cracked already (back in 2010: https://www.elcomsoft.com/news/428.html). I haven't seen any cameras that digitally sign videos as well. Sounds doable.

However the secret signing key will have to reside inside the camera in a way that a determined attacker cannot extract it. Sounds hard.


If we're being serious about cryptography, you could have cameras signing videos in real time with a secret key, in such a way that each camera has an individual key stored in a specialized chip that does the signing, with no way to access the key without breaking the chip's seal.

So you could still fake videos, but any accusation of tampering could be verified by checking the seal is intact. (of course you need non-forgeable seals too, but that's comparatively easier)


Generate random key every hour, sign it with previous private key, then delete previous private key.


Can you stop such a camera being pointed at a screen playing a faked video?


If (honestly I've no idea) we can produce cameras that have a higher FPS than the display's refresh rate (or resolution), then we'd be able to detect that.

A simpler detection mechanism would be to use stereoscopic (or rather real depth detection) cameras though.


I wonder what cryptography can contribute here. Could sensor manufacturers integrate a private key in the video stream which would authenticate an unbroken chain of frames?


No need to think that far. Just calculate the hash of the video file stored on camera (or segments of it during e.g. live stream) and authenticate this using a private key stored in a secure element on the camera. But do you trust this secure element enough? (See the breach of Intel’s CPU private keys via SpecEx; maybe the same could be done by loading a custom firmware as well.)

Assuming the secure element is in fact secure, the issue then, as with any public/private key scheme, lies with establishing trust in the keypair. Do you trust that the manufacturer will not be breached?

And even more pressing: how do you prevent someone from modifying their internal camera video stream such that they may send any data to the authentication chip/mechanism?

And if all this is implemented, it can be done even more low level - just direct the camera e.g. to your screen (I assume solutions would then crop up to increase the fidelity of such a solution).

I am not saying such a solution would not provide ANY benefits, I am just pointing out that these issues prevent it from becoming a silver bullet.
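For reference, the hash-and-authenticate step could look roughly like the sketch below. One loud caveat: it uses HMAC from the Python standard library as a stand-in for a real public-key signature, so the verifier here needs the same secret as the camera, which a real scheme would avoid. The key, segment contents, and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that would live inside the camera's secure element.
DEVICE_KEY = b"hypothetical-device-secret"


def tag_segment(key, index, segment):
    """Authenticate one segment; the index binds it to its place in the stream."""
    digest = hashlib.sha256(segment).digest()
    message = index.to_bytes(8, "big") + digest
    # HMAC is a symmetric stand-in for the private-key signature in the text.
    return hmac.new(key, message, hashlib.sha256).digest()


def sign_stream(key, segments):
    """Produce one tag per segment, in order."""
    return [tag_segment(key, i, segment) for i, segment in enumerate(segments)]


def verify_stream(key, segments, tags):
    """True only if every segment is unmodified and in its original position."""
    if len(segments) != len(tags):
        return False
    return all(
        hmac.compare_digest(tag_segment(key, i, segment), tag)
        for i, (segment, tag) in enumerate(zip(segments, tags))
    )


segments = [b"frame-data-0", b"frame-data-1", b"frame-data-2"]
tags = sign_stream(DEVICE_KEY, segments)
```

It detects edited or reordered segments, but none of the problems listed above go away: whatever is fed into the tagging step gets authenticated, including a faked stream or a camera pointed at a screen.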


Where did you learn about cryptographic systems?

Less about the algorithms, but more about designing applications of cryptographic primitives?


I can’t get the paper to load, but I’m curious if they address the lack of player shadows. Having shadows would have made the video much more realistic.


I think the player modeling is way more interesting than the visual representation and could probably be competitively useful.

I wonder if you could use some of the recent advances in pose estimation to rig a 3D model of each player rather than the rotoscope look of clipped frames in the demo.


I can recommend "Two minute papers" for that kind of stuff. Here is an episode about AI controlled basketball player movements: https://www.youtube.com/watch?v=pBkFAIUmWu0


Yes! have been a subscriber since nearly the beginning of the channel.


100% this. The modelling data and predictive analysis is where the true value in this lies. If they make that available to the players' trainers it's going to assist greatly in helping players improve against tricky opponents.


I'm excited to see this applied to more popular video games such as FIFA, Madden, and NBA 2k. The behavior modeling also likely has huge applications in NBA film analysis and figuring out how a traded player might "gel" in a new team.


What about its implications for things like ballet in a world where people can't go to live venues and there is no money to support real dancers? Or music videos that already have auto-tune singers.. why not generated backup dancers..


why not generated backup dancers..

It's been done.[1]

[1] https://youtu.be/t9VYYhX3P1Y


I’m not sure I understand the context of the video you linked, are you saying it’s not real?


LiveLeak said it was CG.[1] But it may be real and the article may be fake.

[1] https://www.liveleak.com/view?t=9SDBX_1526444350


(nsfwish)


I wonder if players will be able to use this to prepare for opponents. For example, knowing where to hit and "seeing how they typically react" and then predicting where they're most likely to return the ball.

If so, this could be expanded to other sports, maybe even team sports, where you can test set plays against the simulated defense.


    If so, this could be expanded to other sports, maybe even team sports, where you can test set plays against the simulated defense.

I could see this potentially having some value.

The challenge is that in sports, the opponent you face on a given day isn't "the statistical average of their past performances" - they are facing you with a game plan tailored specifically for them versus you, and their game plan will evolve over the course of the contest depending on what's working and what isn't working.

For example, "Nadal likes to hit the ball to Federer's backhand" is, statistically, true. It's basic tennis strategy. But the on-court reality is more nuanced. Nadal is going to vary that approach on the fly based on his opponent and how well that strategy is working on a given day.

Modeling this for a simulation would have to be similarly nuanced, with the simulation not just replicating Nadal's overall statistical tendencies, but how those tendencies evolve over the course of a match based on various conditions and his success or lack of it.

Of course, some aspects of Nadal's game are more easily modeled than others. If an opponent was training to face Nadal on a clay surface, I could simulate that with a single line of code: "Game Over." =)


Why wouldn't they just use actual replay footage for that and not some generated sprite?

Simulated match ups have been done on pro sports video games for years. But they're not that useful because a model is not reality.


Presumably:

* Hard to find all replays against that shot

* Time consuming to review

This would work as compression over that space. However, I don't watch tennis and know nothing about sport. This may be an insignificant improvement.


The title doesn't do it justice: it's about the synthesis of tennis players in generative video.


Synthesis of tennis player sprites.


This is what I imagined video games would look like in the future... looking from the 80s. Hi-res background, but static. Awesome life like sprites, but basically the same video games we had on the C64.

It was somehow very soothing to watch that video. It felt like someone was telling me a bed time story about a different brighter future that never came to be. Very "Back to the Future".



This is amazing. I played this arcade game like 30 years ago that projected 3d video that you could control. But instead of generative, it was a bunch of little clips spliced together so you tap shoot and it plays the shoot clip. Had lots of awkward seams but it was still really impressive. Now we get the real thing.


Time Traveller?[1] It used a pepper’s ghost approach.

[1] https://en.wikipedia.org/wiki/Time_Traveler_(video_game)


Could also be Holosseum?

https://en.m.wikipedia.org/wiki/Holosseum

I remember my local AMC 6-plex had one.


Yup. That one.


My head gravitates towards how visualization makes the behavior modeling more palatable. The incredible technical feat on display is the video rendering, not so much the behavioral modelling, but it seems conceptually straightforward to add increasingly better "sabermetric" analysis to control the player's choices.

The data might be just as “productive” as a spreadsheet or formula to inform play, but it requires someone with a more specialized skill set to translate its meaning. The HCI design, for lack of a better word, in rendering the data visually makes it not only more entertaining but easier to "see"—for mainstream users, pro players, or almost anyone. Design makes things visible.


I have been thinking about exactly this idea for over 2 years.

I even started with Python & OpenCV for basic background extraction, and as I expected, the edges for the players are imperfect. But still, the result is very very promising. I'm so glad someone did it.
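For anyone curious what basic background extraction means here, a toy version is below. It is a plain-Python stand-in (per-pixel temporal median plus a threshold) on tiny hand-made grayscale frames, not the OpenCV code referred to above, and the threshold value is an arbitrary assumption.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel median over time; works when the camera is (almost) fixed."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than the threshold."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# Three tiny 2x3 frames: a bright "player" pixel moves across a flat court.
frames = [
    [[200, 50, 50], [50, 50, 50]],
    [[50, 200, 50], [50, 50, 50]],
    [[50, 50, 200], [50, 50, 50]],
]
background = estimate_background(frames)
mask = foreground_mask(frames[0], background)
```

On real footage a hard threshold like this is where the imperfect player edges come from: pixels near the silhouette are close to the background value and get misclassified.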

The reason I was thinking about it is that the end goal would be to apply frame matching & transitioning to football. Tennis is the easier task, since the camera angle is almost fixed.

But from this result, we aren't far from football. Even an approach based on AI + some human intervention would suffice.


Part 2: Even though Fifa / PES are doing great things, no video game can match the player personalities and add specific / individual animations for everyone. By using the already recorded matches, we would have something very authentic in terms of how players behave on the field.


Yes but can it shoot a ball at a line judge’s neck?


I’m sure Novak Djokovic would like to simulate an alternate reality where that went slightly to the side, he didn’t get disqualified as a result, and then he continued his undefeated 2020 run.


So in your universe it hit the judge and he got disqualified? Oh no, I jumped into another reality again.


Watching those realistic video sprites play reminds me a lot of the old tennis game on the NES. It really makes you wonder how far this technology can go and how bizarre it can become.


Nick Bostrom might say to the point where we can no longer tell what's real.


I want this real-time generated on endless-tennis.tv



