OpenAI bots competing against Humans right now (twitch.tv)
220 points by kahlonel on Aug 5, 2018 | 132 comments


In the last game of the series the bots were forced to play an intentionally terrible hero lineup. The humans finally got to win a game, but what interested me more was that the AI did seemingly crazy things much more often.

I wonder if this is an artifact of the training methodology: maybe if your team is very weak then your choices are also weaker, and reinforcement learning doesn't work as well?


It reminds me of the Go AIs going on tilt when they're far behind.

When the win percentage for a Go AI gets to around 5%, every action it can take results in a losing game, so it can no longer tell the difference between normal play and super strange moves.

When every choice is really bad, humans tend to still go with their normal strategy and wait for their chance to turn things around, but bots assume the opponent is playing perfectly, so they act like their winrate is going to stay near zero no matter what they do.
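A toy sketch of that effect, assuming the bot picks moves purely by estimated win probability. The move names and numbers are made up for illustration and do not come from any real engine; the score-margin picker is only a stand-in for "keep playing normally".

```python
# Hypothetical evaluations of three candidate moves in a lost position.
# Each entry: (move name, estimated win probability, expected score margin).
candidates = [
    ("solid_defence", 0.020, -5.0),
    ("wild_invasion", 0.021, -40.0),
    ("quiet_endgame", 0.020, -6.0),
]

def pick_by_win_probability(moves):
    """Choose the move with the highest estimated win probability."""
    return max(moves, key=lambda m: m[1])[0]

def pick_by_margin(moves):
    """Choose the move with the best expected score margin."""
    return max(moves, key=lambda m: m[2])[0]

print(pick_by_win_probability(candidates))  # wild_invasion
print(pick_by_margin(candidates))           # solid_defence
```

When all win probabilities are nearly identical, a difference of 0.001 decides the move, so the choice can look arbitrary to a human observer.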


If this is true (which it might be, I haven’t studied these systems in detail), then an obvious fix is to get the AI to randomly train with ‘weaker’ versions of itself (perhaps while the strong instance is handicapped) in addition to the latest generation.

Several levels of weak opponents should be used, with varying probabilities, to tune the AI’s robustness against real-world, imperfect competitors.
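A minimal sketch of what that opponent sampling could look like. The checkpoint names and probabilities are assumptions for illustration, not anything taken from the actual training setup.

```python
import random

def sample_opponent(checkpoints, weights, rng=random):
    """Pick one opponent checkpoint according to the given probabilities."""
    return rng.choices(checkpoints, weights=weights, k=1)[0]

# Hypothetical pool: the latest policy plus progressively weaker snapshots.
checkpoints = ["latest", "gen_minus_10", "gen_minus_50", "gen_minus_200"]
weights = [0.6, 0.2, 0.15, 0.05]  # assumed values, would need tuning

rng = random.Random(0)
counts = {name: 0 for name in checkpoints}
for _ in range(10_000):
    counts[sample_opponent(checkpoints, weights, rng)] += 1
print(counts)
```

Most games are still played against the strongest version, while the weaker snapshots show up often enough for the agent to see imperfect play.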


I'd say that those were crazy things, but not that crazy. The AI understood that teamfights were impossible to win, hence the really weird choices (Sven dying for the tower lol). I'd guess it hasn't played many matches with such a lineup and so didn't really know what to do, but the bots did an awesome job and kept things balanced for a long time.


> really weird choices (sven dying for the tower lol)

Taking a tier 2 tower nets everyone on the team 120 gold (a further 150+ gold goes to the hero who gets the last hit), and losing Sven probably gave the opposing team less than was gained.

Perhaps the AI simply placed more value on increasing the total net worth of the team than it valued saving the life of one of its core heroes. Additionally, there was no guarantee that he would have been able to escape, as Sven was deep on the enemy's side of the map, and there's a very real possibility that he could have been ganked by someone in the jungle had he attempted to retreat.
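The gold side of that trade can be worked through quickly. This is only a sketch using the figures quoted above; the 500 gold kill bounty is a hypothetical number, and the comparison ignores respawn time, lost farm and everything else that is not gold.

```python
TEAM_SIZE = 5
TOWER_GOLD_PER_HERO = 120   # figure quoted in the comment above
LAST_HIT_BONUS = 150        # the comment says "150+", so this is a lower bound

def tower_gold(team_size=TEAM_SIZE):
    """Minimum total gold a team gets for a tier 2 tower under these figures."""
    return team_size * TOWER_GOLD_PER_HERO + LAST_HIT_BONUS

def trade_is_worth_it(kill_bounty_to_enemy):
    """True if the tower gold exceeds what the enemy gains from the kill."""
    return tower_gold() > kill_bounty_to_enemy

print(tower_gold())              # 750
print(trade_is_worth_it(500))    # True, with a hypothetical 500 gold bounty
```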


Dota player here: other metrics that could have been involved in the trading decision are:

- Potential chance the enemy team would deny the tower before another friendly hero could take it (netting Sven's team 0 gold for the time spent whacking away at it)

- Map vision (removing a T2 often cuts a significant section of map awareness away, since the tower is no longer providing vision or protection)

- XP gains (Sven won't gain any XP while dead, nor from killing the tower)

- Creep equilibrium (this is less important, or at least thought about less often, later on in the game and past T1 towers, but might've been a factor in drawing the creep clash point to a particular location)

- Dictating team net worth averages (to some extent, if they predicted a loss in opponents forcing a teamfight or predicted a likely pickoff, gold lost could be minimized now by taking a death early, lowering the average net worth on the team).

Obviously, there are others and these can also be mixed and matched in various ways (e.g. cutting off map vision so they can more safely farm additional jungle creeps).

Not saying any of these aspects _were_ a part of the decision to trade Sven for a tower, but.. just wanted to include a few more subsurface aspects that _could_ be used in such a decision.


That's probably part of it, but another explanation would be that the AI wouldn't choose such a lineup on its own, and so it doesn't train on the relevant gameplay.


I think this might be a result of "the only winning move is not to play", so to speak. If the game is unwinnable in the mind of the AI agents but not playing is not an option, they begin to pick random actions instead.

I'm not sure if the AI can surrender (I only managed to watch the first two games as it was rather late at night) but it might be a path to explore; having the AI give up if the game cannot be won anymore.


At what percentage would you allow an AI to consider a game unwinnable? An AI that behaves erratically when the odds are low might be worth allowing to forfeit, but the thing about humans is that we make mistakes. An ideal AI that can continue to execute reasonable moves should therefore have a lower percentage threshold at which it decides to forfeit. See this match[0] for an example of a spectacular comeback that I feel an AI might have forfeited if the threshold were not well defined.

[0]: https://youtu.be/LwSQv_sNZBI


Maybe this is a limitation of self-play. If the opponent an AI faces during training is always optimal, then there's no surface area of mistakes. The losing AI, in its model/mind, knows that the game is over after a specific threshold. So it hasn't learned how to optimize for capitalizing on mistakes.

I wonder if this situation can be fixed by adding more randomness. For example, force AI'1 to be in a losing position to AI'2, but then suddenly switch the power level of AI'2 to be much weaker (where mistakes happen) so that AI'1 learns how to fight its way out of tough situations.


One of the most interesting takeaways from the post game interview for me was that the AI can be very stupid if you just blindly throw it in a self-play setting but with clever use of randomization (modifying power levels) and action restrictions (for example, only allowing the agent to spend an anti-invis item when a nearby enemy goes out of sight) it is possible to provide better learning opportunities for the AI.
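One common way to express that kind of restriction is an action mask applied before the policy's softmax. A rough sketch follows; the action names, logits and the scripted condition are made up for illustration, and this is not claimed to be how OpenAI implemented it.

```python
import math

def masked_softmax(logits, allowed):
    """Softmax over logits, giving zero probability to disallowed actions."""
    if not any(allowed):
        raise ValueError("at least one action must be allowed")
    top = max(l for l, ok in zip(logits, allowed) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

actions = ["move", "attack", "use_dust"]      # hypothetical action set
logits = [0.5, 1.0, 2.0]                      # made-up policy outputs
enemy_just_vanished = False                   # the scripted condition

allowed = [True, True, enemy_just_vanished]
probs = masked_softmax(logits, allowed)
print(dict(zip(actions, probs)))
```

Even though the raw policy prefers `use_dust` here, the mask removes it, so the agent never wastes the item while exploring.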


> for example, only allowing anti-invis items when an enemy goes out of sight

These are the kind of actions you specifically don't want to code in because you're throwing in human knowledge. You want the AI to learn by itself that using anti-invis when everyone is visible is a low-value move.

The purist in me was even mad that they had a hand-crafted evaluation function. (e.g. prefer gold, prefer taking towers, each given some arbitrary value)
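For anyone unfamiliar, such a hand-crafted function is typically just a weighted sum over game events. A sketch with made-up weights (the event names and values are assumptions, not the ones OpenAI used):

```python
# Hypothetical weights; the real values used by OpenAI are not given here.
REWARD_WEIGHTS = {
    "gold_gained": 0.006,
    "towers_taken": 1.0,
    "deaths": -1.0,
    "win": 5.0,
}

def shaped_reward(events, weights=REWARD_WEIGHTS):
    """Weighted sum of per-step game events; unknown events count as zero."""
    return sum(weights.get(name, 0.0) * amount for name, amount in events.items())

step = {"gold_gained": 750, "towers_taken": 1, "deaths": 1}
print(shaped_reward(step))
```

Every number in that table is a piece of human judgment about what matters, which is exactly the objection.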


Maybe the team never picked lineups it knew were bad, and as such never trained with them.


Very impressed with the bots. The main strategy I see from the bots that is different from pro meta is how they spam abilities aggressively. Now the main limitation is 5 unkillable couriers which enable this. Watching, I do feel like given more practice humans would beat this version of the ai easily. It looks exploitable, and still having many rules limitations it has a ways to go. It will win in the end though, that much has become obvious.


The bots also have perfect knowledge of enemy hp/mana and the relative positioning through the API. In fact the bots are gifted perfect micromanagement through the API. This enables them to do things most humans wouldn't try, because the risk of messing up and giving the enemy a huge advantage is too high.


The API does not give the bots extra information compared to a human player. The micro- and reaction time edge is also being dulled to being more human-like. They still have superhuman team fight execution though.

"We’ve increased the reaction time of OpenAI Five from 80ms to 200ms. This reaction time is much closer to human level, though we haven’t seen evidence of changes in gameplay as OpenAI Five’s strength comes more from teamwork and coordination than reflexes."


Part of the advantage in teamwork and coordination is presumably that they don't have a limitation on what data they can view at once?

Dota & HON had people mod their client to give an optional bigger FOV resulting in bans for cheating.

I'd assume the bots don't have to specify their screen position, plus no orientation response means a limitation on this wouldn't be meaningful anyway. What I'm saying is there's a big difference between 'I see lion on the mini-map down there' and 'lion showed for 1 frame on the other side of the map, his HP is 324, he has a TP scroll, no boots and a health pot.'

Something I noticed is the bots seem to like range and AoE far more than the normal human meta. The humans being limited in the distance they can see to one screen were frequently just failing to appreciate how dangerous 2-3 bots half a screen away were to them.

Quite a few teamfight wins came from the bots inevitably causing far more damage to the entire enemy team via heroes like DP & Gyro. But this isn't really perfect teamfight execution. I'd have really liked to see a mirror match.


Correct, the bots "see" the entire map. Well, the parts that are not hidden by 'fog of war'.


> The API does not give the bots extra information compared to a human player.

Well, maybe not technically, but it does make it much much easier to take in all of that info and process it. You can't expect a human player to keep a perfect record of all heroes' hp, mana, all damage being dealt, all abilities being used etc. during a chaotic fight, yet the API yields this information effortlessly.

> The micro- and reaction time edge is also being dulled to being more human-like.

And yet, the bots showed superhuman near instant reaction times. 200ms is a very low amount to process very complex/confusing audiovisual data and react precisely.
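For context, an artificial reaction time like the 200 ms figure is usually imposed by letting the agent act only on observations that are a fixed number of ticks old. A minimal sketch, assuming a 40 ms simulation tick (that tick length is my assumption, not something from the post):

```python
from collections import deque

class DelayedObservations:
    """Feeds an agent observations that are `delay_ticks` steps old."""

    def __init__(self, delay_ticks):
        self.delay_ticks = delay_ticks
        self.buffer = deque()

    def push(self, observation):
        """Add the newest observation; return the one the agent may act on.

        Returns None until enough ticks have passed to cover the delay.
        """
        self.buffer.append(observation)
        if len(self.buffer) > self.delay_ticks:
            return self.buffer.popleft()
        return None

TICK_MS = 40                       # assumed simulation step, not from the post
delay = DelayedObservations(delay_ticks=200 // TICK_MS)   # 5 ticks = 200 ms
seen = [delay.push(t) for t in range(8)]
print(seen)   # [None, None, None, None, None, 0, 1, 2]
```

Note that the delay only ages the information. Once an observation arrives the bot still processes it perfectly, which is the point being made above.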


I think you're just kind of hitting on the real difference between human and bots anyway.

Regardless of if the information comes from the machine viewing the damage count and knowing exactly how much HP a given hero has at that level/gear/just by looking at the bar, or if the information comes from an API, the machine has a perfect memory of this and all other variables, whereas humans don't.


It would be really interesting to see what the bots do once humans try to exploit their strategy, for example by adapting to the lane movement they always make after winning a lane. Humans did that once today, but it could be used better. Excited for TI to see fewer limitations, a better shop system (a lot of money thrown away) and maybe better warding.


The next AI will always be much better (see AlphaGo progression).

Humans, not so much, as in all top-level competitions, human abilities improve minimally at the top, because we have millions of humans competing against each other until the plateau of human performance is reached. Then you can push that a bit more with drugs (see doping in sports). And after that, you are pretty much done.

So it's only a matter of time and effort until AIs are fully unbeatable.


> Humans, not so much, as in all top-level competitions, human abilities improve minimally at the top

That's just not true in doto. The player base improves quite a bit over time. The top pro plays from only a few years ago are not impressive anymore.


I agree that AIs will eventually win. But I only consider human beings beaten when the AI is interfacing with the machine the same way as humans -- looking at the monitor, and inputting commands through keyboard and mouse. That is on a different level than just hooking into a log of events and calculating your next move. Given enough time humans would do that better than the AI imo


Come on, do you think hooking a robot to the keyboard is the hard part? Do you want 5 fingers too?

And if watching the screen, do you want it to have bad eyes like we do too (good resolution only in the center)?


The relevant question is surely how much of the game is strategy, and how much is reaction time and twitch motor skills.

I mean, "human vs AI" matchups are ostensibly about strategy - machines already win at timing and twitch, there's nothing to test. But esports games aren't pure strategy, they all involve various amounts of timing, twitch skills, the ability to monitor lots of details at once, etc. Those are all things that AI opponents can (trivially) do perfectly, which gives the AI a huge advantage. It then follows that an AI player should be able to win even with an inferior strategy (which makes you wonder if these games are really suited to AI research in the first place?).


Dota is a bit different from other esports such as SC2, in that it leans much more on game-sense and decision making than twitch skills[1]. Nevertheless, OpenAI dulled the reaction time artificially to be more human-like. It makes sense if the goal is to make a sophisticated strategic/gamesense AI, and not one that wins by just executing better.

[1] For an entertaining case-study, check out Day[9]'s learns DotA2 series.


Oh, that's interesting about the reaction time. Not having played DotA I have no idea how big an issue it is (and I couldn't make heads or tails of the video :D ).

It occurs to me that for a really even playing field, the humans should probably be allowed to make and install UI mods if they want to. E.g. if there's an advantage to using an ability precisely when your hit points hit 50% (or whatever), an AI can easily do that reliably so the human should probably be able to if they want to.

(Of course, for heavily twitch games like Counterstrike, being allowed to use UI mods (i.e. aimbots) would break things. But then, I suppose that the extent to which UI mods break a game is more or less the extent to which that game favors twitch over strategy.)


Good point, and a hot topic over the years in the DotA community. The developer introduced some UI changes over the years that do exactly that: show every bit of information that an expert human or AI could realistically figure out and exploit. For some, it's dumbing down the game for a new audience, nullifying their hard work. I think the community consensus now is in favor: the game is complex enough that it does not need arbitrary skill-differentiation mechanics. For instance, high-level players used to memorize and practice the duration of some abilities (stuns), because you have an advantage if you can chain these perfectly. Valve introduced a visual progress bar that shows how long the disable effect on a hero lasts, making it much easier to chain disables. They now also show how far towers can shoot and which entity a tower is currently targeting (tower 'aggro' mechanics have their own special logic). They go so far as to show spawn boxes for neutral monsters.


Yea, I think having a robot use fingers to manipulate the keyboard like a human is a very hard part


Mouse eye coordination would probably be harder.


Have you seen the robots that place components on circuit boards? I don't think that hitting keys on a keyboard is more difficult.


> Come on, do you think hooking a robot to the keyboard is the hard part? Do you want 5 fingers too?

Yes, it is. Also, “seeing” the screen rather than being able to directly introspect the game world digitally. Orders of magnitude harder. This is known as Moravec’s Paradox.


Yes, it is difficult. Under current conditions the AI has access to much more data than humans, so it is an unfair battle. It would be a fair comparison if the AI were built to take analogue input, similar to human eyes looking at a monitor, and to produce analogue actions, similar to a human pressing keyboard keys, which is much less efficient than the AI just calling internal functions to execute actions.


Moravec's paradox!


Yes. Right now the AI has more information than the humans.


I mean, interacting with the mouse and keyboard isn't the interesting part.

Like, imagine if this was a chess AI, and we were trying to determine who was better at chess, humans or AI. Would you make the AI use robotic hands to move the pieces? No, because that's not the interesting part of chess. The interesting part of chess is the strategy.


There is 0 mechanical skill involved in chess. Reflexes don't matter. There is mechanical skill involved in Dota 2, and reflexes matter.

And consider a game like Quake where mechanical skill is even more important (even though mechanical skill matters, Dota 2 is still primarily a game about strategy and team coordination).


I would encourage you to watch this APM demonstration https://www.youtube.com/watch?v=YbpCLqryN-Q , give https://www.engadget.com/2014/10/24/starcraft-2-and-the-ques... a read (600 APM) and consider how much of an advantage the computer has.


APM matters less in Dota than in Starcraft


Agreed that the strategy is an interesting part.

Another interesting part will be creating an AI / neural network that can utilize inputs that are closer to human level inputs (e.g., using the frame buffer and audio out as input to the neural network and passing the outputs of the neural network to a keyboard and mouse driver). Just let the network train itself without having a human laboriously determine the topology of the neural network. Such a neural network can then be applied to several different types of games / problems much more quickly than at present where significant human labor is required to generate deeply customized neural networks for each game / problem.


The main reasons they don't do this are that it's a fairly known quantity from an ML perspective (going from sequences of images to representational features), so wouldn't be proving that much to be able to do (c.f. the various Atari benchmarks which adequately learned actions to achieve rewards working with pixel inputs)... but at the same time would consume a huge fraction of the computer resource they really want to be targeting at the core timing/tactics/strategy problems... which is where they're really going beyond what's been demonstrated elsewhere with RL.

I agree it'll be even cooler when it all just works™ end to end, but in terms of incremental 'holyshiticantbelievethatworked' this is at least as big a step as it will be when they add in direct visual input.


Agreed.

One of the next significant moments could be taking the current Dota 2 algorithm and massaging it to use human style inputs and outputs. Please correct if needed, but the current Dota 2 algorithm boils down to (1) a fully connected network that generates an input state vector from the Dota 2 bot output interface, (2) an LSTM of sufficient length that generates an output state vector from the input state vector, and (3) another fully connected network that generates the Dota 2 bot interface inputs from the output state vector. This could be updated to have (1a) a convolutional network that feeds into a fully connected network, where the input to the convolutional network is the frame buffer (and perhaps the audio output) and the output of the fully connected network is the input state vector, (2) the same or similar LSTM network, and (3a) a fully connected network that outputs keyboard and mouse commands instead of DotA 2 bot interface inputs.
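To make the three-stage pipeline above concrete, here is a rough pure-Python sketch of one forward pass: fully connected encoder, LSTM step, fully connected head. The layer sizes, random weights and random observations are all placeholders I made up; this shows only the data flow, not the real network or any training.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM step; each gate is a dense layer over the concatenated [x, h]."""
    xh = x + h
    i = [sigmoid(z) for z in dense(xh, *params["input"])]
    f = [sigmoid(z) for z in dense(xh, *params["forget"])]
    o = [sigmoid(z) for z in dense(xh, *params["output"])]
    g = [math.tanh(z) for z in dense(xh, *params["cell"])]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def random_layer(rng, n_out, n_in):
    """Small random weights and zero biases, standing in for trained values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

# Toy sizes (assumed): 6 observation features, 4 hidden units, 3 actions.
OBS, HIDDEN, ACTIONS = 6, 4, 3
rng = random.Random(0)
encoder = random_layer(rng, HIDDEN, OBS)                    # (1) observation -> state
lstm = {gate: random_layer(rng, HIDDEN, 2 * HIDDEN)         # (2) recurrent core
        for gate in ("input", "forget", "output", "cell")}
head = random_layer(rng, ACTIONS, HIDDEN)                   # (3) state -> action scores

h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
for tick in range(5):
    observation = [rng.random() for _ in range(OBS)]        # stand-in for game state
    h, c = lstm_step(dense(observation, *encoder), h, c, lstm)
    scores = dense(h, *head)
print(scores)
```

In the proposed variant, only `encoder` and `head` would change (a convolutional stack in front, keyboard and mouse outputs behind), while the recurrent core stays the same.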

It is an open question as to whether current compute power is sufficient for this massage.


>Just let the network train itself without having a human laboriously determine the topology of the neural network

I hope this little koan illustrates that this sentence is impossible to execute. The human always has to specify something.

--

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

"What are you doing?", asked Minsky.

"I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied.

"Why is the net wired randomly?", asked Minsky.

"I do not want it to have any preconceptions of how to play", Sussman said.

Minsky then shut his eyes.

"Why do you close your eyes?" Sussman asked his teacher.

"So that the room will be empty."

At that moment, Sussman was enlightened.


Agreed that without any constraints, it could become a Sisyphean task.

The exercise then becomes one of finding the minimal constraints needed to achieve the desired results. Please correct if needed, but looking at the Dota 2 neural network [1], it boils down to generating an input State vector from the Dota 2 bot output interface, running the state Vector through an lstm (of sufficient length) to generate an output State vector, and generating the inputs for the Dota 2 bot input interface from the output State vector. Update this network (1) to have the input State Vector generated from a convolutional network that feeds a fully connected Network and uses the frame buffer as input and (2) to have the final outputs of the neural network be keyboard and mouse commands instead of dota 2 bot input interface commands, then let the network train itself. The number of elements in the state vector, the number of convolutional layers, the number of lstm layers, and the number of layers and elements in each fully connected hidden layer could each also be determined by a recurrent neural network.

[1] https://towardsdatascience.com/the-science-behind-openai-fiv... (see the image under "The Architecture")

[ random capitalization powered by Google speech dictation ]


Interacting with the mouse and keyboard at this speed and precision is absolutely a very interesting part for me. It is a limitation of the human organism.

What if we could pit humans and ai bots at the speed of human imagination?


It could be interesting, but playing DotA just isn't the best way to test this.

One could imagine a much better way of testing hand-eye coordination, through a series of mazes or puzzles or reaction tests.

It would be like trying to test hand-eye coordination by having a robot play physical chess against a person.


Humans tend to progress by learning from each other. The AI will teach us.


Whether the AI teaches us or not, once it surpasses us, we will not catch up. This is the case for every game AI that has ever surpassed human performance so far, and there's no reason to expect that this will be different in the case of DotA.


> game AI that has ever surpassed human performance so far

Am I missing something, or does that set consist of Checkers, Chess, and Go so far? (presumably with analogous misc games of comparable complexity)

Discounting the reaction time wins, I'd say the sample size is too limited to generalize to eventual AI behavior in more complex / open-ended games.

Extrapolation was the cause of the last AI winter.


Poker. Although Heads Up Limit Texas Hold 'Em has been essentially solved, the No Limit variant (in which you actually decide how much to bet) is very far from solved. An AI beat the best players in the world anyway. It plays crazy, but it plays crazy in a way they weren't able to exploit at all. Doug Polk (also one of the best in the world in this narrow specialty) did some funny videos but the humans got crushed.

So that's a pretty different game, it's got a big luck factor and has asymmetrical information and still the AI just kept getting better and the humans... didn't


Statistics is not everything. Which hypothesis can lead to the outcome "AIs will not be able to beat humans in every game"?

I can see the following hypotheses (in no particular order):

1. The human brain is the optimal solution in the space of all computational devices capable of playing games, and we can only approach it.

2. To do computation the human brain employs some physical processes that we will not be able to replicate in the foreseeable future.

3. The human brain does not produce general intelligence, so we will not be able to replicate it, as such a task is outside the scope of our limited intelligence (while playing games isn't).

4. The human brain uses metaphysical abilities to do cognition, and we will not be able to replicate them at all.

5. The human brain is a local optimum, but the space of potential AI constructions is too huge to explore in the lifetime of our civilization, so we will be stuck at this local optimum with marginal improvements.

I don't see any of them as sufficiently likely, but your mileage may vary.


I'd point to a combination of (2) and (5) as the maximally likely reason we'd fail to build a generalized game playing system.

I believe there exists a combination of hardware and software capable of beating humans in all games. However, I also believe victory in a single game gives us minimal information on whether or not the system generalizes to many games (to say nothing of non-game, e.g. more complex, ruleless systems).


I think you're looking at things with a sort of hindsight bias. Victory at chess was at one time considered to be the indicator of the emergence of true 'intelligence' in computing. The reason is that it's an extremely open, creative, and strategic game spattered with a minefield of tactical nuance. Nobody, human or computer, is getting even remotely close to scratching the depth of the game from a numeric point of view. There are some specifics on the numeric complexity of the game here [1].

The reason I point out the unfathomable numeric complexity is that it makes the games, from the perspective of an AI, effectively infinite. AIs are calculating, but to an extremely superficial degree relative to the depth of the game. E.g. - when a chess program says it's calculated to 30 ply (15 moves for both sides) what it really says is that it's seen up to 15 moves deep after intentionally ignoring or pruning 99.9999999999% of moves which it thinks probably aren't good -- something it still often gets wrong, but its 'understanding' of what is 'not wrong' is strong enough that it still results in a phenomenally strong level of play, compared to humans. There's no doubt that perfect play in chess would still go 1 billion - 0 against something like AlphaZero.
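As a small illustration of pruning, here is plain alpha-beta search on a toy tree. To be clear, alpha-beta is the lossless kind of pruning (it never changes the result); real engines layer heuristic pruning on top of it, which is where the "probably aren't good" guesses come in. The tree values are made up.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"), stats=None):
    """Minimax value of a nested-list game tree with alpha-beta cutoffs.

    Leaves are numbers; inner nodes are lists of children. `stats` counts
    how many leaves were actually evaluated.
    """
    if not isinstance(node, list):
        if stats is not None:
            stats["leaves"] += 1
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, stats)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the opponent would never allow this line, stop looking
    return best

tree = [[3, 5, 6], [2, 9, 8], [1, 7, 4]]
stats = {"leaves": 0}
print(alphabeta(tree, True, stats=stats))  # 3
print(stats["leaves"])                     # 5
```

Only 5 of the 9 leaves get evaluated even in this tiny tree, and the savings compound with depth.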

So what matters is not the number of decisions to be made but the individual complexity of the decisions to be made. And in most games we consider complex the individual decisions are not really that complex, and complex systems can often be broken down into very simple games. A great example of this is a 4x game. Taken as a whole they seem complex, but they're really just a large number of relatively simple components that are mostly independent. E.g. - Given this state, where do you explore next? Given this state, what do you research next? Etc. Another benefit for AIs is that in games we consider more complex, the value of any given mistake often becomes diminished. If you make a single bad move in chess, it's enough to lose the game. In a 4x game the weight of individual decisions is not so high, it's all about the big picture. But as perhaps computer success in Go shows most clearly, actually seeing the big picture is not really necessary to produce play as if you do.

This, I think, is why research has moved more onto real time competitive games. Crushing humans at chess, go, and now poker as well is a pretty solid proof of concept for computers beating humans at any turn based game. When you start adding bunches of different layers to games I think it's more likely to handicap the human than the computer. Imagine playing some sort of 100x100 chess. We can only speculate, but I imagine the distance between the top AIs and humans would be far greater than it is in 8x8 chess.

[1] - https://en.wikipedia.org/wiki/Shannon_number


> Victory at chess was at one time considered to be the indicator of the emergence of true 'intelligence' in computing.

I would disagree with this characterization. I believe at the time, it was (a) a problem that a machine had not yet conquered, (b) a problem that it seemed feasible that a machine might conquer, and (c) a problem that, once conquered, would point the way to general artificial intelligence.

I would point at (c) as the assumption that proved to be erroneous. Deep Blue was clever algorithmic and hardware engineering (with a healthy budget) but led to... what?

AlphaGo is a fundamentally different approach, which shows signs of being more adaptable.

Point being, that winning a game is not sufficient evidence that a given approach will scale to winning all games, much less generalized intelligence.

To put it in terms of the fallacy I read in an article linked on HN (paraphrased), 'The public assumes that if a machine can perform a task that humans can perform, the machine must be human-like, and therefore able to perform all tasks that humans can perform.'

But in the same way that we use rendering tricks to go beyond-state-of-hardware-art in graphics rendering (by abusing hidden limitations), so do we often build ml systems.

I believe the most optimistic point against me was the slide in this year's GTC keynote pointing to the "Cambrian explosion" in the diversity of ml approaches this time around.


Depends how you define, "surpass."

There will be a time where we learn from the AI and the AI learns from us, where we trade victories and defeats as we adapt to each other.

Don't discount the ability of humans. They figured out how to exploit the 1v1 bot in a few days and soon humans had a 100% win rate using that strategy.


>given more practice humans would beat this version of the ai

given more practice bots would beat humans. that's the point, train bots, which are faster to train than humans to beat humans.


It's important to keep in mind the exact quantity of "more practice". Current mechanisms of reinforcement learning are not very data-efficient, which means that often humans will learn faster than bots. It will still allow bots to discover any unrealistic advantage they have over humans (e.g. faster micromanagement), but if the game is fair and experience to learn from is limited, humans may still prevail.

To "beat" Atari games, AIs trained using reinforcement learning had to put in significantly more than the 10k hours one would expect a human to put in in order to become an expert. So, AIs won't beat humans in tasks where training data is costly; however these cases are not interesting to researchers and hence you won't hear about the respective results.


To follow this thought, it's also worth pointing out that AIs also have an advantage of parallelization over humans in many cases. What might take an AI 100k hours can often be achieved in ~10k hours in parallel across 10 machines. This is what enables the current system to train over 180 years worth of games every day.
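The arithmetic, spelled out. The even split across machines is an idealization with no coordination overhead, and a 365-day year is assumed.

```python
def wall_clock_hours(total_hours, machines):
    """Wall-clock time if the work splits evenly with no overhead (an idealization)."""
    return total_hours / machines

def speedup_over_real_time(years_of_play_per_day):
    """How many times faster than real time experience is being gathered."""
    return years_of_play_per_day * 365

print(wall_clock_hours(100_000, 10))     # 10000.0
print(speedup_over_real_time(180))       # 65700
```

So 180 years of games per day means experience is collected roughly 65,700 times faster than a single human playing nonstop.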


What the cases so far tell us: Once an AI beats humans who have spent 10,000 hours practicing a skill, it is a matter of time before it beats the best professionals in the field.

Cases where it already happened: board games such as Chess and Go, Poker, diagnostics of certain diseases using medical images

Cases where AI is still clearly inferior: video understanding, natural language understanding, motor control esp. of hands and legs, general medicine, driving

Hard-to-classify cases (AI is better for some instances, worse for others): image tagging and classification, speech recognition (speech-to-text), diagnostics of certain other diseases using medical images (which might need to take into account other information outside of images)

More examples esp counter examples are welcome.


> Poker

This is a misnomer.

The only variant of poker where AI beats humans is the heads-up (two-player) variant, which is the simplest form of poker and also rarely played. The AI was (marginally) beating humans there by playing a game-theory-optimal strategy. For poker games with 3+ players, a Nash equilibrium still exists, but playing it no longer guarantees that you cannot be beaten in expectation, so the GTO approach loses its footing and AIs need to use more standard techniques (search-based, reinforcement learning etc.), which are, at the current state of the art, laughably weak at poker.

Not to mention, that in poker the actual hierarchy of players' skill is not 100% obvious. You could distinguish at least two areas of skills:

- play vs other experts

- play vs amateurs/weaker players. Here the goal is not to come out ahead (which, in the long term, is a given), but to _maximize_ the dollar amount taken from these players, which is a skill in itself.


Thanks for the clarification. I do not know much about poker.

The observation applies to a given variant of poker (or any other domain). So if an AI beats humans with 10000-hour experience in that variant, the best experts in that specific variant are not far-off targets.

Superficially similar problems might in fact require very different techniques to solve as your example illustrates.


Modern AI may "beat" humans at certain tasks, but I doubt it is as efficient as humans are at those tasks: it makes sense that throwing more compute at a problem will yield better results. But is it possible to get better results while constraining training time?

Edit: I looked up how much time it takes to train: "OpenAI Five plays 180 years worth of games against itself every day, learning via self-play." [1]

[1] https://blog.openai.com/openai-five/


If you can get a hundred years of experience in 24 hours by wall clock, why not use it? Maybe it's better to teach AIs to create simplified but usable models that they can use to simulate and train on.


I view the problem in terms of "computational complexity": if the AI of today requires O(2^n) time to perform at a human level, is it possible to get that down to O(n^2)?

I believe there's a connotation that if the best algorithm for a problem is exponential (brute-force search), then we don't truly understand the underlying structure of the problem.


I don't think it is a fair comparison. We have the ability to build a somewhat working strategy from known parts, and then we still need thousands of hours (or man-years in the case of Go) to improve it and to teach the brain to do some parts without conscious attention. Alpha Zero, OpenAI Five and others build their strategy (mostly) from scratch.

I have a feeling that it is not possible to reduce the algorithmic complexity of finding optimal solutions in most intellectual tasks (those that are in the NP complexity class and above).

Most likely it is a trade-off: either quickly cobble together a suboptimal strategy from known parts, or build a better strategy from scratch, giving up the time savings of known parts but also avoiding the pitfall of not reevaluating their usefulness in the current situation.

AIs will surely need to use the whole spectrum to compete with humans.


OpenAI is far from beating "the best professionals" of dota in a full dota match, though.


Yes. My prediction from the observation above, which agrees with predictions from many others, is that it is a matter of time before that happens (likely within 1-2 years if they continue to work on it).


I actually have an intuition around using modular neural networks with dynamic topology to tackle more complex disease cases, general medicine, and other complex, hierarchical problems. I'm working towards trying to use them in my thesis for school


If anyone has any reading suggestions or research I should look into, I'm all ears


A good thing to read to know the basics of the game and what to pay attention to: http://smerity.com/articles/2018/n_things_to_look_out_for_in...


I watched the game against audience members, and the bots seemed to overextend at times. Also weird courier scouting glitches (which makes sense if all couriers are currently invulnerable for the purpose of this generation of bots). Another funny thing was watching them use smokes for no apparent reason.


They overextended in the enemy safelane. Maybe killing the enemy carry is #1 on their priority list :D


Yeah, I was thinking more along the lines of the aftermath, after diving – getting stuck in the forest behind the enemy tower. But you are correct, they seemed hella determined to get that Slark kill!


haha yes I saw that. It seems forest navigation is a strength though, even if sometimes they don't search


Ward usage also seemed strange. At some point the bots put a sentry ward inside the top enemy tower's range - not sure if they gained anything from it at all.


I think the idea there was to take tower hits so they can keep pushing without taking hero damage. And delaying until the next creep wave arrives. Not sure it had much effect in this game, but it can be situationally useful.


This is indescribably exciting. Can't wait to see OpenAI Five tear through the human players just like what the single bot did to 'Dendi' -- a professional gamer -- during the International 7 tournament in August last year. [0]

[0] https://www.youtube.com/watch?v=wiOopO9jTZw


When you post that you should also add how fast players were able to find a strategy that the bot could not understand and make it lose the game.


It's not that exciting and it's very easy to beat humans at games when your reaction times are far faster and you have access to more information at a single moment in time. It's like being impressed by a calculator.


"We’ve increased the reaction time of OpenAI Five from 80ms to 200ms. This reaction time is much closer to human level, though we haven’t seen evidence of changes in gameplay as OpenAI Five’s strength comes more from teamwork and coordination than reflexes."


Notably a lot of the clutch plays in Game #1 by OpenAI were due to very well timed skills.


Dota 2 player here, they "timed" it well because human players were in the vision. Humans can also do that by precasting a spell.


Reaction times are limited to 200ms; it's all about the meta and the group tactics, and the bots are apparently really good at it.


The bots have perfect information (when the enemy is in vision) and perfect micromanagement through the API. A huge part of the challenge in Dota is processing the visual and audio cues from the game quickly and acting precisely on the limited information you have. The bots get to take a shortcut around all of that.


Well, yes and no. Sure, micro is a large part of Dota, but planning and coordination are worth as much. So reaction time gives you an advantage in one dimension, but I think the real benchmark will be macro-scale planning and coordination. The same applies to the StarCraft AI competition, where the bots still lose due to macro planning.


dota 2 is not as extreme about this as, say, starcraft. in starcraft, most "strategies" are heavily scripted build orders that almost never deviate from a handful of openings (similar to opening moves in chess) and a large amount of winning comes from the ability to quickly give orders to your troops ("micro"). This is an example of a strategy that is not viable without superhuman reactions: https://www.youtube.com/watch?v=IKVFZ28ybQs if you directly attack 20 siege tanks with 100 zerglings you will only kill about two siege tanks, but an AI can kill all the siege tanks with some zerglings left over.

There's some of this in dota, but there's a cap on the skill level for most playable characters that pros generally get "close enough" to, and beyond that the strategic depth comes from area control decision-making. There are over 100 heroes and many of them have really weird abilities, like the possibility of creating a temporary wall (earthshaker) or the ability to teleport anywhere on the map every 20 seconds (furion). I could be wrong though, maybe the AI is winning games by playing heroes with long range and perfectly microing them to harass and prevent the other team from ever getting gold/xp.


> in starcraft, most "strategies" are heavily scripted build orders that almost never deviate from a handful of openings (similar to opening moves in chess) and a large amount of winning comes from the ability to quickly give orders to your troops ("micro")

As somebody who plays StarCraft casually (gold/low plat in ladder), this is not true. It's even less true for pro players. The level of strategy in StarCraft is impressive, it's really hard to guess in which direction games will go when two very good players are playing against each other.

Sure, perfect execution when it comes to one strategy (say, mech-heavy Terran) will give you the largest advantage against your opponent, but failing to scout appropriately and guess what your opponent is up to means your strategy is dead. You also have to decide when to attack, how much you're willing to sacrifice to damage somebody's economy, when you want to focus on economy vs building units, ...

The video you sent with zerglings is a gimmick made for fun (it's a hard-coded AI using the siege tank's aim logic to divert zerglings from that). That would not win you a game. (because most likely a pro Terran would have destroyed your base before that)


I wonder how far you'd get with a bot that macros perfectly but also A-moves 2-3 groups.


What does "macro perfectly" mean? If it does the same strategy over and over, you just scout, find its strategy, and go for the counter. Its macro will be useless if it has the wrong type of unit.

In a way, the built-in AIs "macro perfectly", but they are terrible at strategy and fighting (because even fights are not just a matter of gimmicks, you need to split units in a special way, send diversions, attack at the same time from multiple fronts, etc.)


I think we made the same point at the same time :D


What information do they have access to?

I do think a simulated reaction time or limited actions per second might be more fair, if they don’t have that already.


They have an artificial restriction of 200ms on their reaction time, and their actions per second are similar to human players'. The main information the bots have that players don't is the exact coordinates of everything they can see, and the amount of damage attacks and spells do. The bots know with 100 percent certainty that a spell or attack is in range before they use it, whereas human players have to judge it from visuals, or press another button to bring up their spell's range indicators. Having the damage of attacks memorized lets the bots know whether an incoming projectile will kill them, so they don't waste healing resources when they know they'll survive. The bots also move based on coordinates, so misclicking and mouse travel time aren't an issue for them.
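
One way an artificial reaction time like this could be imposed is to hand the agent observations that are a fixed number of game ticks old. The following is only a minimal sketch of that idea, not how OpenAI actually implements it; the class name, the helper, and the 50ms tick length are all hypothetical:

```python
import math
from collections import deque


def delay_ticks_for(reaction_ms: float, tick_ms: float) -> int:
    """Number of whole ticks needed to cover the reaction time."""
    return math.ceil(reaction_ms / tick_ms)


class DelayedObservations:
    """Hands the agent observations that are `delay_ticks` ticks old."""

    def __init__(self, delay_ticks: int):
        if delay_ticks < 0:
            raise ValueError("delay_ticks must be non-negative")
        # Keep the current observation plus `delay_ticks` older ones.
        self._buffer = deque(maxlen=delay_ticks + 1)

    def push(self, observation):
        """Record the newest observation; return the delayed one (or None)."""
        self._buffer.append(observation)
        if len(self._buffer) < self._buffer.maxlen:
            return None  # not enough history yet, the agent sees nothing
        return self._buffer[0]


# Hypothetical 50ms tick: a 200ms reaction time is a 4-tick delay.
print(delay_ticks_for(200, 50))  # 4

delayed = DelayedObservations(delay_ticks=2)
print([delayed.push(tick) for tick in range(5)])  # [None, None, 0, 1, 2]
```

Note that a delay like this only limits how fresh the information is; it does nothing about the exact coordinates and damage numbers mentioned above.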


Only one LSTM per bot so there is no "master strategy" neural network. Interesting, if I'm not wrong, bots are learning independently.


During training they were given access to each other’s reward functions, and the extent to which they weighted total team reward over their own was gradually increased.
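
A minimal sketch of what such a weighting could look like, assuming each bot's reward is blended with the team average and the blend weight is raised linearly over training. The function names, the use of the mean rather than the sum, and the linear schedule are assumptions for illustration, not OpenAI's published code:

```python
def blended_rewards(individual, team_weight):
    """Mix each agent's own reward with the team average.

    team_weight = 0.0 -> every agent is fully selfish
    team_weight = 1.0 -> every agent only cares about the team average
    """
    if not 0.0 <= team_weight <= 1.0:
        raise ValueError("team_weight must be in [0, 1]")
    team_mean = sum(individual) / len(individual)
    return [(1.0 - team_weight) * r + team_weight * team_mean
            for r in individual]


def annealed_weight(step, total_steps):
    """Linearly raise the team weight from 0 to 1 over training (assumed schedule)."""
    return min(1.0, max(0.0, step / total_steps))


rewards = [1.0, 0.0, 0.0, 0.0, 4.0]   # one reward per bot
print(blended_rewards(rewards, 0.0))  # [1.0, 0.0, 0.0, 0.0, 4.0]
print(blended_rewards(rewards, 0.5))  # [1.0, 0.5, 0.5, 0.5, 2.5]
print(blended_rewards(rewards, 1.0))  # [1.0, 1.0, 1.0, 1.0, 1.0]
print(annealed_weight(50, 100))       # 0.5
```

Starting selfish gives each bot a clearer learning signal early on, and shifting toward the team average later rewards sacrifices that help the team.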


Expecting a 3-0 slam dunk here. Consistently surprised at the ability of tried and true basic reinforcement learning at completing challenging tasks. I totally expected some kind of necessary breakthrough in the RL field before a real-time game like Dota 2 could be beaten.


I would love to know why the third match turned sour. I suspect (as an amateur with no ML background) that that matchup was under-trained.

Like I could imagine OpenAI getting stuck in a subset of the draft pool that it trained against, like maybe the top 10 of 18 champs. And then picking outside of that meta causes it to fall back on much less robust training/strategy.


Because the first two matches were so lopsided, the bot lineup was selected by twitch chat + audience members. We drafted them a pretty terrible lineup, and from the start the bots estimated their chance of winning to be about 2.9%.


Indeed. And to explain further: not all hero combinations are equal. Meaning: you cannot select any arbitrary set of 5 heroes and expect them to perform well. Different heroes have different strengths and synergies that make them stronger or weaker depending on the specific teammates and opponents that are present. This is why drafting is considered such an important (and difficult) portion of the game. In match 3, a purposefully-bad team was selected. It would have been VERY impressive if it was able to win.


> Meaning: you cannot select any arbitrary set of 5 heroes and expect them to perform well.

When I think of AI, I think of something crawling its way out of purposefully adversarial situations such as this one. I would have loved to see optimal play from 5 wacky heroes.

I just have this suspicion that that wasn't optimal for that team comp.

But of course the matchup itself is a thing.


This still isn't a real game of doto, tho


If there are any OpenAI folks hanging out here, I'd be really curious to hear about the (apparent) tactical pause by the AI in game 2.


It wasn't a bot, it was an observer. A bot would have its name appear.


I'd speculate it's the element of surprise.

Ideally the pause would do nothing to the game. (It could be that the game has some glitches but let's assume it's solid.) Then one could assume that the pause does nothing regarding the AI players. But that is not true because the AI itself is not paused (I assume) and it has to handle this unusual input. This might throw off some versions of it and it probably learns how to handle the pause and how to use it to gain a small advantage. So it's all part of the game. Similar to when a human team uses the pause for tactical purposes.


My buddy who is a pro is going to ask this during the Q&A



My thoughts on the AI's performance so far knowing a bit about both DOTA and machine learning:

- Item usage for things such as smoke and wards (which were recently added to their repertoire) is not well captured by the bots yet. And the buying of wards was confirmed as a scripted event. It seems hard for them to capture the long-term sparse reward of these. Smoke might not be needed by a perfect agent, but wards should be. The developer interview noted that it's not very clear what the reward for an agent warding even should be.

- Some of the big advantages OpenAI gains are in team fights, where its 200ms reaction time (upped from 80ms to resemble human reaction time more) still strikes me as something that tilts it into a solid mechanical advantage. On several occasions OpenAI's Lion managed to disable a human player performing (what I assume to be) a move which shouldn't be interruptable. (blink-->shift+ctrl ultimate ability on earthshaker) This could have tilted teamfights in the human team's favor a few times if it wasn't stopped by the "machine-like" reflexes of OpenAI.

- Positioning before team fights by OpenAI is scary. There are very few openings, and every individual agent is protected by its team.

- The map movement is likewise scary most of the time: being able to recall just enough agents back to defend while simultaneously taking out strategic objectives of the human team. Also, Blitz (caster) has noted earlier OpenAI's ability to focus on the winnable lanes and sacrifice the others, prioritizing well. (and exploiting map mechanics unknown to most pros until a few years ago)

- When the AI takes down a tier 1 tower, they seem to be very quick to take down the remaining tier 1 towers, instantly capitalizing on their map control advantage, and expanding it.

Some interesting things / bugs:

- Sniper bot throwing multiple spells on the same location right away (even though the damage doesn't stack), effectively wasting his mana and cooldowns for no gain.

- Sniper using his ultimate ability to pressure the lower-HP characters of the human team continuously. It's usually used more as a finisher, and this might be an artifact of the AI having access to an unkillable courier that ferries a lot of healing/mana-regenerating items.

Some additional info learned from the interview of some of the devs:

- Incentivizing killing Roshan (a boss character in the middle of the map, which yields a one-time resurrection item for one player after being killed) is done by varying Roshan's HP down to a really low amount, making sure the AI experiences the upside to this. Otherwise it would require all 5 agents gathering there, expending their magical abilities and investing a lot before actually seeing a reward. (which is unlikely to happen)

- Game length in self-play sessions is above 60 minutes around 1% of the time.
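
The Roshan trick described above is a form of environment randomization. A minimal sketch of the idea, assuming the HP for each training game is drawn uniformly between a small fraction and the full value; the function name, the uniform distribution, the 5% floor and the HP figure are all hypothetical, since the interview did not give these details:

```python
import random


def sample_roshan_hp(full_hp: int, rng: random.Random,
                     min_fraction: float = 0.05) -> int:
    """Pick Roshan's starting HP for one training game.

    Sometimes the HP is very low, so a single agent wandering by can
    get the kill and experience the reward; sometimes it is near the
    full value, so the team also sees the realistic case.
    """
    fraction = rng.uniform(min_fraction, 1.0)
    return max(1, int(full_hp * fraction))


rng = random.Random(0)  # seeded only so the run is repeatable
hypothetical_full_hp = 6000
samples = [sample_roshan_hp(hypothetical_full_hp, rng) for _ in range(5)]
print(samples)  # five values between 1 and 6000
```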


> Lion managed to disable a human player performing (what I assume to be) a move which shouldn't be interruptable

Fogged (the human player) commented on this and said he messed up. If he had shift-queued the spell, or just used it immediately after blink, it would've landed [0].

Cancelling an initiation with instant spells (Lion hex, Rubick lift, etc) does happen frequently in high-level human play as well, where you continuously pre-cast the spell, cancel, walk back slightly, repeat, on the out-of-range initiating enemy, to have the spell interrupt the initiation as soon as the initiating enemy blinks into range. I do agree that the bots have a solid mechanical advantage, just pointing out that this specific scenario does frequently happen in human play as well (albeit not on every single initiation).

[0] https://www.reddit.com/r/DotA2/comments/94vdpm/openai_hex_wa...


> Otherwise it would require all 5 agents gathering there, expending their magical abilities and investing a lot before actually seeing a reward. (which is unlikely to happen)

This is the key - if we had machine learning techniques that allowed it to reason at the level of "what happens if I kill this?", we could explore more of the interesting state space more quickly. Perhaps there are advances in intrinsic reward systems that allow this.


Congrats OpenAI on game #1


And game #2!


Hi, does anyone know if OpenAI has released an up-to-date trained state for their OpenAI Dota code?

Basically, can I run the same sim on my laptop and watch them play? I can see some code on Github but don't know if the actual neural net data is available too.

If anyone knows the answer that would be great, thanks


Even if they did release the likely-very-big model, your personal computer is likely not fast enough to make 2-3 actions per second (x5) and update weights in real time without a beefy GPU.


You don't need to update any weights, the model is already trained.


Ignorant here. How do you take an already trained model and execute it elsewhere? Isn’t the training phase part of the whole (ongoing) simulation?


The training phase is when you are trying to minimise your loss function by trying different weights each iteration.

Once that is finished, you have a trained model which you can use by providing input and getting an output (with the weights frozen).

The training phase is very expensive computationally because you have to calculate the gradient of your loss function on potentially huge tensors.

The execution phase is not that expensive, and commercial laptops will likely be able to run the model without any problems.
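
To make the distinction concrete, here is a toy sketch with a tiny linear model standing in for the neural net (nothing here is OpenAI's code; the model, function names and numbers are made up). Training needs a gradient and changes the weights; execution is just a forward pass with the weights frozen:

```python
def predict(weights, bias, features):
    """Execution phase: a forward pass only, the weights are not modified."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_step(weights, bias, features, target, learning_rate):
    """Training phase: one gradient-descent step on the squared error."""
    error = predict(weights, bias, features) - target
    # d(error^2)/dw_i = 2 * error * x_i and d(error^2)/db = 2 * error
    new_weights = [w - learning_rate * 2 * error * x
                   for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * 2 * error
    return new_weights, new_bias


# Training: start from zeros and take one step toward target 5.0
weights, bias = train_step([0.0, 0.0], 0.0, [1.0, 2.0], 5.0, 0.1)
print(weights, bias)                       # [1.0, 2.0] 1.0

# Execution: the weights are now frozen and simply reused on new input
print(predict(weights, bias, [3.0, 4.0]))  # 12.0
```

In a real network the training step is far more expensive because the gradient has to be computed through every layer and over batches of data, while the forward pass stays comparatively cheap.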


My initial read of https://blog.openai.com/openai-five/ was that the 1024-cell LSTM game state layer was trained on the fly, but after rereading it may not be the case.


I think the model itself could probably run on a PC with 64 GB of RAM/VRAM (well okay, it would be an unorthodox personal computer; I guess my point is, you could probably fit it in a single computer, just not one you have at home). Training it is definitely not possible without a large cluster.


Thank you for the reply minimaxir - are you sure a 2015 macbook pro cannot run a compiled version of the neural net so it can play itself on my computer?

Just wanted to double check - can someone else verify this before I spend weeks seeing if I can get it to work?


I'm pretty sure that a macbook pro could run the 5 agents using minimal GPU acceleration (eg WebGL)

A 1024 unit LSTM only takes up a few megabytes of memory, and the multiplications at runtime are O(N^2) and not O(N^2 M), because you don't have a minibatch of updates to run.
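
A rough way to check the memory estimate is to count the parameters of a standard LSTM layer. This sketch assumes a textbook LSTM with four gates and one bias vector per gate, 32-bit floats, and an input width of 1024, which is a hypothetical value since the real input size isn't stated here:

```python
def lstm_parameter_count(hidden_units: int, input_size: int) -> int:
    """Parameters in one standard LSTM layer.

    Each of the 4 gates has an input weight matrix, a recurrent weight
    matrix and a bias vector (some libraries keep two bias vectors).
    """
    per_gate = hidden_units * (input_size + hidden_units) + hidden_units
    return 4 * per_gate


def megabytes(parameter_count: int, bytes_per_parameter: int = 4) -> float:
    """Memory needed for the parameters, in MiB, assuming 32-bit floats."""
    return parameter_count * bytes_per_parameter / (1024 * 1024)


params = lstm_parameter_count(hidden_units=1024, input_size=1024)
print(params)             # 8392704
print(megabytes(params))  # 32.015625
```

Under these assumptions the layer comes to about 32 MB rather than a few, but that is still small enough that memory is not the obstacle on a laptop.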


Last time I looked, I did not find any published trained nets for it. As far as I know, the custom build of Dota they use isn't public either. That will most likely change in the near future though; it would be the next logical step for the OpenAI team given the goal of their organization.


Chat deciding heroes can be so cool! We may see new strategies, and the AI out of its safe-lane confidence!!


clicked on the stream, title should be changed to "getting destroyed by Humans"


Let's just ignore the first two 25 minute games in favour of AI


And that the AI was handicapped in the game it lost, as they let the audience draft for the AI team and of course they deliberately picked a bad lineup to see what would happen.


the reverse happened


First match is over, human pros got wrecked, and very fast.


They were audience members, not pros.


The marketing on this whole effort is strong and the training budget was insane. That said, until it defeats the best of the best it's just really amazing machine learning as opposed to game-changing machine learning, IMO. Also keep in mind that there's a game state API here; this is not at all like learning Atari 2600 games from the screen alone, which would personally make me fear we were on the verge of the robot apocalypse.


Pro game starting in a few minutes.


AI winning due to superhuman abilities is about as impressive as a Counter-Strike bot with perfect aim making headshots every time.


Check the DeepMind Quake bot: https://deepmind.com/blog/capture-the-flag/

Even when reducing tagging accuracy to below human level, they still performed better.



