Pokemon Leaf Green Slot Machine Trick – It’s no secret that we love Pokemon here. We shared our memories of Pokemon Red and Blue, drooled over 3D sprites of the original 151 Pokemon, and even shared a neat Easter egg in Pokemon Sword and Shield.

Speaking of Easter eggs, there’s an almost forgotten secret hidden in the original Pokemon Red and Blue (and Green and Yellow, I guess) games. It’s in Celadon’s Game Corner.

Celadon City is located in central Kanto and remains a favorite of Pokemon fans who played the original games, because it felt huge at the time. Time has eroded that illusion, but compared to Pallet Town’s diminutive size, Celadon really did feel like a huge city to get lost in.

Aside from its perceived size, Celadon City was special for two reasons: the Celadon Department Store and Celadon’s Game Corner. Sure, there was also the headquarters of Celadon Condominiums, the Celadon Hotel, and the city’s gym leader, Erika, but all we wanted to do that day was get our coins out and move on. After buying some Poke Balls and Potions, it was off to Celadon’s gaming corner to spend some money on the slot machines, long before I found online slots. And lose. Again and again. No deposit casino bonuses for me.

Well, that is if you didn’t know what you were doing. You see, there was a neat trick to get huge prizes in Celadon’s gaming corner. Officially, in the first generation games, no machine was consistently better than the others. All slot machines had their odds generated randomly in each game, with no slot machine being good to every player all the time.

When you first enter, you’ll notice thirty slot machines, with eight of them in use, one where someone left their keys, one labeled “not working” and one marked “saved”. Of all the slot machines, the best is the machine in the lower left corner with the ID “1”. The game assigns a lucky ID when you first enter the building, which ranges from 0 to 31, but there are 35 machines. This means that machines 31 to 35 have higher odds – these are located at the bottom of the left column.

In a game that makes it hard to progress in the early stages, Celadon City’s gaming corner really gave a generation of Pokemon fans like me the chance to “game the system” and get the money to buy the Potions and Poke Balls I needed to get through the next few hours of play. This remains one of my favorite Pokemon secrets of all time.

What is your favorite Pokemon secret? And don’t mention Mew under the truck. I won’t fall for it again!

Reinforcement learning provides a huge boost to many applications, particularly in e-commerce for investigating and anticipating customer behavior, including at Wayfair, where I work as a data scientist. One popular way to model problems for RL algorithms is as a “multi-armed bandit”, but I’ve always thought the term was unnecessarily confusing, given that it’s supposed to be a helpful metaphor. First of all, “one-armed bandit” is 100-year-old slang, and secondly, the image of a slot machine with multiple arms to pull is a strange one.

Modern slot machines probably have different buttons to press, which at least pretend to give different odds, but a better metaphor would be a number of machines in a casino, some “loose” and some “tight”. When I walked into the gaming corner of Celadon City, in the 2004 Game Boy Advance game Pokémon FireRed, and saw rows of slot machines all with different odds, I knew I had found the ideal “real life” version of this metaphor, and a practical application of reinforcement learning.

Celadon’s Play Corner: A Pit of Evil, Curses and Lost Souls. (screenshot by author which is fair use based on teaching, scholarship and research)

And I mean practical! Otherwise how am I going to win 4000 coins to buy the Ice Beam or Flamethrower abilities, which I will need to fight the Elite Four??

I built a reinforcement learning agent, using Thompson sampling, to tell me which machine to sample next, and ultimately, which one to play the hell out of. I call it MACHAMP: Multi-Armed Coin Holdings Amplifier Made for Pokemon.

Given a set of possible actions (the “arms” of a multi-armed bandit – in this case different machines to try), Thompson sampling trades off exploration against exploitation to find the best action: it tries the most promising actions more often, and so obtains more detailed estimates of their reward probabilities. However, it still randomly suggests the others from time to time, in case one of them turns out to be the best after all. At each step, the system’s knowledge, in the form of posterior probability distributions, is updated using Bayes’ rule. The simplest version of the multi-armed bandit problem involves Bernoulli trials, where there are only two possible outcomes, reward or no reward, and we try to determine which action has the highest probability of reward.

As a demonstration of how Thompson sampling works, imagine we had 4 slot machines, with a 20%, 30%, 50% and 45% chance of payout. We can then simulate how the solver finds that machine 3 is the best. Here and in the rest of the notebook, I started from code written by Lillian Wong for her excellent tutorial (all in

At the beginning, we know nothing about the probabilities of the machines, and assume that all values for their true reward probability are equally possible, from 0% to 100% (depending on the problem, this choice of Bayesian prior may be a bad assumption, as I discuss later).
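For Bernoulli rewards, the flat starting assumption described above is usually written as a Beta(1, 1) prior, which makes the Bayesian update a matter of counting. The notation below is my own shorthand rather than anything from the notebook: $\theta_k$ is the unknown reward probability of machine $k$, and $s_k$ and $f_k$ are the wins and losses observed on it so far.

```latex
\theta_k \sim \mathrm{Beta}(1, 1)
\quad\Longrightarrow\quad
\theta_k \mid s_k, f_k \sim \mathrm{Beta}(1 + s_k,\; 1 + f_k),
\qquad
\mathbb{E}[\theta_k \mid s_k, f_k] = \frac{1 + s_k}{2 + s_k + f_k}
```

A machine with no plays has a posterior mean of 1/2, and each win or loss nudges that estimate while narrowing the distribution.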

One step of the solver involves drawing a random sample from the posterior probability distribution of each machine, trying the machine whose sample is highest (this is the Thompson sampling algorithm), then updating that machine’s distribution based on whether there was a prize.
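A minimal sketch of that single step, assuming a uniform Beta(1, 1) prior and win/loss counts kept in plain lists. The names `thompson_step` and `pull` are hypothetical and chosen for illustration; this is not the code from the notebook.

```python
import random


def thompson_step(successes, failures, pull, rng=random):
    """Run one Thompson sampling step and update the counts in place.

    successes[i] and failures[i] are the wins and losses seen so far on machine i.
    pull(i) is a hypothetical callback that plays machine i and returns True on a win.
    """
    # Draw one plausible reward probability per machine from its Beta posterior.
    # Beta(1 + wins, 1 + losses) is the posterior under a uniform prior (assumption).
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    # Play the machine whose sampled probability is highest.
    choice = max(range(len(samples)), key=samples.__getitem__)
    # Record the outcome so the next step samples from the updated posterior.
    if pull(choice):
        successes[choice] += 1
    else:
        failures[choice] += 1
    return choice
```

In practice `pull` would be me playing the suggested machine and typing in whether it paid out; in a simulation it would be a random draw against a known payout probability.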

We can see from the graph of the estimated probabilities that one success of machine 4 made us more optimistic about that machine – we now think that higher guesses for the reward probability are more likely.

After running it for 100 simulated draws of the four machines, we can see that it has homed in on better estimates of the probabilities.

And after 10000 trials we are even more confident that machine 3 has a high probability of reward, because we sampled it much more than the others. We also sampled machine 4 a lot just to be sure, but we quickly learned that machines 1 and 2 were much worse, so we sampled them less often. That leaves us with less accurate and less confident estimates of their reward probabilities, but we don’t care.
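The four-machine experiment can be reproduced with a short simulation. This is a sketch under my own assumptions (a uniform prior, the standard library’s random generator, and a hypothetical `simulate` function), not the notebook’s code, so the exact counts will differ from the plots described here.

```python
import random


def simulate(true_probs, n_trials, seed=0):
    """Simulate Thompson sampling on Bernoulli machines; return (wins, losses)."""
    rng = random.Random(seed)
    n = len(true_probs)
    wins = [0] * n
    losses = [0] * n
    for _ in range(n_trials):
        # Sample a plausible reward probability for every machine.
        samples = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(n)]
        # Play the machine with the highest sampled probability.
        arm = max(range(n), key=lambda i: samples[i])
        # Simulate the payout using the machine's true (hidden) probability.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


wins, losses = simulate([0.20, 0.30, 0.50, 0.45], n_trials=10_000)
for number, (won, lost) in enumerate(zip(wins, losses), start=1):
    plays = won + lost
    estimate = (1 + won) / (2 + plays)  # posterior mean under the uniform prior
    print(f"machine {number}: plays={plays:5d}  estimated p={estimate:.3f}")
```

The play counts are the interesting part: most of the 10000 trials should go to the better machines, while the clearly worse ones are abandoned after relatively few plays.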

There are 19 slot machines that can be played in Celadon’s gaming corner, which pay out coins that can be used to buy TMs (items that teach Pokémon moves) and Pokemon that are not available anywhere else. Three wheels spin, and you press a button to stop them one at a time, aiming to line up three of the same picture, or at least a combination that starts with a cherry.

This gives 6 coins, or “just enough to keep these suckers hooked” (screenshot by author which is fair use based on teaching, scholarship and research)

The best jackpot is Triple 7, for 300 coins. How did I know the machines have different odds? Because a character in the game told me so.

Before resorting to something as ridiculously complicated as a Thompson sampling MAB solver, I looked online for other tips on beating the casino. Maybe because it’s a pretty old game (I get to them when I get to them), the information was sparse and sometimes contradictory:

So I decided that I would play by pressing the “stop” button.

