Flappy Bot: using machine learning to play games

11/11/2017 · 4 min read

Last quarter, my colleague and I took a detour from our usual product strategy work to tackle a deceptively simple challenge: teaching a computer how to play the notoriously frustrating mobile game, Flappy Bird.

The resulting “Flappy Bot” was a success: once it had evolved a working strategy, it played the game perfectly, achieving scores that would be impossible for any human. It never died, flying through the gaps with cold, mathematical precision. It was, in effect, a study in efficiency through constraint, a theme I often apply to my own creative process.

On the surface, it is a fun coding challenge. However, if you look closely at how the bot learned, it provides a crucial lens through which every product and business leader must view the future of Artificial Intelligence. This experiment was not about coding an outcome; it was about defining a goal and letting the system figure out the optimal path. This concept, Reinforcement Learning, is the paradigm shift we have been waiting for.

The machine learns from failure (The “Game Over” screen)

Much of mainstream AI is built on supervised learning: we feed it millions of labelled examples (e.g., “this is a cat,” “this is not a cat”) and it learns to classify. Our Flappy Bot worked differently. It used a simple method in the spirit of Reinforcement Learning (RL): in our case, a light application of Genetic Algorithms, which evolve a whole population of candidate “brains” rather than updating a single agent from step-by-step rewards.

The bot was given inputs:

  • Its current vertical position.
  • Its speed/velocity.
  • The coordinates of the next set of pipes.

It had one action: Flap or Don’t Flap.
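As a rough illustration of that setup, here is a minimal sketch of a bird “brain” in Python. It is not our original code: the input encoding (`bird_y`, `bird_velocity`, `pipe_dx`, `pipe_gap_y`), the hidden-layer size and the function names are all assumptions chosen to keep the example short.

```python
import math
import random

N_INPUTS = 4   # bird_y, bird_velocity, pipe_dx, pipe_gap_y (assumed encoding)
N_HIDDEN = 6   # hypothetical hidden-layer size


def random_brain(rng):
    """Create random weights for a small 4 -> 6 -> 1 feedforward network."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(N_INPUTS)]
               for _ in range(N_HIDDEN)],
        "b1": [rng.uniform(-1, 1) for _ in range(N_HIDDEN)],
        "w2": [rng.uniform(-1, 1) for _ in range(N_HIDDEN)],
        "b2": rng.uniform(-1, 1),
    }


def should_flap(brain, bird_y, bird_velocity, pipe_dx, pipe_gap_y):
    """Return True for Flap, False for Don't Flap."""
    inputs = [bird_y, bird_velocity, pipe_dx, pipe_gap_y]
    # Hidden layer: weighted sum of the inputs squashed through tanh.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(brain["w1"], brain["b1"])
    ]
    # Output: a single score; flap when it is positive.
    score = sum(w * h for w, h in zip(brain["w2"], hidden)) + brain["b2"]
    return score > 0.0
```

Calling `should_flap(random_brain(random.Random(0)), 0.5, 0.0, 0.3, 0.5)` on every frame would give one bird's decision; a freshly randomised brain will, of course, play badly.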

The core breakthrough was defining the Reward and the Punishment. The reward was simple: surviving. The punishment was just as clear: the instant a bird hit the Game Over screen, its run was terminated, and that “brain” (neural network) was deemed less fit than the ones that lasted longer.

By rapidly generating and iterating hundreds of “birds” and forcing them to compete based on survival time, the algorithm eventually discovered an optimal strategy that no human explicitly coded. It never needed to understand why the pipe hits were happening; the brains that avoided them simply survived to seed the next generation. The result is a system that self-corrects based purely on environmental feedback.
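The generate, compete and iterate loop described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `survival_time` is a toy stand-in for the real game, a single linear unit replaces the neural network, and the population size, elite count and mutation strength (`sigma`) are hypothetical values, not the ones we used.

```python
import random


def survival_time(genome, seed=0, max_steps=500):
    """Toy stand-in for the game: number of frames survived."""
    w_y, w_v, w_gap, bias = genome
    rng = random.Random(seed)
    y, v, gap_y = 0.5, 0.0, 0.5          # y grows downward, range 0..1
    for step in range(max_steps):
        if step % 50 == 0:
            gap_y = rng.uniform(0.3, 0.7)  # a new "pipe" every 50 frames
        if w_y * y + w_v * v + w_gap * gap_y + bias > 0:
            v = -0.03                      # flap: upward impulse
        v += 0.004                         # gravity
        y += v
        if y < 0.0 or y > 1.0:
            return step                    # flew off screen: Game Over
        if step % 50 == 49 and abs(y - gap_y) > 0.15:
            return step                    # missed the gap: Game Over
    return max_steps


def evolve(fitness, genome_len=4, pop_size=50, generations=20,
           elite=5, sigma=0.3, seed=1):
    """Minimal genetic algorithm: keep the fittest, breed and mutate the rest."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []                           # best fitness per generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:elite]           # survivors carried over unchanged
        children = []
        while len(children) < pop_size - elite:
            mum, dad = rng.choice(parents), rng.choice(parents)
            # Crossover picks each gene from one parent, then mutates it.
            child = [rng.choice(pair) + rng.gauss(0, sigma)
                     for pair in zip(mum, dad)]
            children.append(child)
        population = parents + children
    return max(population, key=fitness), history
```

Running `best, history = evolve(survival_time)` returns the fittest genome and the best score of each generation. Because the elite are carried over unchanged and the toy game is deterministic, the best score can only hold steady or improve from one generation to the next.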

The future is autonomous iteration

This simple principle is what takes Machine Learning beyond data classification and into the realm of autonomous decision-making. We have moved past predictive modelling; we are entering the era of prescriptive systems that learn how to act in real time.

What does this mean for our businesses in 2018 and beyond?

Self-optimising product

Imagine a dynamic pricing engine or an automated logistics router that uses RL. Instead of relying on a human team to tweak parameters and observe A/B test results, the system itself receives the punishment of lost revenue or missed delivery windows. It then autonomously adjusts its variables (pricing, inventory allocation, truck routes) until the “Game Over” screen of failure is never seen. The system becomes its own operations team, constantly seeking maximal efficiency.
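To make the pricing idea concrete, here is a minimal sketch of an epsilon-greedy bandit, one of the simplest reward-driven learners. It is an illustration under assumptions rather than a production design: the class name `EpsilonGreedyPricer`, the candidate prices and the revenue signal are all hypothetical.

```python
import random


class EpsilonGreedyPricer:
    """Pick a price, observe revenue, and drift towards what earns most."""

    def __init__(self, prices, epsilon=0.1, seed=0):
        self.prices = list(prices)       # candidate price points (assumed)
        self.epsilon = epsilon           # share of decisions spent exploring
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.prices)
        self.mean_revenue = [0.0] * len(self.prices)

    def choose(self):
        """Return the index of the price to offer next."""
        if self.rng.random() < self.epsilon:
            # Explore: try a random price.
            return self.rng.randrange(len(self.prices))
        # Exploit: offer the price with the best average revenue so far.
        return max(range(len(self.prices)),
                   key=lambda i: self.mean_revenue[i])

    def update(self, index, revenue):
        """Fold the observed revenue (the reward) into the running average."""
        self.counts[index] += 1
        self.mean_revenue[index] += (
            (revenue - self.mean_revenue[index]) / self.counts[index]
        )
```

In use, the system would call `choose()` for each customer, charge `prices[index]`, and feed the resulting revenue (zero for a lost sale) back through `update()`. A real engine would also need to handle demand that shifts over time, which this sketch ignores.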

Adaptive UX

In the broadcast world, we deal with real-time data constraints. Instead of hard-coding display rules, an RL agent could learn to instantly optimise a user interface based on current network latency, device load, and viewing conditions, rewarding itself for minimal load time and seamless user experience. This level of tactical execution, driven by pure efficiency, is a game changer for high-stakes performance.

Eliminating the “hunch”

Our Flappy Bot taught us that the human “hunch” about optimal strategy is often deeply flawed. When we define a crystal-clear objective (survival, maximum profit, minimal friction), the machine will find non-obvious, mathematically superior paths that human intuition would overlook. It forces us as product leaders to stop theorising and start strategising around pure, measurable outcomes.

The greatest challenge facing product leaders today is not adopting AI tools, but re-framing our problems to suit Reinforcement Learning. We need to stop coding solutions and start defining clean, measurable rewards and punishments. Flappy Bird was merely the training ground. The real game, the optimisation of enterprise, is about to begin. If we can teach a pixelated bird to fly forever, what human problem can we not solve next?
