Using PID to Cheat an OpenAI Challenge

What is PID in practice and how do you implement it?

Ethan Roberts
7 min read · May 22, 2021
Can we make this system stable without using AI? | Gif by Author

OpenAI Gym is a collection of challenges meant to be used as a testing ground for machine learning (or more specifically reinforcement learning) algorithms. The easy-to-use Python library has made it accessible for anyone to start playing around with these simulations and get to grips with the basics of the reinforcement learning field.

One challenge I was particularly interested in was Cartpole. The goal is to keep a pole balanced vertically above a moving cart for as long as possible, never letting the pole stray too far from its stable point.

The reason this one caught my eye was the similarity to a real-life control theory problem. People actually build these things, with mind-blowing results like this insane control of a triple pendulum.

I thought I’d skip the triple pendulum and start off easier. Still, I wondered if it would even be possible to control the Cartpole simulation with control theory, which relies on accurate mathematical modelling. Is the simulation accurate enough on OpenAI’s side? Will we be able to figure out a good model for the system with lots of unknowns? We’ll have to find out.

If you wanna skip past the maths and go straight to the PID control, feel free!

Modelling The System

Even though we don’t know for sure that OpenAI is running accurate physical simulations, getting some equations is a good place to start.

Let’s start by drawing out some diagrams and writing down the variables.

(Left) Dimensions of the system | (Right) Forces on the system excluding reaction forces at the pivot

For more information about how the system is modelled, see this article: https://ctms.engin.umich.edu/CTMS/index.php?example=InvertedPendulum&section=SystemModeling

I’ll skip the gory details, but the gist is that we’re trying to find a couple of equations of the system by summing forces and equating them to the acceleration (linear and angular) using Newton’s F=ma and the rotational equivalent 𝜏=Iα. We then equate, rearrange and simplify using the small angle approximation for sin and cos functions.

We end up with these two equations: one relating the effects on the energy of the pole, and one relating the effects of the force on the system.

Equation 1 of the system — relating everything to do with the energy of the pole
Equation 2 of the system — relating everything to do with the forces on the system as a whole

You might notice the SI units don’t quite add up in equation 1. This is a result of using the small angle approximations: we go from a dimensionless sine of an angle to the angle itself in radians.

Some Detective Work

Using some clever tactics to get information about the system

We have some equations that govern the cartpole system, but we have a problem.

How do we know what the values of mass, length and moment of inertia are?

We can only accurately produce a PID controller if we know these values, so is everything we’ve done so far pointless?

Nope.

What we can do is tell the simulation to always apply a force to the right and plot how the system behaves. Because the environment gives us the values of velocity and angular velocity, we can plot these to find some values for the variables we need.
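A minimal sketch of how that constant-push experiment could be recorded. The function name `collect_constant_push` is hypothetical, and the code assumes a Gym-style environment whose observation is ordered as (position, velocity, angle, angular velocity) and where action 1 pushes right, as in Gym's CartPole. It tolerates both the older and newer `reset()`/`step()` return shapes.

```python
def collect_constant_push(env, action=1, max_steps=50):
    """Apply the same action every step and record every observation.

    `env` is any object with Gym-style reset() and step(action) methods.
    Returns a list of (x, x_dot, theta, theta_dot) tuples.
    """
    result = env.reset()
    # Newer Gym/Gymnasium versions return (obs, info); older ones return obs only.
    obs = result[0] if isinstance(result, tuple) else result
    history = [tuple(float(v) for v in obs)]

    for _ in range(max_steps):
        result = env.step(action)
        obs = result[0]
        # Old API: (obs, reward, done, info).
        # New API: (obs, reward, terminated, truncated, info).
        done = result[2] if len(result) == 4 else (result[2] or result[3])
        history.append(tuple(float(v) for v in obs))
        if done:
            break
    return history


# Usage (requires the gym package, so it is left as a comment):
#   import gym
#   env = gym.make("CartPole-v1")
#   history = collect_constant_push(env, action=1)
```

The episode ends quickly because the pole falls over, but those few steps are enough to plot velocity and angular velocity against time.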

The result of applying a constant force to the right

We can’t really get much useful information from the position or angle, but the fact that the velocity and angular velocity are straight lines is very important. It suggests that the cartpole simulation is physically accurate, as a constant force is producing a constant acceleration both linearly and angularly.

In fact, we can work out the linear and angular accelerations by just finding the slope of the line! When we do that, we get the values shown in the picture below.
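Finding those slopes can be done with an ordinary least-squares fit. The sketch below is one way to do it, not necessarily how the values in the article were obtained. It assumes `history` is a list of (position, velocity, angle, angular velocity) samples taken one simulation step apart, and that each step lasts 0.02 s, which is the time step Gym's CartPole uses as far as I know.

```python
DT = 0.02  # assumed CartPole time step in seconds


def slope(ts, ys):
    """Least-squares slope of ys plotted against ts."""
    n = len(ts)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points and equal-length inputs")
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def accelerations(history, dt=DT):
    """Return (linear, angular) acceleration estimated from recorded velocities."""
    ts = [i * dt for i in range(len(history))]
    linear = slope(ts, [row[1] for row in history])   # slope of cart velocity
    angular = slope(ts, [row[3] for row in history])  # slope of pole angular velocity
    return linear, angular
```

Because the velocity traces are straight lines, the fitted slope is the constant acceleration over the whole run.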

Now we still need to work out the length, moment of inertia, and mass of the system, but we only have 2 equations! The secret is that we don’t need to actually find the “true” values. We only need the relationship between the variables to be correct. This means we can just guess a few of the values and as long as we work out the others from the equations, we can accurately model the system.

We can arbitrarily define the length of the pole and the mass of the cart to be 1 (for ease).

After some rearranging and approximating, we now have all the values we need to design the PID controller!

(If you want more workings, see the appendix at the bottom)

From top to bottom: mass of the pole, mass of the cart, half-length of the pole and moment of inertia of the pole

Moving to Frequency Space

Just before we get to PID modelling, we need to convert all our time-domain equations into frequency ones using Laplace transforms. This isn’t too hard in practice, so bear with me.

What we’re looking for is the transfer function that relates the angle of the pole to the force we’d need to balance it.

Converting to frequency space is as simple as swapping t for s and swapping differentials with powers of s
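As a worked illustration of that swap, here is the transform applied to a linearised, frictionless cart-pole model written in the style of the linked CTMS tutorial. This is an assumption on my part: the article's own equations may use different signs or extra terms, so treat the result as a sketch of the method and not as the exact transfer function used later.

```latex
% Assumed model. m_c: cart mass, m_p: pole mass, l: half-length of the pole,
% I: moment of inertia of the pole, \theta: small angle from vertical, F: force on the cart.
\begin{align}
(I + m_p l^2)\,\ddot{\theta} - m_p g l\,\theta &= m_p l\,\ddot{x} \\
(m_c + m_p)\,\ddot{x} - m_p l\,\ddot{\theta} &= F
\end{align}

% Laplace transform with zero initial conditions: each time derivative becomes a factor of s.
\begin{align}
(I + m_p l^2)\,s^2\,\Theta(s) - m_p g l\,\Theta(s) &= m_p l\,s^2 X(s) \\
(m_c + m_p)\,s^2 X(s) - m_p l\,s^2\,\Theta(s) &= F(s)
\end{align}

% Solve the second line for s^2 X(s), substitute into the first, and collect \Theta(s):
\begin{equation}
\frac{\Theta(s)}{F(s)} = \frac{m_p l}{q\,s^2 - (m_c + m_p)\,m_p g l},
\qquad q = (m_c + m_p)(I + m_p l^2) - (m_p l)^2
\end{equation}
```

The minus sign in the denominator gives a pole in the right half of the s-plane, which is the mathematical way of saying the pole falls over when left alone.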

We now have the transfer function from the angle to the force, which brings us to MATLAB.

Getting our PID Coefficients

To make things easy for myself, I’ll be using MATLAB’s Control System Designer.

Getting the transfer function into MATLAB is as easy as using the tf() function and passing in the coefficients.

Gtheta is the transfer function from the angle to the balancing force

We can now open the Control System Designer, give it our Gtheta and open the PID tuning window.

The graph you can see is what’s known as the impulse response: what happens to the output when the input is nudged slightly? The fact that the graph rockets off to 10²⁴ shows that the pole has no chance of being stable without a controller.

In fact, let’s have a look at what the cartpole would do in this case!

A controller of just Kp=1 is not very good

As you can see, it starts off with small wobbles which get bigger until the simulation stops. It’s clearly not stable. Luckily fixing it is as easy as pressing “update compensator” in MATLAB, which spits out a PID transfer function that you can see below. Notice that the impulse response ends up at 0 — that’s a good sign!

Using MATLAB’s control system designer to automatically get a PID controller

We can export C and use the pid() function to get out our PID coefficients automatically!

Kp = proportional gain, Ki = integral gain, Kd = derivative gain

So what do we do with these numbers?

How Do You Actually Implement PID?

There’s a reason PID is extremely widely used — it’s super easy to implement, whether in code or as electronics. As we’re simulating with OpenAI, we’ll of course be using code.

The key parts are lines 6–8 (defining the coefficient values) and lines 17–25 (getting measurements from the system, calculating the force to apply, and mapping it to either a left or right force).

Line 23 is what could really be considered “the PID controller”. It calculates the required balance force based on the angle, angular velocity and integral of angle over time, as well as our PID coefficients.
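To make that calculation concrete, here is a minimal sketch of a PID force computation. It is not the original listing: the class name `AnglePID` is hypothetical, the gains in the usage note are placeholders and not the tuned values from MATLAB, and the 0.02 s time step is an assumption about the simulation. The target angle is 0 (vertical), so the angle itself is the error and the measured angular velocity stands in for the derivative of the error.

```python
class AnglePID:
    """PID controller that turns the pole angle into a balancing force."""

    def __init__(self, kp, ki, kd, dt=0.02):
        self.kp = kp          # proportional gain
        self.ki = ki          # integral gain
        self.kd = kd          # derivative gain
        self.dt = dt          # assumed simulation time step in seconds
        self.integral = 0.0   # running integral of the angle over time

    def force(self, theta, theta_dot):
        """Return the continuous force for the current angle and angular velocity."""
        self.integral += theta * self.dt
        return (self.kp * theta
                + self.ki * self.integral
                + self.kd * theta_dot)
```

With placeholder gains, `AnglePID(2.0, 0.5, 0.1).force(0.1, -1.0)` adds a proportional term of 0.2, an integral term of 0.001 and a derivative term of -0.1.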

But Does It Work?

Yes! Kinda…

Something I hadn’t considered until the end was that the simulation only accepted 2 forces — fully left or fully right. The PID controller however was modeled as being able to apply any value of force, including 0!

This is what line 25 is doing — converting from the continuous value of force to either a left (0) or right (1) based on whether the force was negative or positive.
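A sketch of that mapping, together with a loop that drives a Gym-style environment with it. The names `force_to_action` and `run_episode` are hypothetical, `controller` is assumed to be any callable taking (angle, angular velocity) and returning a force, and sending a force of exactly zero to the left is an arbitrary choice of mine.

```python
def force_to_action(force):
    """Map a continuous force to CartPole's two actions: 0 = left, 1 = right."""
    return 1 if force > 0 else 0


def run_episode(env, controller, max_steps=500):
    """Run one episode and return how many steps the pole stayed up."""
    result = env.reset()
    # Newer Gym/Gymnasium versions return (obs, info); older ones return obs only.
    obs = result[0] if isinstance(result, tuple) else result
    steps = 0
    for _ in range(max_steps):
        theta, theta_dot = float(obs[2]), float(obs[3])
        action = force_to_action(controller(theta, theta_dot))
        result = env.step(action)
        obs = result[0]
        # Old API has 4 return values, new API splits done into two flags.
        done = result[2] if len(result) == 4 else (result[2] or result[3])
        steps += 1
        if done:
            break
    return steps
```

Only the sign of the force survives this step, so the controller effectively behaves like a bang-bang switch whose switching point is set by the PID terms.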

With this in mind it’s surprising that it works at all! But I’ll happily take it as a victory.

I hope you learned something! I had no idea about how PID was implemented before trying this and was amazed by how simple it was!

If you enjoyed this, feel free to give me a clap or two (I’d really appreciate it!).

If you want to see more like this, check out my profile or the article below!

Appendix

I left out some of the longer calculations earlier, so here’s some of the rearranging if you’re interested :)

Finding a relationship between mc and mp

For the next one, because we only really care when the angle is really small (i.e. pole is almost vertical) we can approximate theta to 0.

Finding a relationship between mp and I.
