
  • lukko 6 hours ago

    I've just started to try and learn the basics of RL and the Bellman Equation - are there any good books or resources I should look at? I think this post is beyond my current level.

    I'm most interested in how the equation can be implemented step by step in an ML library - worked examples would be very helpful.

    Thank you!

    • sardukardboard 3 hours ago

I worked through David Silver’s RL course a while back; it’s got great explanations as he builds up the equations. It’s light on implementation, but the intuitive side really complements more code-heavy resources that lack the “why” behind the equations.

      https://davidstarsilver.wordpress.com/teaching/

      • ActivePattern 5 hours ago

        Reinforcement Learning by Sutton & Barto is an excellent introduction by two of the founders of the field.

        Read here: http://incompleteideas.net/book/the-book-2nd.html

        • brandonb 4 hours ago

          OpenAI's spinning up in deep RL is free and pretty good: https://spinningup.openai.com/en/latest/

          It includes both mathematical formulas and PyTorch code.

          I found it a bit more practical than the Sutton & Barto book, which is a classic but doesn't cover some of the more modern methods used in deep reinforcement learning.

          • srean 5 hours ago

I would recommend that you start with one of the classics (not much deep RL in it):

            https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutto...

            This will have a gentler learning curve. After this you can move on to more advanced material.

The other resource I will recommend is everything by Bertsekas, in this context his books on dynamic programming and neuro-dynamic programming.

            Happy reading.

            • porridgeraisin 2 hours ago

The Bellman equations (exactly as written above) are not found in ML libraries.

This is because they assume you know a model of the environment. Most real-world RL is model-free RL. Or, as in LLMs, "model is known but too big to use practically" RL.

Apart from the resources you use (good ones in the other comments already), try to get your initial mental model of the whole field right. That is important, since everything you read can then fit into the right place in that mental model. I will try to give one below.

- the absolute core raison d'être of RL as a separate field: the quality of the data you train on improves only as your algorithm improves, as opposed to other ML where you have all your data beforehand.

- first, basic Bellman equation solving (code-wise, this is just solving a system of linear equations)

- an algorithm you will come across called policy iteration (code-wise, a bunch of for loops)

- here you will be able to see how different parts of the algorithm become impossible in different setups, and what approximations can be made for each of them (and this is where the first neural network, called a "function approximator" in the RL literature, comes into play). Here you can recognise approximate versions of the Bellman equation.

- here you learn the DDPG and SAC algorithms. Crucial. Called "actor-critic" in the parlance.

- you also notice problems with this approach that arise because a) you don't have much high-quality data and b) learning recursively with neural networks is very unstable; this motivates stuff like PPO.

- then you can take a step back, look at deep RL, and re-cast everything in normal ML terms. For example, techniques like TD learning (the term you would have used so far) can be re-cast as simply "data augmentation", which you do in ML all the time.

- at this point you should get into the weeds of actually engineering real RL algorithms at scale. Stuff like Atari benchmarks. You will find that in reality the algorithms as learnt are more or less a template, and you need lots of problem-specific detailing to actually make them work. You will also learn engineering tricks that are crucial. This is mostly computer science stuff (increasing throughput on the GPU etc., but correctly, without changing the model assumptions).

- learn goal-conditioned RL, imitation learning, and some model-based RL like AlphaZero/Dreamer after all of the above. You will be able to easily understand them in the overall context at this point. The first two are used in robotics quite a bit. You can run a few small robotics benchmarks at this point.

- learn stuff like HRL and offline RL as extras, since they are not that practically relevant yet.
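
For what it's worth, the "solving a system of linear equations" step above can be sketched in a few lines. The tiny MDP below (transition matrix, rewards, discount) is made up purely for illustration:

```python
import numpy as np

# Hypothetical 3-state Markov chain under a fixed policy:
# P[s, s'] is the transition probability, r[s] the expected reward.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 2.0, 0.0])
gamma = 0.9  # discount factor

# Bellman equation: v = r + gamma * P v  =>  (I - gamma * P) v = r
v = np.linalg.solve(np.eye(3) - gamma * P, r)

# Sanity check: v is a fixed point of the Bellman equation.
assert np.allclose(v, r + gamma * P @ v)
print(v)
```

This only works because P and r are known and tiny; once the model is unknown or the state space is huge, you are in the model-free/function-approximation territory described above.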

            • Cloudly 11 hours ago

Ever since the control bug bit me in my EE undergrad years, I have been happy to see how useful the knowledge remains. Of course the underlying math of optimization remains general, but the direct applications in control theory made it much more appetizing for me to struggle through.

            • jesuslop 7 hours ago

Nice summary, saving it. If the author is around: the Bellman equation label ended up overlapping the equation, and paragraph quoting signs got into the displayed HJB equation. The "suggest changes" link is a 404. Liked the presentation overall, thank you!

              • lain98 9 hours ago

I find myself completely outclassed by mathematicians in my own field. I tried to learn a little math on the side after my regular software engineering gig, but I'm completely outclassed by PhDs.

I am unsure of the next course of action, whether software will survive another 5 years, and what my career will look like in the future. It seems like I am engaged in the ice trade and they are about to invent the refrigerator.

                • dsign 1 hour ago

                  > Seems like I am engaged in the ice trade and they are about to invent the refrigerator.

The way I like to look at it is that I'm engaged in the ice trade and they are about to invent everything else, which will end mine and every other current trade. Which leaves me with two practical options: a) deep despair, or b) becoming a jack of all trades, master of none, but oftentimes better than a master of one. The jacks can, for now, capitalize on the thing that the machines currently lack, which is agency.

                  • ecshafer 3 hours ago

IMO Computer Science doesn't have enough mathematics in the core curriculum. I think more CS students should be double-majoring or minoring in Physics and/or Math. The skills you gain in analyzing problems and constructing models in physics, in establishing truth or falsity and analyzing problems in math, and the algorithmic skills in CS really complement each other.

Instead of people "hacking" university education to turn it into a purely flavor-of-the-month job training center, the real hack would be something that really drills down into the fundamentals. CS, Math, Physics, and Philosophy, for an all-around education in approaching problems from fundamentals, would I think be the optimal school experience.

                    • rsp1984 8 hours ago

                      Don't despair. The key to becoming proficient in advanced subjects like this one is to first try to understand the fundamentals in plain language and pictures in your mind. Ignore the equations. Ask AI to explain the topic at hand at the most fundamental level.

Once the fundamental concepts are understood, what problem is being solved and where the key difficulties lie, only then will the equations start to make sense. If you start out with the math, you're making your life unnecessarily hard.

Also, not universally true but directionally true as a rule of thumb: the more equations a text contains, the less likely it is that the author themselves has truly grasped the subject. People who really grasp a subject can usually explain it well in plain language.

                      • griffzhowl 7 hours ago

                        > People who really grasp a subject can usually explain it well in plain language.

                        That's very much a matter of style. An equation is often the plainest way of expressing something

                        • rsp1984 22 minutes ago

The problem is that equations give the illusion of conciseness and brevity, but in reality they always depend heavily on context.

Give a physicist an equation from a completely unrelated field of mathematics and it will make zero sense to them, because they lack the context. And vice versa. The only people who can readily read and understand your equations are those who already understand the subject and have learned all the context around the math.

                          Therefore it's pointless to try to start with the math when you're foreign to a field. It simply won't make any sense without the context.

                      • numbers_guy 8 hours ago

                        I guess I have the opposite experience. I have a post-graduate level of mathematical education and I am dismayed at how little there is to be gained from it, when it comes to AI/ML. Diffusion Models and Geometric Deep Learning are the only two fields where there's any math at all. Many math grads are struggling to find a job at all. They aren't outclassing programmers with their leet math skillz.

                        • srean 5 hours ago

Don't worry: when stochastic grads get stuck, math grads get going.

(One of) the value(s) that a math grad brings is debugging and fixing ML models when training fails. Many would have no idea how to even begin debugging why the trained model is not working so well, let alone how to explore fixes.

                          • p1esk 4 hours ago

Debugging ML models (a large part of my job) requires very little math. Engineering experience and mindset are a lot more relevant for debugging. Complicated math is typically needed when you want to invent new loss functions, or new methods for regularization, normalization, or model compression.

                            • srean 3 hours ago

You are perhaps talking about simple plumbing bugs. There are other kinds:

- Why didn't the training converge?

- Validation/test errors are great, but why is performance in the wild so poor?

- Why is the model converging so soon?

- Why is this all zeros?

- Why is this NaN?

- Model performance is not great: do I need to move to something more complicated, or am I doing something wrong?

- Did the nature of the upstream data change?

- Sometimes this feature is missing; how should I deal with that?

- The training set and the data on which the model will be deployed are different. How do I address this?

- The labelers labelled only the instances that are easy to label, not ones chosen uniformly from the data. How do I train with such skewed label selection?

- I need to update the model with a few thousand new data points, but not train from scratch. How do I do it?

- The model is too large: which doubles can I replace with float32?

                              So on and so forth. Many times models are given up on prematurely because the expertise to investigate lackluster performance does not exist in the team.

                              • p1esk 1 hour ago

Literally every single one of the examples you provided requires little in the way of math fundamentals, just basic ML engineering knowledge. Are you saying that understanding things like numerical overflow or exploding gradients requires a sophisticated math background?

                                • srean 13 minutes ago

Numerical overflow, no; exploding gradients, yes, especially when it comes to devising a way to handle them on your own, from scratch. After all, it took the research community some time to figure out a fix for that.

                                  But the examples you quoted were not my examples, at least not their primary movers.

                          • porridgeraisin 3 hours ago

The real use is in actually seeing connections. Every field has its own math, its own terminology, its own assumptions for theorems, etc.

More often than not this is duplicated work (mathematically speaking), and there is a lot to be gained by sharing advances between fields by running them through a "translation". This has happened many times historically; a lot of the "we met at a café and worked it out on a napkin" inventions are exactly that.

Math proficiency helps a lot with that. The level of abstraction you deal with is naturally high.

Recently, the problem of knowing every field well enough, even just cursorily, to make connections has become easier with AI. Modern LLMs do approximate retrieval and still need a planner plus a verifier; the mathematician can be that.

This is somewhat adjacent to what Terry Tao spoke about, and the setup is sort of what AlphaEvolve does.

                            You get that impression because such advances are high impact and rare (because they are difficult). Most advances come as a sequence of field-specific assumption, field-specific empirical observation, field-specific theorem, and so on. We only see the advances that are actually made, leading to an observation bias.

                          • RA_Fisher 5 hours ago

                            AI makes it easier to catch up. :)

                            • AndrewKemendo 2 hours ago

The big thing that made it all click for me with mathematics was that I stopped thinking about mathematics the way it was taught to me and started thinking about it in the way that naturally felt correct to me.

So in my specific case, I stopped thinking about mathematics as: how do I interpret a sequence of symbols?

Instead, I decided to start thinking about it as: "the symbols tell me about the multidimensional topological coordinate space that I need to inhabit."

So now when I look at an equation (or whatever), my first step is: "OK, how do I turn this into a topology so that I can explore the topological space the way that a number would?"

Kind of like extending Nagel's "What is it like to be a bat?", but instead of being a bat you're a number.

                            • measurablefunc 12 hours ago

It's not clear or obvious why continuous semantics should be applicable on a digital computer. This might seem like nitpicking but it's not: there is a fundamental issue that is always swept under the rug in these kinds of analyses, which is reconciling finitary arithmetic over bit strings with the analytical equations, which only work with infinite precision over the real or complex numbers as they are usually defined (equivalence classes of Cauchy sequences, or Dedekind cuts).

There are no Dedekind cuts or Cauchy sequences on digital computers, so the fact that the analytical equations map to algorithms at all is very non-obvious.

                              • shiandow 7 hours ago

                                It is definitely not obvious, but I wouldn't say it is completely unclear.

                                For instance we know that algorithms like the leapfrog integrator not only approximate a physical system quite well but even conserve the energy, or rather a quantity that approximates the true energy.

                                There are plenty of theorems about the accuracy and other properties of numerical algorithms.
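
To make the leapfrog point concrete, here is a minimal sketch for a unit-mass harmonic oscillator (the constants are purely illustrative): despite every step being approximate, the energy error stays tiny over many periods rather than accumulating.

```python
# Leapfrog (kick-drift-kick) for a unit-mass harmonic oscillator x'' = -x.
# The exact energy is E = 0.5 * (v**2 + x**2).
def leapfrog(x, v, dt, steps):
    for _ in range(steps):
        v += -x * dt / 2   # half kick
        x += v * dt        # drift
        v += -x * dt / 2   # half kick
    return x, v

x0, v0 = 1.0, 0.0
e0 = 0.5 * (v0**2 + x0**2)
x, v = leapfrog(x0, v0, dt=0.01, steps=10_000)
e = 0.5 * (v**2 + x**2)

# After ~16 oscillation periods the energy error remains small: the
# integrator exactly conserves a "shadow" energy close to the true one.
print(abs(e - e0))
```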

                              • jampekka 11 hours ago

                                Continuous formulations are used with digital computers all the time. Limited precision of floats sometimes causes numerical instability for some algorithms, but usually these are fixable with different (sometimes less efficient) implementations.

                                Discretizing e.g. time or space is perhaps a bigger issue, but the issues are usually well understood and mitigated by e.g. advanced numerical integration schemes, discrete-continuous formulations or just cranking up the discretization resolution.

                                Analytical tools for discrete formulations are usually a lot less developed and don't as easily admit closed-form solutions.

                                • sfpotter 9 hours ago

                                  This is what the field of numerical analysis exists for. These details definitely have been treated, but this was done mainly early in the field's history; for example, by people like Wilkinson and Kahan...

                                  • magicalhippo 8 hours ago

I just took some basic numerical courses at uni, but every time we discretized a problem with the aim of implementing it on a computer, we had to show what the discretization error would lead to, e.g. numerical dispersion[1], and do stability analysis and such, e.g. ensure the CFL[2] condition held.

So I guess one might want to do a similar exercise, like deriving the numerical dispersion, in order to see just how discretizing the diffusion process affects it and its relation to optimal control theory.

                                    [1]: https://en.wikipedia.org/wiki/Numerical_dispersion

                                    [2]: https://en.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E2%...

                                  • phreeza 11 hours ago

Doesn't continuous time basically mean "this is what we expect for sufficiently small time steps"? Very similar to how one would, for example, take the first-order Taylor dynamics and use them for "sufficiently small" perturbations from equilibrium. Is there any other magic to continuous-time systems that one would not expect to be solved by sufficiently small time steps?

                                    • tsimionescu 5 hours ago

Infinity has properties that finite approximations of it just don't have, and this can lead to serious problems for certain theorems. In the general case, the integral of a continuous function can be arbitrarily different from the sum of a finite sequence of points sampled from that function, regardless of how many points you sample, and it's even possible that the discrete version is divergent even when the continuous one is convergent.

                                      I'm not saying that this is the case here, but there generally needs to be some justification to say that a certain result that is proven for a continuous function also holds for some discrete version of it.

                                      For a somewhat famous real-world example, it's not currently known how to produce a version of QM/QFT that works with discrete spacetime coordinates, the attempted discretizations fail to maintain the properties of the continuous equations.

                                      • measurablefunc 11 hours ago

You should look into condition numbers and how they apply to the numerical stability of discretized optimization. If you take a continuous formulation and naively discretize, you might get lucky and get a convergent, stable implementation, but more often than not you will end up with subtle bugs and instabilities for ill-conditioned initial conditions.
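
A tiny sketch of what ill-conditioning does in practice (the matrix here is hypothetical, purely for illustration): a perturbation far above float64 rounding error gets amplified enormously in the solution.

```python
import numpy as np

# A nearly-singular 2x2 system: the two equations are almost identical,
# so the condition number is huge.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
b = np.array([2.0, 2.0])

x1 = np.linalg.solve(A, b)
# Perturb one entry of b by 1e-8 (well above float64 machine epsilon)...
x2 = np.linalg.solve(A, b + np.array([0.0, 1e-8]))

print(np.linalg.cond(A))       # enormous condition number
print(np.abs(x1 - x2).max())   # ...and the solution moves by roughly 100
```

Working by hand: the perturbation forces the second component to jump from 0 to about 100, which is the kind of amplification a condition number of ~1e10 predicts.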

                                        • phreeza 11 hours ago

                                          I understand that much, but it seems like "your naive timestep may need to be smaller than you think or you need to do some extra work" rather than the more fundamental objection from OP?

                                          • measurablefunc 11 hours ago

The translation from continuous to discrete is not automatic. There is a missing verification in the linked analysis: the mapping must be verified for stability for the proper class of initial/boundary conditions. Increasing the resolution from 64-bit floats to 128-bit floats doesn't automatically give you a stable discretized optimizer from a continuous formulation.

                                            • phyalow 10 hours ago

                                              Or you can just try stuff and see if it works

                                              • measurablefunc 10 hours ago

                                                Point still stands, translation from continuous to discrete is not as simple as people think.

                                                • phreeza 9 hours ago

                                                  Numerical issues totally exist but the reason has nothing to do with the fact that Cauchy sequences don't exist on a computer imo.

                                                  • measurablefunc 4 hours ago

                                                    The abstract formulation is different from the concrete implementation. It is precisely b/c the abstractions do not exist on computers that the abstract analysis does not automatically transfer the necessary analytical properties to the digital implementation. Cauchy sequences & Dedekind cuts are abstract & do not exist on digital computers.

                                      • cubefox 9 hours ago

                                        Real numbers mostly appear in calculus (e.g. the chain rule in gradient descent/backpropagation), but "discrete calculus" is then used as an approximation of infinitesimal calculus. It uses "finite differences" rather than derivatives, which doesn't require real numbers:

                                        https://en.wikipedia.org/wiki/Finite_difference

                                        I'm not sure about applications of real numbers outside of calculus, and how to replace them there.
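
As a tiny illustration of the finite-difference point (the function here is chosen purely for illustration), a forward difference needs only finitely many float operations, with no notion of a real-number limit:

```python
# Forward difference (f(x+h) - f(x)) / h approximates f'(x) using
# only finitely many floating-point evaluations; no limit is taken.
def forward_diff(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

# d/dx x^2 = 2x, so the true derivative at x = 3 is 6.
approx = forward_diff(lambda x: x * x, 3.0)
print(approx)  # close to 6, up to O(h) truncation plus rounding error
```

The error has two parts: an O(h) truncation term from stopping short of the limit, and a rounding term from finite precision, which is why h can't be made arbitrarily small either.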

                                        • imtringued 8 hours ago

I can't tell if this is a troll attempt or not.

                                          If your definition of "algorithm" is "list of instructions", then there is nothing surprising. It's very obvious. The "algorithm" isn't perfect, but a mapping with an error exists.

                                          If your definition of "algorithm" is "error free equivalent of the equations", then the analytical equations do not map to "algorithms". "Algorithms" do not exist.

                                          I mean, your objection is kind of like questioning how a construction material could hold up a building when it is inevitably bound to decay and therefore result in structural collapse. Is it actually holding the entire time or is it slowly collapsing the entire time?