Training mRNA Language Models Across 25 Species for $165

82 points | by maziyar 2 days ago

9 comments

  • seamossfet 1 hour ago

    The problem with models like this is that they're built on very little training data we can trace back to verified protein data. The Protein Data Bank, and other sources of training data for work like this, contain a lot of broken structures and "creative liberties" taken to infer a structure from instrument data. It's a very complex process that leaves a lot open to interpretation.

    On top of that, we don't have a clear understanding of how certain positions (conformations) of a structure affect the underlying biological mechanisms.

    Yes, these models can predict surprisingly accurate structures and sequences. Do we know if these outputs are biologically useful? Not quite.

    This technology is amazing, don't get me wrong, but the average person might see this and wonder why we can't go full futurism and solve every pathology with models like these.

    We've come a long way, but there's still a very very long way to go.

    • colingauvin 14 minutes ago

      HN's blind spots never cease to amaze me.

      I am a structural biologist working in pharmaceutical design and this type of thing could be wildly useful (if it works).

      • maziyar 2 days ago
      • rubicon33 4 hours ago

        Can someone explain what one might use this model for? As a developer with a casual interest in biology, it would be fun to play with, but honestly I'm not sure what I would do with it.

        • colechristensen 3 hours ago

          You can get your feet wet with genetic engineering for surprisingly little money.

          This guy shows a lot of how it's done: https://www.youtube.com/@thethoughtemporium

          Basically you can design/edit/inject custom genes into things and see real results while spending on the order of $100-$1000.

          • someuser54541 3 hours ago

            Is there something like this in text/readable format?

            • _zoltan_ 1 hour ago

              My main concern is using fungi. If it ends up in my lungs I'm most likely screwed, right?

              • nurettin 20 minutes ago

                Yes, but most students produce their best work while infected.

          • khalic 4 hours ago

            > In Progress: CodonJEPA

            JEPA is going to break the whole industry :D

            • digdugdirk 4 hours ago

              Can you explain this? I haven't heard of JEPA, and from a quick search it seems to be vision/robotics based?

              • khalic 3 hours ago

                It’s a self-supervised learning architecture, and it’s pretty much universal. The loss function runs on embeddings rather than on raw inputs, and there are some other smart architectural choices throughout. Worth diving into for a few hours. Yann LeCun gives some interesting talks about it.
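
                A rough sketch of what "the loss runs on embeddings" can mean, in plain Python. Everything here is a toy assumption for illustration (the linear-plus-tanh encoder, the function names, the identity predictor, the momentum value); it is not the CodonJEPA code or any library's API. The idea shown: encode a context view and a target view, predict the target embedding from the context embedding, and take the error in embedding space. The slowly updated target encoder (an exponential moving average of the context encoder) is the choice described for I-JEPA; treat it as an assumption here.

```python
import math

def encode(x, w):
    """Toy encoder (hypothetical): one linear layer plus tanh.
    `w` is a list of weight columns, one per embedding dimension."""
    return [math.tanh(sum(xi * wi for xi, wi in zip(x, col))) for col in w]

def predict(z, p):
    """Toy predictor (hypothetical): a linear map from the context
    embedding to the target embedding space."""
    return [sum(zi * pi for zi, pi in zip(z, col)) for col in p]

def jepa_loss(context, target, w_ctx, w_tgt, p):
    """Mean squared error between the predicted and the actual target
    embedding. The raw inputs are never reconstructed."""
    z_ctx = encode(context, w_ctx)   # embedding of the visible part
    z_tgt = encode(target, w_tgt)    # embedding of the hidden part
    z_hat = predict(z_ctx, p)        # prediction made in embedding space
    return sum((a - b) ** 2 for a, b in zip(z_hat, z_tgt)) / len(z_tgt)

def ema_update(w_tgt, w_ctx, momentum=0.99):
    """Move the target encoder weights slowly toward the context encoder."""
    return [
        [momentum * t + (1.0 - momentum) * c for t, c in zip(t_col, c_col)]
        for t_col, c_col in zip(w_tgt, w_ctx)
    ]

# Made-up numbers: 3 input features, 2 embedding dimensions.
w_ctx = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
w_tgt = [col[:] for col in w_ctx]
identity = [[1.0, 0.0], [0.0, 1.0]]

loss = jepa_loss([1.0, 0.0, 2.0], [0.5, 1.0, 1.5], w_ctx, w_tgt, identity)
w_tgt = ema_update(w_tgt, w_ctx)
```

                In a real setup the context encoder and predictor would be trained by gradient descent on this loss, and the target view would be a masked or held-out part of the same sample.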

                • lukeinator42 4 hours ago
              • simianwords 4 hours ago

                What makes these domain-specific models work when we don’t have good domain models for health care, chemistry, economics, and so on?

              • yieldcrv 4 hours ago

                Distributing the load on this will probably be infinitely more useful than Folding@home.

                • HocusLocus 5 hours ago

                  gray goo of the future

                  • skyskys 2 hours ago

                    hmmmm seems like some fake hype.