Data Manipulation in Clojure Compared to R and Python

(codewithkira.com)

54 points | by tosh 2 days ago

4 comments

  • olivia-banks 12 minutes ago

    Treating "NA" as nil/null/None by default seems like it would cause the Namibia problem!
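
    For readers who haven't hit it: a minimal sketch of the problem using pandas, whose `read_csv` counts the string "NA" among its default missing-value markers. The two-row CSV is made up for illustration; `keep_default_na=False` is one way to switch the default markers off.

```python
import io

import pandas as pd

# Illustrative data: Namibia's ISO 3166-1 alpha-2 code is the literal string "NA".
csv_text = "country,code\nNamibia,NA\nNorway,NO\n"

# Default parsing: "NA" is one of pandas' default missing-value markers.
default = pd.read_csv(io.StringIO(csv_text))

# Disabling the default markers keeps "NA" as an ordinary string.
strict = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["code"].isna().tolist())  # Namibia's code is read as missing
print(strict["code"].tolist())          # Namibia's code survives as "NA"
```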

    • ertucetin 1 hour ago

      I’ve built many different kinds of software (backend, frontend, 3D games, CLI tools, a code editor, and more) with Clojure and have been using it for over a decade now.

      I can confidently say that, among the list I mentioned, it’s the best for data manipulation/transformation. Thanks to the author for presenting it clearly and showing how the libraries and code look across different languages, all of which do a great job.

      But Clojure has its own special place (maybe in my heart as well :). I think Clojure should be used more in the data science space. Thanks to the JVM, it can be very performant (I’m looking at you, Python).

      • __mharrison__ 1 hour ago

        Good pandas and polars code should also be written in an immutable way...

        • epgui 1 hour ago

          Good Python code can exist, but Python makes it so easy to write bad code that good Python rarely exists.

          • nxpnsv 1 hour ago

            Agree. While it is common to see code like these pandas examples, it is very possible to write these manipulations so that they return a new frame or view without changing the inputs.
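
            A rough sketch of what that can look like, assuming a frame with hypothetical `bill_length_mm` and `bill_depth_mm` columns (the names and values are illustrative, not taken from the article). `DataFrame.assign` returns a new frame, so the input is left as it was.

```python
import pandas as pd


def add_bill_ratio(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with a derived column; the input frame is not modified."""
    return df.assign(bill_ratio=df["bill_length_mm"] / df["bill_depth_mm"])


# Illustrative values only.
penguins = pd.DataFrame(
    {"bill_length_mm": [39.1, 46.5], "bill_depth_mm": [18.7, 17.9]}
)

result = add_bill_ratio(penguins)

print(list(penguins.columns))  # the original frame has no bill_ratio column
print(list(result.columns))    # the returned frame does
```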

          • soumyaskartha 1 hour ago

            Clojure never got the data science crowd even though the language is genuinely good for it. Always felt like a distribution problem more than a technical one.

            • asa400 1 hour ago

              Unfortunately, having to mess around with a JVM is a tough sell for a lot of data analysis folks. I'm not saying it's rational or right, but a lot of people hear "JVM" and they go "no thank you". Personally I think it's a non-issue, but you have to meet people where they are.

              • pjmlp 15 minutes ago

                The irony, given the mess of Python setup: there are companies whose entire business is solving Python tooling.

                • famicom0 55 minutes ago

                  Meanwhile, I find it very annoying to deal with the litany of Python versions and the distinction between global packages and user packages, and needing to manage virtual environments just to run scripts. That being said, I am not an expert but that's always been my experience when I need to do anything Python related.

                • levocardia 55 minutes ago

                  In this very post you can see why: the dplyr code is just so much more readable. Like a lot of Python, dplyr reads almost like pseudocode: take this dataset, select the columns that start with "bill", then filter so that bill_length is less than 30. So simple and so little fluff!
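
                  For comparison, one possible pandas spelling of the same two steps. This is a sketch, not the article's code: the column name `bill_length_mm` and the three rows are assumptions made up for illustration.

```python
import pandas as pd

# Illustrative rows only; column names are assumed, not copied from the article.
penguins = pd.DataFrame(
    {
        "species": ["A", "B", "C"],
        "bill_length_mm": [28.5, 41.0, 29.9],
        "bill_depth_mm": [17.0, 18.0, 16.5],
        "body_mass_g": [3500, 3800, 3300],
    }
)

result = (
    penguins
    .filter(regex="^bill")                      # keep columns whose name starts with "bill"
    .loc[lambda d: d["bill_length_mm"] < 30]    # keep rows with bill length under 30
)

print(result)
```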

                  • erichocean 41 minutes ago

                    > is just so much more readable

                    I thought that too before I learned Clojure, now I find them equally readable.