i started to learn rust to be relevant in data science of the future
I had three programming language classes in the curriculum of my Master's in Business Analytics and Big Data:
- SQL
- R
- Python
It is no surprise that SQL was included: there is no doubt about its usefulness in data science and in any work with data.
As for R and Python, I remember we had a debate of sorts about which language is best. The class even split into two camps on the topic: one advocating for R and the other for Python. I was firmly in the Python camp, along with roughly 70% of the class.
Now I am more of a believer that a language is just a tool and that you should be comfortable with any tool. But when you are at the start of the learning curve, there is so much information to absorb that you have to make some trade-offs and follow the 80/20 principle in your learning.
Regardless of what I believe now, since graduating and working for a few companies, I have heard R mentioned only a few times, including in some great online courses in quantitative finance where R was used. I don't know if I am in some Python bubble or a victim of selection bias, but I have a feeling that Python completely dominates the data industry.
new competitor: rust?
I recently created an X/Twitter account for the third time, after this whole Elon Musk saga. In Kazakhstan it is not the most popular social media platform (we use Instagram the most), so there was no network effect pushing me to tweet on a regular basis. This time, though, I started using X/Twitter more as a way to read and learn about trends and news.
A few news items sparked my interest in Rust for data:
news #1: pydantic v2 is rewritten in rust
Pydantic is a big deal in the Python ecosystem, and the core of the new version (pydantic-core) is rewritten in Rust, with claimed performance gains of up to 50x.
news #2: polars leading the benchmarks in data processing
Polars is a new (blazingly fast) DataFrame library. Together with DuckDB, it came out on top in the performance benchmarks I saw. When I checked the Polars docs, I noticed only two API references: one (obviously) for Python and another for Rust. I find it interesting that a new, very popular, super fast DataFrame library is aimed primarily at Python and Rust.
news #3: using rust to run llms on cpus
LLMs (Large Language Models) changed our perception of what is possible with ML/AI and significantly improved our lives in 2023. So far, training or fine-tuning LLMs has been an extremely expensive exercise that very few companies can afford. One of the main cost drivers is GPUs; if we could somehow run and fine-tune these models on CPUs, we would lower the barriers to entry and expand the use cases for LLMs.
Apparently some smart folks have found ways to leverage Rust's performance to run LLMs on CPUs. I haven't played with it yet, but it seems that Rust is a language many in data and AI will be betting on, as it significantly decreases the cost. In that sense, Rust is truly a force multiplier.
news #4: rust was named the most admired language among developers
Although the survey covered developers rather than data professionals, I believe this result is a positive thing in itself. It is huge social proof, and it made me curious: why is Rust admired, and will I admire it too (putting the benefits aside)?
To figure it out, I couldn't come up with a better idea than to just start learning it.