I'm curious about the people who use R in big tech companies that you've worked a...

disgruntledphd2 · on June 28, 2024

So, (speaking as someone who started with R and now predominantly writes Python), I think there's a bunch of things going on here.

1. R is 100% better for analytics work and statistical modelling. There's just no contest.

2. Python is much, much better for getting data (APIs, scraping, etc.) and dealing with non-tabular data. Again, there's basically no contest here.

3. Software engineers hate R (in most cases), which means that it's easier to hand over work for production in Python.

This leads to a situation where it looks like most of the prod-level work is being done in Python, but if you look under the covers you'll discover that most prototyping/analysis/exploration is done in R and then ported to Python if it works.

Like, Python is a great language for lots of things, but it's pretty terrible for exploratory DS work (pandas is like the worst features of base R and base Python mashed together in an unholy hybrid).

There's also the fact that all the NN stuff is predominantly Python, so lots of companies believe that they need Python people, which reinforces the stereotype.

And finally, while I love R, Python has more guardrails, and it's harder to make an unmaintainable mess with it (relative to R). That's particularly true when people use all the various lazy evaluation packages that the tidyverse has gone through over the past decade (I once maintained a codebase that used all of these in different places; it was not a fun experience).

greentxt · on June 28, 2024

One of the better comments in this thread. I would only qualify that different levels of ability mediate much of the "how hard is it to make an unmaintainable mess" dimension. dplyr/tidyverse code can be pasta, as can pandas, and there is really a whole new level of that given LLM-generated nonsense edited/tweaked by novices masquerading as seniors.

Apropos this idea of a VS Code competitor, I wish they would spend more effort on existing products. I find Quarto frustratingly buggy and meanwhile see no reason to move my workflow from VS Code to this new thing. YMMV.

disgruntledphd2 · on June 29, 2024

> I would only qualify that different levels of ability mediate much of the "how hard is it to make an unmaintainable mess" dimension

Oh definitely, but at least Python's stdlib is relatively consistent, which helps packages be a little more so.

My favourite example is t.test, which is not a t method for the test class, unlike summary.lm, which is a summary method for the lm class.
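To make the naming problem concrete, here's a rough sketch in Python of a toy dispatcher that finds methods purely by the dotted name "generic.class". The `registry`, `register` and `dispatch` names are made up for illustration, and this is not R's actual lookup logic; it only shows why the two names can't be told apart by looking at them.

```python
# Toy model of name-based dispatch: a method is whatever function is
# stored under "<generic>.<class>". Hypothetical helpers, not R's real rules.
registry = {}


def register(name):
    """Store a function under an R-style dotted name."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap


@register("summary.lm")  # a real method: summary() for class "lm"
def summary_lm(obj):
    return "summary of a linear model"


@register("t.test")  # a standalone function that only looks like a method
def t_test(obj):
    return "ran a t-test"


def dispatch(generic, obj_class, obj=None):
    """Resolve '<generic>.<class>' by name alone and call it."""
    return registry[f"{generic}.{obj_class}"](obj)


summary_result = dispatch("summary", "lm")
# By name alone, t.test is indistinguishable from "t() for class 'test'".
transpose_result = dispatch("t", "test")
```

In this toy model, asking for the `t` method of a `test` object happily hands back the t-test function, which is exactly the ambiguity the dotted naming invites.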

And there's like 4 different styles of function naming in base & stats alone.

Python has problems (for god's sake, why isn't len a method?) but it's a little more consistent.
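For anyone who hasn't hit the len thing, a minimal sketch: the built-in `len()` calls the object's `__len__` method, so the method exists, it just isn't the spelling you're meant to use. `Playlist` is a hypothetical class made up for the example.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, tracks):
        self.tracks = list(tracks)

    def __len__(self):
        # The built-in len() calls this special method.
        return len(self.tracks)


p = Playlist(["a", "b", "c"])
n_builtin = len(p)      # the public spelling: a free function
n_dunder = p.__len__()  # works, but is not how you're supposed to write it
```

Both give the same answer; the complaint is purely about the surface syntax being a function rather than `p.len()`.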

I used to think that R was responsible for a lot more of the mess than I now do, having seen the same kind of DS code (and I am a DS) written in both Python and R.

And it would be sweet if R had a pytest equivalent. If I never have to write self.assertEqual again, it'll be too soon.
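For comparison, here's roughly what the two styles look like side by side. The `mean` function is a made-up thing to test; the first test uses the stdlib `unittest` class style, the second is the plain function with a bare `assert` that pytest collects.

```python
import unittest


def mean(xs):
    """Hypothetical function under test."""
    return sum(xs) / len(xs)


# unittest style: a TestCase subclass, a method, and self.assertEqual.
class TestMean(unittest.TestCase):
    def test_mean(self):
        self.assertEqual(mean([1, 2, 3]), 2)


# pytest style: a plain function named test_* with a bare assert.
# pytest discovers it by name; no class or assert* helpers needed.
def test_mean():
    assert mean([1, 2, 3]) == 2
```

pytest will also run the `unittest.TestCase` version, so a codebase can move from one style to the other gradually.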