this post was submitted on 14 Mar 2024

Any tips to help a scientist become a better programmer? (iusearchlinux.fyi)

submitted 1 year ago by mypasswordistaco@iusearchlinux.fyi to c/programming@programming.dev

Hey there!

I'm a chemical physicist who has been using python (as well as matlab and R) for a lot of different tasks over the last ~10 years, mostly for data analysis but also to automate certain tasks. I am almost completely self-taught, and though I have gotten help and tips from professors throughout the completion of my degrees, I have never really been educated in best practices when it comes to coding.

I have some friends who work as developers but have a similar academic background as I do, and through them I have become painfully aware of how bad my code is. When I write code, it simply needs to do the thing, conventions be damned. I do try to read up on the "right" way to do things, but the holes in my knowledge become pretty apparent pretty quickly.

For example, I have never written a class and I wouldn't know why or where to start (something to do with the init method, right?). I mostly just write functions and scripts that perform the tasks that I need, plus some work with jupyter notebooks from time to time. I only recently got started with git and uploading my projects to github, just as a way to try to teach myself the workflow.

So, I would like to learn to be better. Can anyone recommend good resources for learning programming, but perhaps that are aimed at people who already know a language? It'd be nice to find a guide that assumes you already know more than a beginner. Any help would be appreciated.

[–] demesisx@infosec.pub 2 points 1 year ago* (last edited 1 year ago)

Learn Haskell.

Since it is a research language, it is packed with academically-rigorous implementations of advanced features (currying, lambda expressions, pattern matching, list comprehension, type classes/type polymorphism, monads, laziness, strong typing, algebraic data types, parser combinators that allow you to implement a DSL in 20 lines, making illegal states unrepresentable, etc) that eventually make their way into other languages. It will force you to learn some of the more advanced concepts in programming while also giving you a new perspective that will improve your code in any language you might use.

I was big into embedded C programming years back ... and when I got to the pointers part, I couldn't figure out why I suddenly felt unsatisfied, as if I was somehow doing something wrong. That instinct ended up being at least partially correct. I sensed that I was doing something unsafe (which forced me to be very careful around footguns like pointers, dedicating extra mental effort to keeping track of those inherently unsafe solutions) and I wished there was some more elegant way around unsafe actions like that (or at least some language-provided way of having the compiler guard against those unintended side effects, which would prevent these kinds of bugs from getting into my code).

Years later, after not enjoying JS, TS (IMO, a porous condom over the tip of JavaScript), Swift, Python, and others, my journey brought me to FRP which eventually brought me to FP and with it, Haskell, Purescript, Rust, and Nix. I now regularly feel the same satisfaction using those languages that I felt when solving a math problem correctly. Refactoring is a pleasure with strictly typed languages like that because the compiler catches almost everything before it will even let you compile.

[–] UFODivebomb@programming.dev 1 points 1 year ago

My advice comes from being a developer, and tech lead, who has brought a lot of code from scientists to production.

The best path for a company is often: do not use the code the scientist wrote and instead have a different team rewrite the system for production. I've seen plenty of projects fail, hard, because some scientist thought their research code is production level. There is a large gap between research code and production. Anybody who claims otherwise is naive.

This is entirely fine! Even better than attempting to build production quality code from the start. Really! Research is solving a decision problem. That answer is important; less so the code.

However, science is science. Being able to reproduce the results the research produced is essential. So there is the standard requirement of documenting the procedure used (which includes the code!) sufficiently to be reproduced. The best part is the reproduction not only confirms the science but produces a production system at the same time! Awws yea. Science!

I've seen several projects fail when scientists attempt to be production developers without proper training and skills. This is bad for the team, product, and company.

(Tho typically those "scientists" fail at building reproducible systems. So are they actually scientists? I've encountered plenty of PhDs in name only.)

So, what are your goals? To build production systems? Then those skills will have to be learned. That likely includes OO. Version control. Structural and behavioral patterns.

Not necessary to learn if that isn't your goal! Just keep in mind that if a resilient production system is the goal, well, research code is like the first pancake in a batch. Verify, taste, but don't serve it to customers.

[–] Savaran@lemmy.world 1 points 1 year ago

Approach programming with the same seriousness that you’d expect a programmer to approach your field with. You say yourself you just want it to “do the thing, conventions be damned”.

Well how would you feel if someone entered your lab or whatever and treated the tools of your trade that way?

[–] Fal@yiffit.net 0 points 1 year ago

Use an IDE if you aren't already. Jetbrains stuff is great. Having autocomplete is invaluable.

[–] boeman@lemmy.world 0 points 1 year ago (1 children)

The thing to think about is reusability. Are you copying and pasting code into multiple places? That's a great candidate to become a class. If you have long lived projects (i.e. something you will use multiple times over a lot of years) maintainability is important. Huge functions and monolithic applications are very hard to maintain over time.

Break your functionality out into small chunks (methods and classes). Keep it simple. It may take a while to get used to this, but your time for adding additional functionality will be greatly improved in the long run.
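
As a rough illustration of the "small chunks" idea (a sketch only; the function names and the baseline-subtraction step are made up for the example), an analysis step that used to be pasted into several scripts can live in one small function that everything else calls:

```python
from statistics import mean


def subtract_baseline(signal, n_baseline=3):
    """Subtract the mean of the first n_baseline points from every point."""
    baseline = mean(signal[:n_baseline])
    return [value - baseline for value in signal]


def peak_height(signal):
    """Return the largest value of the baseline-corrected signal."""
    return max(subtract_baseline(signal))


# The same helpers now serve every data set instead of being copy-pasted.
run_a = [1.0, 1.0, 1.0, 4.0, 2.0]
run_b = [0.5, 0.5, 0.5, 0.75, 3.5]
heights = [peak_height(run) for run in (run_a, run_b)]
```

If the baseline logic ever changes, it changes in one place.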

A lot of great programmers were terrible at one time. Don't let your current lack of knowledge of principles stop you from learning. One of the biggest breakthroughs I had as a programmer is changing how I looked at architecting applications. Following SOLID principles will assist a lot in that. Don't try to understand and use these principles all at once, take your time. Programming isn't what you make your living with, it's a tool to help you be more efficient in your current role.

Realize that becoming a more effective programmer is different for everyone. Like you, I was self taught. I was a systems and network engineer that decided to move into software development. I've since moved into a role that takes advantage of all the skills I've learned through the years in SRE. Like you, a lot of what I write now is about automation and analysis.

[–] Fal@yiffit.net 0 points 1 year ago (1 children)

Careful with this. Not everything needs to be reusable, and copy/paste isn't inherently bad.

https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction

[–] boeman@lemmy.world 0 points 1 year ago (1 children)

You aren't wrong... But everything with extended use needs to be maintainable. Making a change in 5 places sucks.

Plus, that's what open-closed principle is all about. Instead of adding additional functionality to current working code, you extend and modify.
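
A minimal sketch of that open-closed idea, assuming a made-up `Exporter` class (nothing here comes from a real library): the working code is left alone and the new behaviour arrives as a subclass.

```python
class Exporter:
    """Existing, working code: formats one result row as plain text."""

    def format_row(self, name, value):
        return f"{name}: {value}"

    def export(self, results):
        return [self.format_row(name, value) for name, value in results.items()]


class CsvExporter(Exporter):
    """New requirement: CSV output. Added by extension; Exporter is unchanged."""

    def format_row(self, name, value):
        return f"{name},{value}"


results = {"temperature": 300, "pressure": 1.5}
plain_lines = Exporter().export(results)
csv_lines = CsvExporter().export(results)
```

Scripts that already use `Exporter` keep working, because none of its code was touched.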

[–] Fal@yiffit.net 0 points 1 year ago

Making a change in 5 places sucks, making it in 2 could be reasonable. If 2 pieces of code are similar but different enough, I've seen way too often people try to force them into a common abstraction. That's more what the article is about.

[–] heeplr@feddit.de 0 points 1 year ago (1 children)

It's always good to learn new stuff but in terms of productivity: Don't attempt to be a programmer. Rather attempt to write better research code (clean up code, revision control, better commenting, maybe testing...)
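
On the "maybe testing" point, a test for research code can be as small as a few assertions next to the function. A sketch, with a hypothetical `celsius_to_kelvin` helper standing in for a real analysis step:

```python
import math


def celsius_to_kelvin(temperature_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    if temperature_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temperature_c + 273.15


def test_celsius_to_kelvin():
    # Known reference points catch sign and offset mistakes.
    assert math.isclose(celsius_to_kelvin(0.0), 273.15)
    assert math.isclose(celsius_to_kelvin(-273.15), 0.0, abs_tol=1e-9)
    # Physically impossible input should fail loudly, not silently.
    try:
        celsius_to_kelvin(-300.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


test_celsius_to_kelvin()
```

Calling the test by hand works; a runner such as pytest can also collect functions named `test_*` from test files.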

Rather try to improve cooperation with programmers, if necessary. Close cooperation, asking stupid questions instead of making assumptions etc. makes the process easy for both of you.

Also don't be afraid to consult different programmers since beyond a certain level, experience and expertise in programming is vastly fragmented.

Experienced programmers mostly suck at your field and vice versa and that's a good thing.

[–] QuadriLiteral@programming.dev 0 points 1 year ago (1 children)

Odd take imo. OP is a programmer, albeit perhaps not a very good one. Did a PhD (computational astrophysics), been working as a professional dev for 10 years after that. Imo a good programmer writes code that solves the problem at hand, I don't see that much of a difference between the problem being scientific or a backend service. It doesn't mean "write lots of boilerplate-y factories, interfaces and other layers" to me, neither in research nor outside of it.

That being said, there is so much time lost in research institutes because of shoddy programming by researchers, or simply ignorance, not knowing a debugger exists for instance. OP wanting to level up their game would almost certainly result in getting to research results faster, + they may be able to help their peers become better as well.

[–] heeplr@feddit.de 1 points 1 year ago (1 children)

25 years in the industry here. As I said there's nothing against learning something new but I doubt it's as easy as "leveling up".

Both fields profit a lot from experience and it's as much gain for a scientist to become a software dev as an architect becoming a carpenter. It's simply not productive.

there is so much time lost in research institutes because of shoddy programming

Well, that's the way it is. Scientific code and production code have different requirements. To me that sounds like "that machine prototype is inefficient - just skip the prototype next time and build the real thing right away."

[–] QuadriLiteral@programming.dev 1 points 1 year ago

To me that sounds like “that machine prototype is inefficient - just skip the prototype next time and build the real thing right away.”

I don't think you understand my point, which is that developing the prototype takes e.g. 50% more time than it should because of complete lack of understanding of software development.

[–] Turun@feddit.de 0 points 1 year ago* (last edited 1 year ago) (1 children)

As a researcher: all the professional software engineers here have no idea about the requirements for code in a research setting.

I recommend you use

git. It's nice to be able to revert changes without worry.
descriptive variable names. The meaning of descriptive is highly dependent on your situation. Single letters can have an obvious meaning, but err on the side of longer names if you're unsure. The goal is to be able to look at a variable and instantly know what it represents.
virtual environments and requirements.txt. When you have your code working you should have pip (or anaconda or whatever) take a snapshot of your current python installation. Then you can install the exact same requirements when you want to revive your code a few months or years down the line. I didn't do that and it's kinda biting me in the ass right now.
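
The usual command-line route for that snapshot is `pip freeze > requirements.txt` inside the virtual environment, with `pip install -r requirements.txt` to restore it later. For anyone who would rather do it from a script, here is a rough standard-library sketch; the function names are made up, and `pip freeze` handles editable and local installs better than this does:

```python
from importlib.metadata import distributions


def snapshot_requirements():
    """Return 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the snapshot to a file that pip can install from later."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(snapshot_requirements()) + "\n")
```

Calling `write_requirements()` once the analysis works, and committing the file next to the code, records which versions produced the results.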

[–] QuadriLiteral@programming.dev 0 points 1 year ago (1 children)

As a researcher: all the professional software engineers here have no idea about the requirements for code in a research setting.

As someone with extensive experience in both: my first requirement would be readability. Single python file? Fine with that. 1k+ lines single python file without functions or other means of structuring the code: please no.

The nice thing about python is that your IDE lets you jump into the code of the libraries you're using, I find that to be a good way to look at how experienced python devs write code.

[–] Turun@feddit.de 1 points 1 year ago (1 children)

You can jump to definition in any language. In fact, python may be one of the worst ones, because compiled libraries are so common. "Real signature unknown" is all you will get sometimes. E.g. NumPy is implemented in C not python.

[–] QuadriLiteral@programming.dev 1 points 1 year ago (1 children)

My point about the jumping into was that you can immediately start reading the sources. Most alternative languages are compiled in some form or other so all you'll see is an API, not the implementation.

[–] Turun@feddit.de 1 points 1 year ago* (last edited 1 year ago) (1 children)

My comment was not asking for clarification, I am contradicting your claim.

Granted, my experience is mostly limited to python and rust. But I find that in python you reach the end of "jump to definition" much much sooner. Fundamental core libraries of Python are written in C, simply because the performance required cannot be reached with python alone. So after jumping two levels you are through the thin wrapper type and your compiler will give you an "I don't know, it's byte code".
In Rust I have yet to encounter this. Byte code is rarely used as a dependency, because compiling whatever is needed is no issue - you're compiling anyway - and actually can allow a few more optimizations to be performed.

Edit: since wasm is not yet widespread, JavaScript may be the best language to dig deep into libraries.

[–] QuadriLiteral@programming.dev 1 points 1 year ago (1 children)

Mostly ML or data processing libraries I would assume, I've read tons of REST server and ORM python code for instance, none of that is written in C.

Wrt rust: no experience with that. I do do a lot of C++, there you quickly reach the end as typically you're consuming quite a bit of libraries but the complete sources of those aren't part of what is parsed by the IDE as keeping all that in memory would be unworkable.

[–] Turun@feddit.de 1 points 1 year ago

REST server and ORM python code

Fair enough, that can be achieved with pure python.

[–] Diplomjodler@feddit.de 0 points 1 year ago (1 children)

Forget everything you hear about OOP and just view it as a way to improve code readability. Just rewrite something convoluted with a class and you'll see what they're good for. Once you've got over the mental blockade, it'll all make more sense.

[–] WolfLink@lemmy.ml 0 points 1 year ago (1 children)

To add to this, there are kinda two main use cases for OOP. One is simply organizing your code by having a bunch of operations that could be performed on the same data be expressed as an object with different functions you could apply.

The other use case is when you have two different data types where it makes sense to perform the same operation but with slight differences in behavior.

For example, if you have a “real number” data type and a “complex number” data type, you could write classes for these data types that support basic arithmetic operations defined by a “numeric” superclass, and then write a matrix class that works for either data type automatically.
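
A rough sketch of that second use case, assuming hand-rolled `Real` and `Complex` classes purely for illustration (Python already ships `float` and `complex`, so nobody would write these for real work). The `Matrix` class only relies on `+` and `*`, so it works with either type:

```python
from abc import ABC, abstractmethod


class Numeric(ABC):
    """Interface every number type must provide."""

    @abstractmethod
    def __add__(self, other): ...

    @abstractmethod
    def __mul__(self, other): ...


class Real(Numeric):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Real(self.value + other.value)

    def __mul__(self, other):
        return Real(self.value * other.value)


class Complex(Numeric):
    def __init__(self, re, im):
        self.re, self.im = re, im

    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Complex(
            self.re * other.re - self.im * other.im,
            self.re * other.im + self.im * other.re,
        )


class Matrix:
    """Matrix whose entries are any Numeric; it never checks which kind."""

    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        columns = list(zip(*other.rows))
        product = []
        for row in self.rows:
            new_row = []
            for column in columns:
                terms = [a * b for a, b in zip(row, column)]
                total = terms[0]
                for term in terms[1:]:
                    total = total + term
                new_row.append(total)
            product.append(new_row)
        return Matrix(product)
```

`Matrix([[Real(1), Real(2)], [Real(3), Real(4)]]) @ Matrix(...)` and `Matrix([[Complex(0, 1)]]) @ Matrix([[Complex(0, 1)]])` both go through the same `__matmul__` without any type checks.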

[–] ALostInquirer@lemm.ee 0 points 1 year ago (1 children)

One is simply organizing your code by having a bunch of operations that could be performed on the same data be expressed as an object with different functions you could apply.

Not OP, but also interested in wrapping my head around OOP and I still struggle with this in a few different respects. If what I'm writing isn't a full program, but more like a few functions to process data, is there still a use case for writing it in an OOP style? Say I'm doing what you describe, operating on the same data with different functions, if written properly couldn't a program do this even without a class structure to it? 🤔

Perhaps it's inelegant and terrible in the long term, but if it serves a brief purpose, is it more in the case of long term use that it reveals its greater utility?

[–] Turun@feddit.de 1 points 1 year ago* (last edited 1 year ago)

I use classes to group data together. E.g.

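import dataclasses

import matplotlib.pyplot as plt
import numpy

@dataclasses.dataclass
class Measurement:
    temperature: int
    voltage: numpy.ndarray
    current: numpy.ndarray
    another_parameter: bool

    def resistance(self) -> float:
        ...

# parse_measurements() is assumed to be defined elsewhere and to return a list of Measurement objects
measurements = parse_measurements()
# keep only the measurements where another_parameter is set
measurements = [m for m in measurements if m.another_parameter]
plt.plot(
    [m.temperature for m in measurements],
    [m.resistance() for m in measurements]
)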

This is much nicer to handle than three different lists of temperature, voltage and current. And then a fourth list of resistances. And another list for another_parameter. Especially if you have more parameters to each measurement and need to group measurements by these parameters.

[–] Asudox@lemmy.world 0 points 1 year ago* (last edited 1 year ago) (1 children)

I'd say go with Go or Rust. Go is like Python (garbage collection) but compiled. Rust is kind of like C++ but not exactly. It does not have garbage collection or manual memory management but something called "ownership and borrowing". It's as fast as C++ or even faster and has a modern syntax. Though Rust is harder than Go since it is under the hood a systems programming language. If you want something faster than Python, Go is good. I specifically chose Rust over Go since I wanted performance and just wanted to try how it was. I'm still a beginner in Rust but I wrote a few projects at reasonable scale for my level. And also, Rust's error messages are extremely nice. It really lives up to the memes.

To learn Rust: https://www.rust-lang.org/learn

To learn Go: https://go.dev/learn/

[–] Fal@yiffit.net 0 points 1 year ago

Rust syntax is way closer to Python than go. Go's syntax is awful imo. It's like objective C

[–] MxM111@kbin.social 0 points 1 year ago (1 children)

As one physicist to another, the most important thing in the code are long variable names (descriptive) and comments.

We usually do not do multi-people multi year projects, so all other comments in this page especially the ones coming from programmers are not that relevant. Classes are cool, but they are not needed and often obscure clarity of algorithmic/functional programming.

S. Wolfram (the creator of Mathematica) said something along these lines (paraphrasing): if you are writing real code in Mathematica, you are doing something wrong.

[–] abhibeckert@lemmy.world 0 points 1 year ago* (last edited 1 year ago) (1 children)

We usually do not do multi-people multi year projects

Seriously - why not?

Say you're doing an experiment, wouldn't it be nice if someone else could repeat that experiment? Maybe in 3 years? in 30 years? in 3,000 years time? And maybe they could use your code instead of writing it themselves and possibly getting it wrong?

If something is worth doing, then it is worth doing properly.

Classes are cool, but they are not needed and often obscure clarity

I write code all day professionally. A lot of my code doesn't use classes. I agree they often "obscure clarity".

But sometimes they do the opposite - they make things crystal clear. It's important to know how to use classes and even more important to know when to use them. I guarantee some of the work you do could benefit from a few simple classes. They don't need to be complex - I wrote a class earlier today that is only four lines of code. And yes, a class was appropriate.

[–] Turun@feddit.de 0 points 1 year ago (1 children)

You know how changing requirements is the bane of real™ production grade™ software?

In science requirements change all the time. You write some 50-100 lines to plot your results. You realize that the effect is not visible, so you go back to the lab, change 5 variables and run the test again. Some quick code changes and you see the effect. Perfect. Now you do the measurement as a function of temperature. You adapt your script, you indent the data processing code to turn your list of files into a list of characteristic parameters and adapt the plotting. You run the experiment for the three samples you have prepared and compare their plots. Some more experiment tuning and corresponding script tuning is required. You take the characteristic parameters that your code (grew to 200 lines now, but whatever) calculated and write a new script to take that array and plot it nicely.

Now someone wants to repeat the experiment 4 years later. The measurement equipment changed and the data format is slightly different. It's impossible to document the exact state of the hardware in your code anyway. They are actually interested in a different effect and want to plot that as well, but they need the effect/characteristic parameters that are shown by your code as a sanity check. They need to rewrite 125 of the 200 lines.

There never is a finished product that is worth maintaining long term. Everyone using the script has to understand the domain precisely anyway. Is it worth it to reuse the old code when you need to rewrite more than half of it anyway?

Don't get me wrong, code reuse does happen. But it's much more "oh, I wrote that three months ago somewhere else" ctrl-c ctrl-v. "Ok, now I need to change five lines in this function to adapt to the new thing I'm trying to do." It makes absolutely no sense to write that function in an abstracted way. Every time you use it the requirements changed and the abstraction is no longer valid anyway.

[–] MxM111@kbin.social 1 points 1 year ago

They need to rewrite 125 of the 200 lines.

And I guarantee, it is much easier to write 200 new lines than change 125 out of 200 lines in somebody's code. No matter how nicely that code is written.