Which Programming Language Should You Choose?

Introduction: My Hesitation on the Path to the Dork Side

            When I first became interested in programming, one of the first things I agonized over was which programming language to start with.  I knew very little about programming (and computers in general), and I was deathly afraid of pouring time and energy into a language that would turn out to be on the verge of obsolescence.  I worried that if I learned an obsolescent language, I would have to spend even more energy unlearning everything I had picked up from it.  After some searching on the internet and talking to fellow students, I finally decided on C++.  From what I heard it was an in-demand language, so that was that.

In retrospect, I’m surprised that I wasn’t aware of how silly that mindset was; I did know how to use a search engine, right?  Something I came to realize is that once you learn the fundamentals of programming, it really is not that difficult to learn additional languages.  As long as you don’t choose a very obscure or antiquated language (which you probably wouldn’t come across as a novice anyway), you should be fine.  Despite what I thought at the time, programming languages have a much longer lifespan than most of the computer technology I was familiar with.  As a child I was aware mostly of computer hardware and video games, both of which have very short lifespans in comparison to a programming language.  Whereas your iPod from five years ago is ancient, the C++ programming language dates back to 1983.  Furthermore, C++ is an extension of the older (and still far from obsolete) C programming language, which dates back to the early 1970s, when Dennis Ritchie developed it at Bell Labs.

Something else to consider is how similar many programming languages are in terms of syntax, usage, and philosophy.  Many languages use the same (or quite similar) symbols and conventions.  It is also quite common for a language to have extensions or options that let it adopt paradigms it does not use by default, or replicate features of other languages.  While there are definitely clear differences between programming languages, becoming proficient in one usually makes learning the next one significantly easier.  This is the complete opposite of my fear that I might have to ‘unlearn’ techniques if I picked the “wrong language”.

Despite the preceding bit of reassurance, it is important to point out that programming languages do have significant differences, and it helps to know a few things about computing in general before picking one to learn.  Some programming languages are specialized: invaluable for certain tasks and unsuitable for others.  Other programming languages are very broad and can be applied to many uses with varying degrees of success.  This article is an attempt to provide the guide that I wish I had had all those years ago when I began my journey to the dork side.

Low Level: Bit by Bit, Byte by Byte

            The first thing to know is that all languages fall somewhere on a scale of programming levels, albeit not very neatly.  Languages are ranked, somewhat subjectively, from lowest to highest; the two extremes are easy to tell apart, and there is a lot of ambiguity in between.  The levels refer to how close the language is to the hardware, or conversely how abstract it is.  Lower level languages have the most control over hardware, but they are not very portable (meaning that programs are less likely to work on a wide variety of systems) and programs take longer to write in them.  On the other end of the spectrum, higher level languages are very portable and easy to write, but have much less control over the hardware.

At the lowest level is machine code, the strings of 0’s and 1’s that are the digital lingua franca of modern computing machines.  Machine code tells the processor, instruction by instruction, exactly what to do, which ultimately comes down to which circuits are switched on and which are switched off.  This allows for microscopic control over a computer’s hardware, but it is very specific to the computer or device for which it is intended.  Since hardware designs vary dramatically from manufacturer to manufacturer, machine code is the least portable of all computer languages; it has very little likelihood of working on anything but the type of processor it was written for.  Furthermore, machine code is very difficult to write, and thankfully virtually no one programs in it directly.  Regardless of these deficiencies, all other programming languages are eventually translated into machine code in order to be executed.

Slightly higher level is assembly code.  Assembly code replaces the 0’s and 1’s with short human-readable names for very basic commands such as LOAD, STORE, and ADD.  This language was more common before the development of C, as it was much easier to work with than machine language and somewhat more portable.  Assembly language is translated into machine code by a program called an assembler before it is run, and it is still focused on manipulating data at a very low level: basic calculations and moving data in and out of the processor’s registers.  Despite its high level of control over hardware, it is rarely used today outside of developing simple hardware drivers and working on devices with very little available memory (although higher level languages are now taking over the latter).  Working on a typical application with assembly language or machine code is akin to building a car atom by atom, or molecule by molecule.  Obviously neither machine languages nor assembly languages are a good choice for a first language.  Indeed, many modern programmers know little or nothing of either.
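
To give a feel for how small these commands are, here is a rough sketch in Python of a pretend processor that understands only LOAD, ADD, and STORE.  The machine, its single register, and the run function are all invented for illustration and do not correspond to any real instruction set; the point is only that one line of a high level language becomes several tiny steps.

```python
# A toy register machine, made up for this example; it models no real CPU.
def run(program, memory):
    """Execute a list of (opcode, address) pairs against a memory dict."""
    accumulator = 0  # the single register of our pretend processor
    for opcode, address in program:
        if opcode == "LOAD":      # copy a value from memory into the register
            accumulator = memory[address]
        elif opcode == "ADD":     # add a value from memory to the register
            accumulator += memory[address]
        elif opcode == "STORE":   # copy the register back out to memory
            memory[address] = accumulator
        else:
            raise ValueError(f"unknown instruction: {opcode}")
    return memory

# The single expression total = price + tax, spelled out step by step.
memory = {"price": 40, "tax": 2, "total": 0}
program = [("LOAD", "price"), ("ADD", "tax"), ("STORE", "total")]
run(program, memory)
print(memory["total"])  # prints 42
```

Real assembly works on numbered registers and memory addresses rather than friendly names, but the rhythm of load, operate, store is the same.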

A Block of assembly code

If you're looking at this and thinking "hey this is obvious", call your local mental health professional and/or priest.

Mid-level Languages: A Better Way to Build a Car

            The first language that is a viable choice is C.  C is an older language that is still in use in many legacy applications and in systems software such as operating systems, and its concepts underlie many modern programming languages.  While C frees the programmer from much of the micromanagement of data, it still allows a great deal of access to the underlying hardware.  This, combined with the lack of modern programming features such as garbage collection, object oriented development, and exception handling, places a lot of responsibility on the shoulders of the programmer.  While it is inevitable that most programmers will learn at least a little C, it is probably not the first language you should start with, as it assumes that the programmer is capable of avoiding things like buffer overflows and of handling the other nitty-gritty aspects of programming.  In 1983, Bjarne Stroustrup developed C++ to add newer features to C, most notably object-oriented concepts such as classes.  C++ is still in heavy use in modern programs, but I would argue that the many responsibilities it inherits from C also keep it from being the best choice for a first language.
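
To show what that responsibility looks like from the other side, here is a small sketch in Python of a write past the end of a fixed-size list.  In C the equivalent write is not checked by the language and may silently overwrite neighbouring memory, which is the buffer overflow mentioned above; Python checks every index and raises an error instead.  The write_at helper is a made-up name for this example.

```python
def write_at(buffer, index, value):
    """Try to store a value; report failure instead of corrupting anything."""
    try:
        buffer[index] = value
        return True
    except IndexError:
        # Python refuses the out-of-range write and leaves the list untouched.
        return False

buffer = [0] * 4                 # room for exactly four items
print(write_at(buffer, 2, 99))   # prints True
print(write_at(buffer, 10, 99))  # prints False
print(buffer)                    # prints [0, 0, 99, 0]
```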

Angry C being angry

C will not hold your hand. C will not clean up after you. C obeys without question. (Angry C is by worldgnat.)

Higher up the continuum from C/C++, things start to become a little cloudier (for me especially).  The next language to consider, in my opinion, would be Java.  While it shares some similarities with C in terms of syntax and concepts, Java was designed from the start to be object oriented and has many modern features built into the standard language.  What separates Java from C++ is both its automatic management of many programming tasks and the way the language is translated into lower level languages.

While C++ compiles each source file and then links the results into an executable binary (i.e. a program written in machine code), Java compiles into something called bytecode.  Bytecode is an intermediate representation, essentially a boiled-down version of the Java code, which is then fed to a program called the Java Virtual Machine (JVM), often loosely described as an interpreter.  The JVM finishes the job that the compiler started: it translates the bytecode into machine instructions for the computer it is running on, either by interpreting it one instruction at a time or by compiling frequently used parts to machine code while the program runs.  The advantage of this approach is portability.  Java code compiled on one machine can run on any machine that has a JVM, because each platform's JVM knows how to generate executable code for that specific machine from the bytecode that is fed into it.  By working as a middleman the JVM clears away a lot of the portability problems of earlier languages, but it can cost some execution time in the process.  C/C++ programs have traditionally run more quickly than Java programs because they are compiled directly into machine code ahead of time, whereas a Java program pays for its translation while it runs (although in both cases what the processor finally executes is machine code).

Dynamic Languages: Learn By Doing

            While Java is definitely a good potential first language due to its portability and enhanced ease of use, I would recommend similarly high level languages such as Python, Ruby and Perl (personally, in that order).  The languages covered up to this point (C/C++, Java) are what are known as statically typed languages: the type of every variable is fixed and checked before the program runs.  They are also compiled ahead of time, which means that you write a program, compile the various files into bytecode or machine code, link them together, and only then run the result.  Languages such as Python, on the other hand, are dynamically typed: types are checked while the program is running.  They are also usually interpreted.  Rather than producing a standalone executable file, the interpreter translates the source into bytecode and executes it each time the program is run.  While this incurs a performance penalty, it allows code to be written on the fly.  You can type in a program one fragment at a time at an interactive prompt and see the result of each fragment as it is entered, typically immediately.
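
As a small illustration of the dynamic side, the sketch below shows two things you can try for yourself at the Python prompt: a name can refer to values of different types over its lifetime, and a type mismatch is only reported when the offending line actually runs.  The add_one function is just a made-up example.

```python
value = 42                    # value currently refers to an int
print(type(value).__name__)   # prints int
value = "forty-two"           # the same name may later refer to a str
print(type(value).__name__)   # prints str

def add_one(x):
    # Nothing here declares what type x must be.
    return x + 1

print(add_one(41))            # prints 42

try:
    add_one("forty-two")      # only fails once this call is executed
except TypeError as error:
    print("caught at run time:", error)
```

In a statically typed language the last call would typically be rejected by the compiler before the program ever ran.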

Such a feature is great for novice programmers as it allows them to experiment with new language conventions and concepts.  Many more advanced programmers use these languages to rapidly prototype applications and see the results, and then rewrite the slow parts of the larger program in a more efficient language such as C++.  However, with advancements in interpreter technology and computer hardware, many programmers now write fully functional programs directly in languages such as Python and Ruby.  Python, in particular, recovers a little of its performance liability by caching “compiled” versions of imported source files: the standard implementation (CPython, which is itself written in C) translates each module into bytecode, saves it in a .pyc file, and reuses that file the next time the module is loaded.  This shortens start-up but does not make the code itself run faster, so pure Python generally remains slower than equivalent C; the parts that need speed are commonly handed off to libraries written in C.
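
If you are curious what Python's bytecode looks like, the standard library lets you peek at it.  The following is a minimal sketch for CPython, the standard implementation, which compiles source code to bytecode before running it; the area function is just a made-up example, and the exact instruction names printed differ from one Python version to the next.

```python
import dis

def area(width, height):
    return width * height

# compile() turns source text into a code object that holds bytecode.
code = compile("6 * 7", "<example>", "eval")
print(eval(code))                  # prints 42

# Every function carries its own bytecode as a plain sequence of bytes.
print(type(area.__code__.co_code).__name__)   # prints bytes

# dis prints a human-readable listing of those instructions.
dis.dis(area)
```

The listing is worth comparing with the description of assembly earlier: load a value, load another, multiply, return.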

A further advantage of a language like Python is that it handles memory management, gratis.  Programmers are therefore freed from that low level concern and allowed to focus on more pressing issues.  For most everyday programs, these modern languages manage memory well enough that doing it by hand would gain you little.  Python and Ruby are also particularly suited to object oriented programming, as nearly everything in them, from numbers to functions, is treated as an object.  Furthermore, these languages are typically very elegant and easy to read.  As such I would highly recommend them to anyone who wants to learn the programming trade.
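
The claim about objects is easy to check for yourself.  This short sketch gathers a number, a string, a function, a module, a type, and None into a list and asks Python whether each one is an object; the greet function is a made-up example.

```python
import math

def greet():
    return "hello"

things = [42, "text", greet, math, int, None]
for thing in things:
    # Every one of these reports True.
    print(type(thing).__name__, isinstance(thing, object))

# Because they are objects, even plain numbers and functions have
# methods and attributes of their own.
print((255).bit_length())   # prints 8
print(greet.__name__)       # prints greet
```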

Top Floor: Databases, Browsers, and Video Editors

            Beyond what I have laid out on the programming level continuum are even higher level languages.  These languages have very specific jobs, such as running inside web browsers, databases, or video editing software; examples are JavaScript, the ActionScript used by Flex, and SQL.  (Ajax, which is often listed alongside them, is a technique built on JavaScript rather than a language of its own.)  Languages such as JavaScript can be very helpful for learning, and are also very useful to know in their own right, but they are not broad enough to be as helpful as other languages.

The highest level languages are arguably not really programming languages, per se.  Languages such as SQL, QUEL, and D (the relational database language proposed by Date and Darwen, not the systems language of the same name) are better described as query or database languages, as they are focused on retrieving and storing records in computer databases.  Languages such as HTML/CSS, XML, and TeX, meanwhile, are markup languages, or as I like to call them, formatting languages.  For the most part these formatting languages eschew commands and control structures and simply serve as a template for the storage and arrangement of data.  The common thread running through these high level computing languages, and what essentially separates them from programming languages, is that they do not detail computation directly but rather direct the arrangement, storage, and loading of data.
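
To make the difference concrete, here is a small sketch that uses Python's built-in sqlite3 module to run a few lines of SQL against a throwaway in-memory database.  The table name and its columns are made up for this example.  Notice that the SQL says what records are wanted, while the Python version underneath has to say how to find them.

```python
import sqlite3

records = [("C", 1972), ("C++", 1983), ("Python", 1991), ("Java", 1995)]

# A temporary database that lives only in memory.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE languages (name TEXT, year INTEGER)")
connection.executemany("INSERT INTO languages VALUES (?, ?)", records)

# SQL: describe the result you want and let the database work out the steps.
rows = connection.execute(
    "SELECT name FROM languages WHERE year < 1990 ORDER BY year"
).fetchall()
older = [name for (name,) in rows]
print(older)    # prints ['C', 'C++']
connection.close()

# Python: spell out the sorting and filtering yourself.
by_hand = []
for name, year in sorted(records, key=lambda record: record[1]):
    if year < 1990:
        by_hand.append(name)
print(by_hand)  # prints ['C', 'C++']
```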

In conclusion, I would recommend that the reader start with Python (or Ruby), as it facilitates experimentation, has a well-designed syntax and a sound programming philosophy, and handles low level minutiae so that the novice can better focus on the more immediate concerns of modern programming.

Zen of Python

Mantra #13-14: "There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch."

Once you have a good enough grasp of programming in Python, it would make a great deal of sense to move on to a lower level language such as C/C++/C# or Java.  With a solid grasp of the underlying principles of programming, a student can concentrate on the lower level elements of modern programming and build a richer understanding and skill set.  Most of all, I heartily suggest that any potential new students of programming learn from my mistake and take every opportunity to experiment with new languages and technologies.  For such is the kingdom of code: disassemble, guess, hack, reassemble, repeat.  Do not fear the unknown!