Variable Names

For a long time I have opposed single letter variable names. Often I see code which has a variable for a fixed purpose with a single letter name, e.g. “FILE *f;”. The problem with this is that unless you choose a letter such as ‘z’ which has a high Scrabble score (and probably no relation to what your program is doing), it will occur in other variable names and in reserved words for the language in question. A significant part of the time spent coding involves reading code, so even for programmers working on a project a useful amount of time can be saved by using variable names that can easily be found by a search program. Often it’s also necessary to read source code just to understand what a system does, which is code reading without any writing.
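As a rough illustration of the point above, here is a minimal Python sketch that counts how often a name turns up under a plain text search. The C fragment and the helper name count_plain_matches are made up for this example, and the sketch assumes a simple substring search with no word-boundary matching.

```python
# Hypothetical C fragment, used only as input for the search.
C_SOURCE = """\
FILE *f;
f = fopen(path, "r");
if (f == NULL)
    return -1;
for (offset = 0; offset < size; offset++)
    buffer[offset] = fgetc(f);
fclose(f);
"""


def count_plain_matches(text, name):
    """Count every occurrence of name, the way a plain substring search would."""
    return text.count(name)


# The single letter also matches inside "fopen", "if", "for", "buffer", etc.
f_hits = count_plain_matches(C_SOURCE, "f")
# The longer name only matches where the variable is really used.
offset_hits = count_plain_matches(C_SOURCE, "offset")

print("hits for f:", f_hits)
print("hits for offset:", offset_hits)
```

On this fragment the plain search reports 20 hits for f, of which only 5 are the variable itself, while all 4 hits for offset are genuine uses.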

With most editors and file viewing tools, searching for a variable with a single character name in a function (or in a source file for a global variable) is going to be difficult. Automated searching is mostly useless; probably the best option is to have your editor highlight every instance and visually scan for the ones which are surrounded by brackets, braces, parentheses, spaces, commas, or whatever else is not acceptable in a variable name in the language in question.

Of course if you have a syntax highlighting editor then it might parse enough of the language to avoid this. But the heavier editors are not always available. Often I edit code on the system where the crash occurs (it makes it easier to run a debugger). Installing one of the heavier editors is often not an option for such a task (the vim-full Debian/Lenny package for AMD64 has dependencies that involve 27M of package files to download and would take 100M of disk space to install, which is quite a lot to ask if you just want to edit a single source file). Incidentally I am interested in suggestions for the best combination of features and space in a vi clone (color syntax highlighting is a feature I desire).

But even if you have a fancy editor, there is still the issue of using tools such as less and grep to find uses of variables. Of course for some uses (such as a loop counter) there is little benefit in using grep.

Another issue to consider is the language. If you write in Perl then a search for \$i should work reasonably well.

One of the greatest uses of single letter variable names is the ‘i’ and ‘j’ names for loop counters. In the early days of computing FORTRAN was the only compiled language suitable for scientific tasks and it had no explicit way of declaring variables; if a variable name started with i, j, k, l, m, or n then it was known to be an integer. So i became the commonly used name for a loop counter (the first short integer variable name). That habit has been passed on through the years so now many people who have never heard of FORTRAN use i as the name for a loop counter and j as the name for the counter of the inner loop in nested loops. [I couldn’t find a good reference for FORTRAN history – I’ll update this post if someone can find one.]

But it seems to me that using idx, index, or even names such as item_count which refer to what the program is doing might be more efficient overall. Searching for instances of i in a program is going to be difficult at the best of times, even without having multiple loops (either in separate functions or in the same function) with the same variable name.

So if there is to be a policy for variable names for counters, I think that it makes most sense to have multiple letters in all variable names to allow for easy grepping, and to have counter names which apply to what is being counted. Some effort to give different index names to different for/while loops would make sense too. Having two different for loops with a counter named index is going to make things more difficult for someone who reads the code. Of course there are situations where two loops should have the same variable, for example if one loop searches through an array to find a particular item and then the next loop goes backward through the array to perform some operation on all preceding items then it makes sense to use the same variable.
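A small Python sketch of what such a policy might look like in practice. The function names and data are invented for illustration, and this is only one possible way to apply the idea: the first function uses counters named after what is being counted, and the second reuses one counter across two loops because the backward loop starts where the search loop stopped.

```python
def find_in_grid(grid, target):
    """Return (row_idx, col_idx) of the first cell equal to target, or None."""
    # Each counter is named after what it counts, so both are easy to grep for.
    for row_idx, row in enumerate(grid):
        for col_idx, cell in enumerate(row):
            if cell == target:
                return row_idx, col_idx
    return None


def clear_before_first_negative(readings):
    """Set every reading that precedes the first negative one to zero."""
    # Forward search: reading_idx stops at the first negative reading.
    reading_idx = 0
    while reading_idx < len(readings) and readings[reading_idx] >= 0:
        reading_idx += 1

    # Backward pass: deliberately reuses reading_idx, because this loop
    # starts from the position that the search loop found.
    while reading_idx > 0:
        reading_idx -= 1
        readings[reading_idx] = 0
    return readings


print(find_in_grid([[1, 2], [3, 4]], 4))
print(clear_before_first_negative([3, 5, -1, 4]))
```

A reader who greps for row_idx or reading_idx lands only on the loops that actually use them.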

12 comments to Variable Names

  • Jan Hudec

    Ouch, it scrubbed the characters in a really bad way. I’ll try once more with entities:

    Searching for ‘i’ with grep, less or vim is exactly as easy as searching for ‘index’; that’s what ‘\<’ and ‘\>’ (or their Perl equivalent ‘\b’) are for. Just search for ‘\<i\>’ instead of just ‘i’. And editors without regexp support will always at least have a ‘whole word’ option in their search which does the same job.

    On a side note, Perl is actually much *worse* than most languages, because grepping for ‘\$i\>’ will also give you all dereferences of ‘@i’ and ‘%i’, while you still need to cover some more obscure ways of writing ‘$i’ like ‘${ i }’, ‘${"i"}’ and such (‘\$\{[ \t\n]*i\>’ is almost good enough, as symbolic references are not allowed under strict).

  • In Lenny there is a package called just “vim” which includes syntax highlighting, but doesn’t depend on X11 or GNOME packages. There’s also vim-nox, but I’m not quite sure about the differences between them.

  • Giacomo Catenazzi

    IMO programmers can afford full vim in a development environment, and for security checks and patches a good environment gives great advantages (and fewer embarrassing errors and typos).

    The i, j, k convention was also in BASIC: such variables were implicitly declared as integer-only, so they were faster than other variables.

    I think rules on identifiers should change according to the scope:
    for an external symbol, the identifier should have a good and long name (and preferably not a common one such as “debug”, “verbose”, ...).
    OTOH, I think there is no problem with a single character identifier if it is defined locally. C99 included variable declarations within blocks and allows declarations in the “for” statement, which helps locality without the need for searches.

    Note: there is a notable exception: the _ function. This single character identifier (a function) is used with the gettext program to allow translating the string given as its argument.

    For interpreted languages it is somewhat more difficult, because the scope could extend beyond the current screen, so more care should be taken with names. OTOH scripts are used for “fast programming”.

    A good (and loooong) reference is the cbook at http://www.knosof.co.uk/cbook/cbook.html . I see an excursus of nearly 100 pages on identifier names (at paragraph 792). Unfortunately I can no longer find the data about the length of identifiers.

  • Nick James

    Interesting article. What’s your proposed idea for dealing with nested for loops? Using idx_1, idx_2? idx, jdx? One system that I’ve used in the past is ii, ij, ik.

    Further, am I to understand that your proposal is stating that one should use a *different* variable for each and every for/while loop in a given program or function? That is, if you have 3 different for loops in a function, every one of them should use a different name for the iterator?

  • tomás zerolo

    I _love_ one-letter variable identifiers. My rule of thumb is that the length of the identifier be some (monotonic) function of the useful scope of the object (to throw some number, say the logarithm in base four of the number of lines :-)

    So say if the variable is effectively used within four lines, one char is appropriate. Imho, this _increases_ readability.

    Of course, semantic considerations play a role too.

    Readability is strongly dependent on the judicious use of the freedoms the language gives us — that is the use of the “covert channels” the compiler doesn’t see but the (human) reader does: variable names, indentation (at least in languages where it belongs to _us_ and not to the parser :-)

    All this is my personal taste, of course.

    — t

  • etbe

    Jan: Thanks, that is useful to know.

    Paul: I’ve just installed “vim” on Lenny, it doesn’t use colors. Is that feature disabled by default?

    Giacomo: I guess it depends on the BASIC. The BASIC I learned (MicroWorld BASIC on a Microbee) had single letter variable names being integers and names of the form “i0” or “i[0]” (I can’t remember which) being floats. So “i” was in no technical sense better than “a” for a loop counter – but everyone used i anyway.

    Nick: I think that if you have three loops in a function that are not dependent then they should have different names for best readability and least surprises when editing the code.

    tomas: Good point about scope. If the block is small then having a long name gives little benefit as the reader can see what is going on. Of course the danger is that some other person may modify the code and change that 4 line block into a 400 line block without renaming variables.

  • James Vega

    Paul: The difference between vim and vim-nox is that vim-nox includes the Perl, Python, Ruby, and TCL language bindings. The basic progression in the packaging is: vim-tiny (minimal), vim (base feature set which all packages aside from vim-tiny build on), vim-nox (vim + language bindings), vim-{gtk,gnome,lesstif} (vim-nox + gvim).

    Russell: As indicated in my response to Paul, the vim package does indeed have syntax highlighting available. None of the packages turn on syntax highlighting by default, so there’s a discrepancy in user-side configuration somewhere (possibly “syntax on” is only in your ~/.gvimrc). Also, vim-{full,perl,ruby,python,tcl} are just transitional packages as of Lenny.

    James (with my pkg-vim maintainer hat on)

  • Andres Salomon

    If someone turns it into a 400-line block: shoot them. There’s a reason why programming languages support functions.

  • The reason why i, j, k, l, m, n were chosen as integer variable names (or names beginning with those letters) in Fortran was because it is that way in written mathematics.

    When you write a summation with a sigma sign, the summation variable is usually “i” or “k” and the upper limit is usually “n”. Go check any calculus textbook.

    A polynomial’s coefficients are usually expressed as integer subscripts, so a sub i (I can’t write it that way in this comment) gets converted to A(I) in Fortran. This would be the i-th coefficient in a polynomial like

    A1*X + A2*X**2 + …

    And if a DO loop is analogous to a summation, then:

    DO 1 I=1,N
    Y = Y + A(I)*X**I
    1 CONTINUE

    is the way to write the evaluation of a polynomial. (Remember, Fortran arrays start at 1).

    Fortran was designed to do numerical calculations … FORmula TRANslation.

  • In any case, Fortran allows one to say that
    “GOD is REAL, NIRVANA IS NOT”

  • martin

    I too believe that scope is the key. For scopes that are just a few lines it really doesn’t matter if you can grep for it.
    And then there are problems where you convert something step by step into what you need.
    E.g. you have a parameter fileName, and need to build a chain of new variables to use it in the function. Sure, I could name them fileNameFile, fileNameInputStream, fileNameReader, but that’s really useless clutter. In such chains it’s often much more useful to give the first variable a readable name, and the last one too (if its scope is big enough), and to give the ones in the middle just “type code” variable names derived from the classes of the instances they refer to.

  • Richard

    John Backus, “The History of Fortran I, II, and III,” IEEE Annals of the History of Computing, vol. 20, no. 4, pp. 68-78, Oct-Dec, 1998

    and

    Donald E. Knuth and Luis Trabb Pardo, “Early Development of Programming Languages,” Encyclopedia of Computer Science and Technology, vol. 7, p. 419. New York: Marcel Dekker, 1977.

    should cover your needs for Fortran references. The former is accessible on the web; the latter is AFAIK only available on paper.