As a programmer, I was shocked when I read the Reinhart-Rogoff calculations were done on Excel. As a researcher, however, I was not.
At least tools like Matlab make the algorithms easily verifiable, the idea of writing tests in research is still an anomaly. The research groups that open source their projects do a much better job at this.
IMHO, the biggest problem with Excel is that it is difficult to place logic in separate plain-text source files. By wrapping the logic and the data up into opaque binary .xls files, standard management tools like “diff” and “blame” cannot be used, which makes it extraordinarily and unnecessarily hard to effectively manage and review business critical (not to mention global-economy-critical) logic stored in Excel.
Thierry Lhôtesays:
Two nice blog posts on spreadsheet.
But if we want to manipulate cross-referenced data on an intranet, do you have an idea of the best opensource tools or libraries to substitute to Excel usage ?
IMO, everyone who may be in a job where automation via spreadsheet is likely, needs training in SDE fundamentals: unit testing, the important of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
Eloise Pasteursays:
I haven’t used Excel in a few years, but it certainly used to have a control (I think it was Cntl-`) to display the formula in each cell instead of the values.
Now, I agree it’s still a nightmare to read and validate pages of cells with crazy formulae in them, but it’s slightly better than clicking into each cell and should help spot discrepancies like “this cell is doing sum(A10:A50) and the one next to it is doing sum(B10:B35). But just a FYI.
Ironically enough I learnt this when supporting some people doing self-directed learning on Excel at a very low level. It really started from “this is how you turn the computer on, this is how you start Excel” in lesson 1 and this was in about lesson 3. I’d been using Excel for keeping accounts and the like for several years and training people to use it for a few by then. And, as you can, memory of the keystroke clearly lingered because I still never used it!
People have good reasons to use spreadsheets, rather than telling them not to use spreadsheets we, as academics, should show how to fix spreadsheets. Or programming languages, for that matter.
The one huge killer feature of spreadsheets is live programming! Spreadsheet are always running, output is immediate and input always has concrete values rather than being abstract variable names.
Telling people to use “bona fide” programming languages instead of spreadsheets is like telling people to read musical scores rather than listening to recorded concert sessions. Nobody but a few highly skilled experts get the same value out of it.
Here’s what we can and should do about it,
— Fix spreadsheets to support best practices like testing, code reviews and version control.
— Fix programming languages to be live, ie to be always running, having immediate output and concrete input available at all times of the programming activity.
Academics telling people to change their behavior has never worked and will never work. People are not stupid, they are fully aware of the shortcomings of spreadsheets but use them because their added value (ie live programming) is just so much bigger. Where we, as academics, can add value to this is by fixing spreadsheets and programming language.
I agree that both spreadsheets and traditional programming tools are badly, badly broken, although for different reasons.
I totally agree with your “liveness” argument — it is a massive feature, and it is something that mainstream software development environments badly need.
I am super excited by the attention that projects like light table are getting, and hope that they spur others onto the same bandwagon.
I also occasionally implement a poor-man’s version of “live coding” when writing python scripts by having my unit tests run every time I save one of my source files.
My main problem with Excel is not the way that people interact with it; I am fine with spreadsheet programming, and would not expect people to have to learn a programming language just to edit a spreadsheet! The main problem that I have is that it is not (out-of-the box) possible to extract the formulae in a spreadsheet into a plain text file, merge it with somebody else’s changes, and import the merged document back into the main spreadsheet.
Now, I admit, the .xlsx format does take some steps to address these concerns, but the format is not particularly transparent, well documented, simple or readable.
As for your proposed fixes, I can do nothing else but support them wholeheartedly. I wonder if we should turn to KickStarter to try to fund an Excel alternative that sits on top of Python? (A highly scriptable spreadsheet implemented in the spirit of Sublime Text)?
Here, here!
As a programmer, I was shocked when I read the Reinhart-Rogoff calculations were done on Excel. As a researcher, however, I was not.
At least tools like Matlab make the algorithms easily verifiable, the idea of writing tests in research is still an anomaly. The research groups that open source their projects do a much better job at this.
IMHO, the biggest problem with Excel is that it is difficult to place logic in separate plain-text source files. By wrapping the logic and the data up into opaque binary .xls files, standard management tools like “diff” and “blame” cannot be used, which makes it extraordinarily and unnecessarily hard to effectively manage and review business critical (not to mention global-economy-critical) logic stored in Excel.
Two nice blog posts on spreadsheet.
But if we want to manipulate cross-referenced data on an intranet, do you have an idea of the best opensource tools or libraries to substitute to Excel usage ?
Thanks for posting this.
IMO, everyone who may be in a job where automation via spreadsheet is likely, needs training in SDE fundamentals: unit testing, the important of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
I haven’t used Excel in a few years, but it certainly used to have a control (I think it was Cntl-`) to display the formula in each cell instead of the values.
Now, I agree it’s still a nightmare to read and validate pages of cells with crazy formulae in them, but it’s slightly better than clicking into each cell and should help spot discrepancies like “this cell is doing sum(A10:A50) and the one next to it is doing sum(B10:B35). But just a FYI.
Ironically enough I learnt this when supporting some people doing self-directed learning on Excel at a very low level. It really started from “this is how you turn the computer on, this is how you start Excel” in lesson 1 and this was in about lesson 3. I’d been using Excel for keeping accounts and the like for several years and training people to use it for a few by then. And, as you can, memory of the keystroke clearly lingered because I still never used it!
I dare to disagree.
People have good reasons to use spreadsheets, rather than telling them not to use spreadsheets we, as academics, should show how to fix spreadsheets. Or programming languages, for that matter.
The one huge killer feature of spreadsheets is live programming! Spreadsheet are always running, output is immediate and input always has concrete values rather than being abstract variable names.
Telling people to use “bona fide” programming languages instead of spreadsheets is like telling people to read musical scores rather than listening to recorded concert sessions. Nobody but a few highly skilled experts get the same value out of it.
Here’s what we can and should do about it,
— Fix spreadsheets to support best practices like testing, code reviews and version control.
— Fix programming languages to be live, ie to be always running, having immediate output and concrete input available at all times of the programming activity.
Academics telling people to change their behavior has never worked and will never work. People are not stupid, they are fully aware of the shortcomings of spreadsheets but use them because their added value (ie live programming) is just so much bigger. Where we, as academics, can add value to this is by fixing spreadsheets and programming language.
Ok, yes, fair point.
I agree that both spreadsheets and traditional programming tools are badly, badly broken, although for different reasons.
I totally agree with your “liveness” argument — it is a massive feature, and it is something that mainstream software development environments badly need.
I am super excited by the attention that projects like light table are getting, and hope that they spur others onto the same bandwagon.
I also occasionally implement a poor-man’s version of “live coding” when writing python scripts by having my unit tests run every time I save one of my source files.
My main problem with Excel is not the way that people interact with it; I am fine with spreadsheet programming, and would not expect people to have to learn a programming language just to edit a spreadsheet! The main problem that I have is that it is not (out-of-the box) possible to extract the formulae in a spreadsheet into a plain text file, merge it with somebody else’s changes, and import the merged document back into the main spreadsheet.
Now, I admit, the .xlsx format does take some steps to address these concerns, but the format is not particularly transparent, well documented, simple or readable.
As for your proposed fixes, I can do nothing else but support them wholeheartedly. I wonder if we should turn to KickStarter to try to fund an Excel alternative that sits on top of Python? (A highly scriptable spreadsheet implemented in the spirit of Sublime Text)?