Use a flexible file format. My preference is .csv, because it can be read by almost any program. I'll tolerate .xls, but I'm not pleased with .xlsx (not everyone uses Excel!). And please, please, please do not use pdf.
Why is .xls tolerated but not .xlsx?
XLS and XLSX are both Excel formats, but XLS is a binary blob while XLSX is just a zip containing xml files. I'd much rather parse xml files (XLSX) than the binary blob of XLS.
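For what it's worth, the difference shows up in the first few bytes of the file. Here is a minimal sketch, assuming all you have is the start of the file: legacy XLS lives in an OLE2 compound file, while XLSX is an ordinary zip archive. The function name `sniff_excel_format` and the labels it returns are my own, and the check only identifies the container (a .doc or .docx would match too), not that the contents are really a spreadsheet.

```python
# Signatures of the two container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy .xls, .doc, ...)
ZIP_MAGIC = b"PK\x03\x04"                       # local file header of a zip (.xlsx, .docx, ...)


def sniff_excel_format(head: bytes) -> str:
    """Guess the container type from the leading bytes of a file.

    Hypothetical helper for illustration; returns a label, not a guarantee.
    """
    if head.startswith(OLE2_MAGIC):
        return "ole2-binary"
    if head.startswith(ZIP_MAGIC):
        return "zip-container"
    return "unknown"
```

You would call it with something like `sniff_excel_format(open(path, "rb").read(8))`.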
My experience is that open-source implementations like LibreOffice have much better support for the older file formats. docx/xlsx/etc may be a "zip of XML files" but it doesn't really matter if the original data is stored as XML or a blob when it comes to the end user's rendering experience -- it's about how the system can interpret that data. And their interpretations of the new file formats are, for the most part, severely lacking.
Very much this. Sometimes I don't want to (or can't) open OO.org; I just want a quick glimpse, in Emacs (http://www.emacswiki.org/emacs/UnXls) or mutt, of a simple table of data someone sent me, but so far I haven't found any quick CLI tools that do this for the newer formats. It almost feels as if Microsoft said "hmm, all these free software products can open our files, but they're still complaining about open file formats; how can we abuse our monopoly even more and guarantee that some truly open standard doesn't get mandated?" and thus was born docx, xlsx, pptx, etc.
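In case it helps anyone wanting a quick glimpse without a full office suite, here is a rough sketch using only the Python standard library. It assumes the first worksheet is stored at `xl/worksheets/sheet1.xml` and that shared strings are in `xl/sharedStrings.xml`; that is the usual layout, but the real location is given by the workbook's relationship files, which this sketch does not read. The function names are made up for illustration, and dates, formulas and number formats are not interpreted.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet and shared-string parts.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}


def read_shared_strings(zf):
    """Return the shared-string table as a list (empty if the part is absent)."""
    try:
        data = zf.read("xl/sharedStrings.xml")
    except KeyError:  # workbook has no shared strings
        return []
    root = ET.fromstring(data)
    # A string item may be split into several <t> runs; join them.
    return ["".join(t.text or "" for t in si.iter("{%s}t" % MAIN))
            for si in root.findall("m:si", NS)]


def glimpse_rows(source, sheet_part="xl/worksheets/sheet1.xml"):
    """Return the stored cell values of one sheet as lists of strings.

    `source` is a path or a binary file-like object. The default
    `sheet_part` is an assumption about the archive layout. Cells that
    are missing from a row are skipped, so columns may not line up.
    """
    with zipfile.ZipFile(source) as zf:
        shared = read_shared_strings(zf)
        root = ET.fromstring(zf.read(sheet_part))
    rows = []
    for row in root.iterfind("m:sheetData/m:row", NS):
        cells = []
        for c in row.findall("m:c", NS):
            v = c.find("m:v", NS)
            if c.get("t") == "s" and v is not None:
                # Value is an index into the shared-string table.
                cells.append(shared[int(v.text)])
            elif c.get("t") == "inlineStr":
                cells.append("".join(t.text or "" for t in c.iter("{%s}t" % MAIN)))
            else:
                cells.append(v.text if v is not None and v.text else "")
        rows.append(cells)
    return rows
```

Something like `for r in glimpse_rows("report.xlsx"): print("\t".join(r))` then gives a tab-separated dump you can pipe into a pager; `report.xlsx` is just a placeholder name.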
Yes, it gets data into the sheet, but it means that you need extra steps to run calculations. There is no option to specify the column as a number during the import phase. So what takes one step in Excel takes a minimum of two in LibreOffice.
I thought you were worried about leading zeros or something.
In calc 4.0 on XP, I can choose from a few options for each column and the default (called "Standard") results in a column that I can use in calculations (if I highlight the cells it shows the sum, in formulas, etc.).
If I specify the column is "Text", leading zeros are retained.
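The same text-versus-number choice comes up when you read a CSV in code. A small sketch, assuming a made-up two-column file: Python's `csv` module hands back every field as a string, so leading zeros survive by default and you convert only the columns you want to calculate with. The sample data, the column names and the helper `load_rows` are all invented for illustration.

```python
import csv
import io

# Invented sample data: a code column with leading zeros and a numeric column.
SAMPLE = "zip_code,amount\n00501,12.50\n02134,7.25\n"


def load_rows(text, numeric_columns):
    """Parse CSV text, converting only the named columns to float.

    Every other column stays a string, so values like "00501" keep
    their leading zeros.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for name in numeric_columns:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE, numeric_columns=["amount"])
total = sum(r["amount"] for r in rows)  # numeric column is usable in calculations
```

Here `rows[0]["zip_code"]` is still the string `"00501"`, while `amount` can be summed directly.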
The newest versions of LibreOffice are a lot better at handling xlsx, and it is getting better all the time. I still see some issues, but lately I have had a much better time importing.
XLSX and the other OpenXML formats are, generally speaking, very easy to query, particularly since Microsoft provides its OpenXML SDK for exactly that and third-party tools offer similar functionality.
Third-party spreadsheet programs have (or at least used to have) poor support for XLSX. That dates from when the format was new; I'd think that popular programs have improved since, but the reputation stuck.
I have had reasonable success using the Apache POI project for both XLSX and XLS. You might want to look at the community edition of Pentaho Data Integration, version 5.0 and up. It includes POI for XLSX. I use it all the time to deal with random spreadsheets full of clinical data (hospital reports).
But I hate Excel of any type for data. People randomly change formats and give me updated spreadsheets with hundreds of thousands of rows. Then they seem surprised when their data doesn't link up with other data sources anymore...