
I've always found the C in CSVs to be needlessly limiting. I'm not aware of any tool that contextualises commas in addresses and other "free text" fields that might form part of a data dump/extract.

For this reason, I am strongly in favour of pipe-separated values in such files. The probability that a data extract has a pipe symbol in any data field is quite small (in my experience, it's been 0 so far).

The first thing I do on any Windows system I have to use on a regular basis is change the system list separator to '|' in the Regional Settings.



Then you run into ASCII art some end user made, or pre-Unicode text from Scandinavia where 0x7C was the code point of a letter. Commas make it pretty obvious which tools are unusably broken, whereas very rare characters let these bugs go undetected for far too long.


And then you get sent the file:

Statistics on 100000 UNIX shell one liners!


Tab-separated is also very common, and I find it better. How often are you storing text formatting within a cell for data interchange? As with pipes, tabs are much less common in data than commas.


Yes, pipe-delimited format is great, and much easier to parse than comma-delimited.

Tab-delimited is also better than comma-delimited, I think.

Commas just have far too high a probability of appearing within the data itself, and when you hit a comma that was supposed to be escaped, it adds an extra column to the row, screwing everything up.
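To make the "extra column" failure concrete, here is a minimal sketch that flags rows whose field count differs from the header's. The helper name `find_bad_rows` and the sample lines are made up for illustration, and it deliberately uses a naive `split` (no quoting support), which is exactly the kind of parsing that breaks.

```python
def find_bad_rows(lines, delimiter=","):
    """Return (line_number, field_count) for rows that don't match the header.

    Hypothetical helper: naive split, no quoting or escaping handled.
    """
    expected = len(lines[0].split(delimiter))
    bad = []
    for number, line in enumerate(lines[1:], start=2):
        count = len(line.split(delimiter))
        if count != expected:
            bad.append((number, count))
    return bad


# Made-up sample: the address on line 3 contains an unescaped comma.
sample = [
    "id,address,city",
    "1,12 High St,London",
    "2,Flat 3, 9 Low Rd,Leeds",
]
bad_rows = find_bad_rows(sample)
```

A check like this only catches the problem after the fact; it can't tell you which comma was the stray one.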


Tabs are great, until you try to explain to a non-programmer why your tool doesn't parse their file correctly, even though in the word-processor they're using as a text editor it looks identical to another file that parses correctly.


I also find the C in CSVs to be needlessly limiting, especially since ASCII already comes defined with characters for record separator (30), unit separator (31), and group separator (29).
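For what it's worth, a rough sketch of what using those control characters could look like. The `dump`/`load` names are just illustrative, and the whole thing assumes the data itself never contains 0x1E or 0x1F (nothing here escapes them).

```python
US = "\x1f"  # ASCII 31, unit separator: between fields
RS = "\x1e"  # ASCII 30, record separator: between records


def dump(rows):
    """Join fields with US and records with RS. Assumes no field contains either."""
    return RS.join(US.join(fields) for fields in rows)


def load(text):
    """Inverse of dump for non-empty input."""
    return [record.split(US) for record in text.split(RS)]


# Made-up rows with commas, pipes, quotes, tabs and newlines inside fields.
rows = [
    ["Jane Doe", "12 High St, Flat 3", 'said "hi" | left'],
    ["Bob", "line one\nline two", "tab\there"],
]
round_tripped = load(dump(rows))
```

The catch is the same as with tabs: most editors don't display these characters, so non-programmers can't see or type them.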


Excel and other tools can handle commas if you use quoted fields. Unfortunately this generates a new problem when you need to escape quotes inside a quoted field. There is no universal escape method.
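As an illustration of the "no universal escape method" point, here is a small sketch using Python's standard `csv` module, which supports two conventions: doubling the quote character (the default) or prefixing it with an escape character. The `write_row` helper and the sample row are made up for the example.

```python
import csv
import io


def write_row(row, **fmt):
    """Serialise one row with csv.writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()


def read_row(text, **fmt):
    """Parse a single-row CSV string back into a list of fields."""
    return next(csv.reader(io.StringIO(text), **fmt))


row = ["Acme, Inc.", 'He said "hello"']

# Convention 1: double the quote character inside a quoted field (default).
doubled = write_row(row)

# Convention 2: backslash-escape the quote character instead.
backslash = {"doublequote": False, "escapechar": "\\"}
escaped = write_row(row, **backslash)

# Each convention round-trips only when the reader uses the same settings.
same_settings = read_row(escaped, **backslash)
wrong_settings = read_row(escaped)  # default reader keeps the backslashes
```

Both files are "valid CSV" to the tool that wrote them; the trouble starts when the reader guesses the other convention.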


Same for me. CSVs would be far less likely to mess up the data if they were PSVs instead.



