Home » Questions » Computers [ Ask a new question ]

Transforming CSV file using sed

Transforming CSV file using sed

In order to be able to import some data into a certain tool, I have to transform a CSV file from this format

Asked by: Guest | Views: 339
Total answers/comments: 2
Guest [Entry]

"( tr , ';' | tr -d '""' ) < input.csv > output.csv

I'd use Perl

perl -pe 'tr/,""/;/d' input.csv > output.csv

-- but this specific task isn't beyond sed. You cannot merge the two expressions."
Guest [Entry]

"Since you're dealing with records, awk makes more sense. That said, it's not really good at CSV, since the field delimiters are somewhat variable. But if you're certain that all fields are surrounded by doublequotes, this will work:

awk -F'"",""' 'BEGIN {OFS="";""} { gsub(/(^"")|(""$)/, """"); $1=$1; print }'

This sets awk's input field separator to """","""" (including the inner set of doublequotes). This almost works, except you have to deal with the leading and trailing doublequotes, which are stripped with the gsub function. The $1=$1 forces it to recompile the record with the new output field separator, which was defined as ; in the BEGIN block. Then print prints out the whole record.

This is a little tidier:

awk -F '(^"")|("","")|(""$)' 'BEGIN {OFS="";""} { $1=$1; print }'

It sets the input field separator to a regular expression that includes the doublequotes at the beginning and end of the record, but it also causes it to print out an empty beginning and trailing field. You can easily get rid of the trailing field:

awk -F '(^"")|("","")|(""$)' 'BEGIN {OFS="";""} { NF=NF-1; $1=$1; print }'

NF is the number of fields, and reducing it by one lops off the last field. But I can't think of a way to chop off the first field.

If you know that the input always has five fields, though, you could do this:

awk -F '(^"")|("","")|(""$)' 'BEGIN {OFS="";""} { print $2,$3,$4,$5,$6 }'

Notice this gets rid of the $1=$1 construct, which we only need if we're printing the (implied) $0.

All that said, I'd probably end up using perl and one of the many available CSV modules on CPAN."