How to fix common CSV data formatting issues

how-to-fix-common-csv-data-format-issues

Wrong field delimiters

ISSUE: A CSV file contains data that consists of a bunch of fields separated by a comma and optionally enclosed by double-quotes, hence the name Comma-Separated-Values or CSV. Due to the lack of an actual standard for CSV formatting, some programs may opt to use semi-colons instead of commas as separators.

FIX: Open the CSV file using any text editor like Notepad++ (https://notepad-plus-plus.org/) and just replace semi-colons with commas or vice-versa. Remember to check your file afterwards for literal characters that might have been replaced too. Manually edit them and voila… problem solved!

Double-quote escape characters

ISSUE: As mentioned above, some fields may be optionally enclosed by double-quotes. When enclosed by double-quotes, any literal double-quotes in the actual data will have to be escaped to avoid it being read as one of the enclosing double-quotes.

Now the problem is how the literal double-quotes are escaped. There are 2 accepted ways of escaping double-quotes in a CSV file. One is using a 2 consecutive double-quotes to denote 1 literal double-quote in the data. The alternative is using a backslash and a single double-quote.

Sounds straightforward, right? Wrong!

The big issue here is portability. A CSV file created by one program using the 2 consecutive double-quotes escape characters will not be read properly by another program which uses the backslash and single double-quote. The reverse also will encounter the same issue.

FIX: For this to work, you will need to know what the escape character for your target program is. After that, you can do a simple search and replace using any text editor like Notepad++ where you replace the 2 consecutive double-quotes with the backslash and single double-quote. Reverse the process if you are in the opposite situation.

End-of-line characters

ISSUE: End-of-line characters or EOLs as most folks would call them are special characters that signal the end of a particular line in any text file including a CSV file. To visually see which types are in your CSV file, open the CSV file using Notepad++. Then in the Notepad++ menu, click on View then click on Show Symbol. Select Show All Characters.

Below is an example of a Windows EOL. CR stands for Carriage Return and LF stands for Line Feed.

Windows EOL

Below is an example of an Unix EOL. There is just the LF character.

Unix EOL

FIX: Now if you require your CSV file to have Unix EOL instead of Windows EOL, just click Edit in the Notepad++ menu then click EOL Conversion. Then click on Unix (LF) to convert the whole file to Unix EOL. For the reverse, just click on Windows (CR LF) to convert the other way.

Occasionally you may encounter a mangled CSV file containing both Windows and Unix EOLs. This will usually present a challenge to the program trying to read the file. When you see a file that has mixed EOLs, it is wise to convert the whole file to either Windows or Unix EOLs.

You can just follow the steps above to flip them one way or another.

Reminder for MailboxValidator users

The MailboxValidator bulk upload service is expecting either Windows EOL or Unix EOL. Having both in the same CSV file will result in error or misread values which will lead to incorrect email validation results. Also note that the double-quote escape characters in use by MailboxValidator is a backslash followed by a single double-quote.

If your file does not conform to the above requirements, you will need to follow the steps in this article to sanitize and standardize your CSV file prior to uploading to the MailboxValidator bulk upload service.

Was this article helpful?

Related Articles