What is CSV

CSV is widely used to exchange tabular data between applications built on different architectures, bridging otherwise incompatible data formats. The exact format is usually agreed on by the two parties exchanging the data; CSV itself has no single authoritative format standard.

The basic idea of separating fields with commas is clear, but it gets complicated when field data can itself contain commas or even embedded line breaks. A CSV implementation may fail to handle such fields, or it may surround them with quotation marks. Quoting does not solve every problem either: some fields need to contain quotation marks, so an implementation may also define an escape character or escape sequence.

RFC 4180 documents the CSV format for the MIME type "text/csv". It can serve as a common definition for general use, and it describes the format that most implementations seem to follow.

CSV format rules

1. Each record is located on a separate line, delimited by a line break (CRLF, that is, \r\n). For example:

   aaa,bbb,ccc CRLF
   zzz,yyy,xxx CRLF

2. The last record in the file may or may not have an ending line break. For example:

   aaa,bbb,ccc CRLF
   zzz,yyy,xxx

3. There may be an optional header line appearing as the first line of the file, with the same format as normal record lines. This header contains names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file (the presence or absence of the header line should be indicated via the optional "header" parameter of this MIME type). For example:

   field_name,field_name,field_name CRLF
   aaa,bbb,ccc CRLF
   zzz,yyy,xxx CRLF

4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma. For example:

   aaa,bbb,ccc

5. Each field may or may not be enclosed in double quotes (however, some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed in double quotes, then double quotes may not appear inside the fields. For example:

   "aaa","bbb","ccc" CRLF
   zzz,yyy,xxx

6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double quotes.
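   For example (the second field contains an embedded line break, so the record spans two physical lines):

   "aaa","b CRLF
   bb","ccc" CRLF
   zzz,yyy,xxx

7. If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example:

   "aaa","b""bb","ccc"

A CSV file is plain text in some character set, such as ASCII, Unicode, EBCDIC, or GB2312.

Just as the CSV format itself is loosely defined, there is no standard way to parse a CSV file. You can implement reading and writing yourself, and many implementations in different languages are available online, such as opencsv and csvreader. These may deviate from the RFC. For example, csvreader states that leading and trailing space and tab characters adjacent to a comma or record delimiter are trimmed, whereas the RFC treats spaces as part of the field. Keep this in mind when using them.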
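The quoting and escaping rules above can be tried out with a short sketch. It uses Python's standard-library csv module as a stand-in (this is not opencsv or csvreader, and the article does not prescribe any particular library). The variable names and sample rows are made up for illustration. The sketch assumes the module's default "excel" dialect, which quotes only the fields that need it and doubles embedded double quotes.

```python
import csv
import io

# Illustrative rows (hypothetical data) covering the tricky cases above.
rows = [
    ["aaa", "b\r\nbb", "ccc"],   # field with an embedded line break
    ["aaa", 'b"bb', "ccc"],      # field with an embedded double quote
    ["aaa", "b,bb", " ccc "],    # embedded comma; spaces belong to the field
]

# Write with CRLF record terminators; newline="" keeps StringIO from
# translating line endings.
buf = io.StringIO(newline="")
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerows(rows)
encoded = buf.getvalue()

# Read the text back; the reader reassembles quoted multi-line fields.
parsed = list(csv.reader(io.StringIO(encoded, newline="")))

# A naive split on commas breaks as soon as a field contains a comma.
naive_fields = 'aaa,"b,bb",ccc'.split(",")

if __name__ == "__main__":
    print(repr(encoded))
    print(parsed == rows)
    print(naive_fields)
```

The round trip only holds because both sides use the same dialect. A library that trims whitespace around delimiters, as described for csvreader above, would return "ccc" rather than " ccc " for the last field.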