Otis Mukinfus wrote:
> On 29 Apr 2006 00:39:38 -0700, "rob" <rmdiv2000@yahoo.com> wrote:
>
> >I am using ODBC to read CSV files. Unfortunately, some of the CSV files
> >are not formatted correctly (this is out of my control). One particular
> >problem is that a quote within an item is not doubled, i.e. the file
> >says
> >
> >"this "item" is bad", "this item is ok"
> >
> >rather than
> >
> >"this ""item"" is bad", "this item is ok"
> >
> >ODBC now thinks 'this ' is the first item rather than 'this "item" is
> >bad'. Excel reads the file just fine, though. Is there some workaround,
> >short of fixing the file myself, to make the driver more error
> >tolerant? In case it helps, below is how I read the file.
> >
> >Thanks
> >
> >
> >connectionString = @"Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ="
> >    + Path.GetDirectoryName(filename);
> >connection = new OdbcConnection(connectionString);
> >connection.Open();
> >command = new OdbcCommand(
> >    "Select * FROM " + Path.GetFileName(filename), connection);
> >reader = command.ExecuteReader();
>
> Have you tried reading the files directly, writing code to split the columns out
> with the split command? If the data has the commas in the correct places, the
> double quotes won't make any difference. If you don't want the quotes you can
> replace all of them with string.Empty AFTER YOU SPLIT THE LINE.
>
> If you're trying to load the files into MS SQL Server, create a DTS Package
> (assuming you're using SQL 2K) and do a bulk import into the table you're
> filling.
>
> I never use ODBC or any other data provider to read text files, for the very
> reason you are experiencing.
>
> This is cheating, but have you considered reading the file with Excel and saving
> it as a tab delimited or comma delimited file that has no quotes? It won't be a
> good solution if you are going to have to automate the process ;o)
>
> What are you doing with the file after converting it?
>
> Good luck with your project,
My first approach was using split. The problem is that the individual
items often also contain commas, so split failed almost always. I tried
regex as well, but due to the many special cases that fails too. I
know that Excel can read the files correctly, including all these
special (and maybe even wrong) cases. Hoping that Excel uses ODBC
(rather than duplicating some code), I tried that approach. But it does
not seem to work, either.
I guess I might end up writing my own parser. It's probably not that
big an issue, but of course it would be nicer if I had a ready-made
solution that handles all the special cases.
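
For what it's worth, a hand-written parser for this one problem can be fairly small. Below is only a rough sketch, not a complete CSV parser: the names LenientCsv and SplitLine are made up for illustration, and the heuristic is an assumption about the data, namely that a quote only closes a field when it is followed (after optional spaces) by a comma or the end of the line. Any other quote inside a quoted field is kept as a literal character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any library.
public static class LenientCsv
{
    // Splits one line into fields, tolerating unescaped quotes inside
    // quoted fields. Assumes a field never contains a line break.
    public static List<string> SplitLine(string line)
    {
        List<string> fields = new List<string>();
        int i = 0;
        while (true)
        {
            // Skip spaces in front of the field.
            while (i < line.Length && line[i] == ' ') i++;

            StringBuilder field = new StringBuilder();
            if (i < line.Length && line[i] == '"')
            {
                i++; // step over the opening quote
                while (i < line.Length)
                {
                    char c = line[i];
                    if (c != '"') { field.Append(c); i++; continue; }

                    // A doubled quote is the standard escape for one quote.
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"');
                        i += 2;
                        continue;
                    }

                    // Look past any spaces: comma or end of line means
                    // this quote really closes the field.
                    int j = i + 1;
                    while (j < line.Length && line[j] == ' ') j++;
                    if (j == line.Length || line[j] == ',')
                    {
                        i = j; // now at the comma or at the end
                        break;
                    }

                    // Otherwise treat it as a stray literal quote.
                    field.Append('"');
                    i++;
                }
            }
            else
            {
                // Unquoted field: read up to the next comma.
                while (i < line.Length && line[i] != ',')
                {
                    field.Append(line[i]);
                    i++;
                }
            }

            fields.Add(field.ToString());
            if (i >= line.Length) break;
            i++; // step over the comma
        }
        return fields;
    }
}
```

With the line from my first post, LenientCsv.SplitLine returns two fields, `this "item" is bad` and `this item is ok`, and the correctly escaped variant gives the same result. The heuristic still gets it wrong when a stray quote happens to sit directly in front of a comma inside an item, so it is a workaround rather than a real fix.
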
My final goal is to download CSV files from the internet. Actually, I
don't store them as files but keep them in memory. Then I have to
do some processing (avoiding duplicates, extracting the right data, doing
some error checking, etc.) on the files, and whatever I parse out goes
into an SQL database.
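
Since the data never touches the disk, the file-based ODBC text driver is not a natural fit anyway. A rough sketch of the in-memory side, under a few assumptions: the download is already available as a string, a "duplicate" simply means an identical raw line, and no field contains a line break. InMemoryCsv and DistinctLines are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of any library.
public static class InMemoryCsv
{
    // Returns the distinct, non-blank lines of csvText in original order.
    public static List<string> DistinctLines(string csvText)
    {
        HashSet<string> seen = new HashSet<string>();
        List<string> result = new List<string>();
        using (StringReader reader = new StringReader(csvText))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim().Length == 0) continue; // skip blank lines
                if (seen.Add(line)) result.Add(line);  // Add is false for repeats
            }
        }
        return result;
    }
}
```

Each surviving line would then be split into items and checked before it is inserted into the database.
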
Thanks