Sunday, March 25, 2012

Advice badly needed

I have just been given a ridiculous task... There is an existing SQL database that has one table with two fields. The first is an ID column, the second has information that I somehow need to retrieve. This is my dilemma. Here's something like how the ROWS look in this table:
__________________________________________________ ______________
PRACTICE: Practice 1 COMPLETE HISTORY REPORT PAGE:
DATE: AUG 18, 1973 TIME: 9:04:36
NULL
NULL
12-53-48 DOE JOHN BALANCE: .00
123 Street DR
Somewhere ST 12345

DATE DR PROC.. MD DIAG.. DESCRIPTION........... TRN.AMOUNT TR.BALAN
NULL
JOHN DOE
08/12/94 13 73630 1 959.7 FOOT-COMPLETE 3+ VIEWS 23.35 .00
08/23/94 13 *INS. FILED WITH TRAVELERS
08/29/94 13 39 PAYMENT - WORKER'S COMP. -23.35 .00
TOTAL DUE: .00
__________________________________________________ ______________

Each line is a seperate row in the table!?! Also, the lines with the dates, near the bottom, can have different numbers of rows (Sometimes several, sometimes hundreds) I don't know how this data was imported. It appears that I have 122000 accounts but they are spread across 3.5 million rows... I need to devise some way to take this data out.. Can anyone please give some advice??I'm having trouble understanding your explaination. Can you post the assignment as the instructor gave it to you?

-PatP|||Basically I've been asked to create an Intranet ASP.Net reporting feature for this data. I have to be able to split up this information into some sort of normalized relational structure. I need to be able to grab the start of each record (IE the practice Name row) then each row within this collection of rows. All of this data is in one column! there is no structure to it, as of now. I want to be able to seperate each account into other tables to give some structure... does that make sense?|||So you just have this text blob sitting all by its lonesome in a row, as though somebody dumped a chunk of XML in there? No database enforced schema at all?

Bleach!

-PatP|||I wish it was in one row...

Row 1: PRACTICE: Practice 1 COMPLETE HISTORY REPORT PAGE:
Row 2: DATE: AUG 18, 1973 TIME: 9:04:36
Row 3: NULL

See what I mean?! This is one field, one table.|||If at first you don't succeed, apply brute force. Cursor through the monstrosity, and make a structured schema from it. If worst comes to worst, plop the parts into a work table with pure character columns, then slice and dice them as needed.

What you've got now is a fiasco in the making. YOu have to enforce some structure onto it, soon!

-PatP|||Yes, structure has to be made of this. There are no updates being made to it. It is an old archive that someone imported from an Oracle DB from our pre-merger and it's just sitting there, needing connectivity for reporting. I had hoped to be able to do some types of nested queries to extract what I needed, adding them to another set of tables as I went. Mostly what I know about cursors are that I hate them and can't ever seem to get them to work correctly. As for making the structured schema from it.. would you be able to possibly point me towards any documentation on accomplishing that? Your answer seemed to be a bit above my head:)|||can't you use "text to columns" in excel and use <space> as the delimeter

or write a macro to do repeated keystrokes to structure the data as someone suggested previously|||The exact process varies a little bit, depending on what tools you are using to process your data heaps (the unstructured tables).

Hopefully you know how to interpret the heap. In other words there are either a fixed number of rows for each logical group, or there is some kind of "marker" that will show you where one group ends and another group begins. I assume that the sequence of the rows is fixed (they stay in the same order) and significant (that the order matters). Once you understand the structure of your heap, then you can start to make better sense of it.

You basically want to write some kind of loop that will process each group of related pieces of text (rows within your heap), and put them into a table with more meaningful rows and columns (probably one column for each row in the heap, unless some of them are useless and/or missing).

Without understanding exactly what you've got, I can't think of a way to come up with anything like a step-by-step solution. You'll have to find someone that understands the data to help you structure it.

-PatP|||Ok, from trying to work with this via TSQL using cursors, I've found too many potential errors caused by inconsistency of the original data input. I've now created a VB.Net Windows application to handle the data collection/transformation which encapsulates the transformed accounts into a class before attempting to insert into the new normalized relational database via stored procedure.

I have one question (for now) about form refreshing. The form that allows the user to begin the transformation counts the actual accounts to be worked, placing that number into a label. As each record is completed, I cast back to the form, reducing a label for accounts remaining and increasing a label for accounts complete, then refresh the form to reflect the changes. This process occurs rather quickly but seems to make an odd visual effect to the forms groupbox and listbox controls. They seem to shake as the transformation thread is working. Now, this application is really only to be used once, but I would like to know if this visual effect can be prevented when developing future reusable projects.|||Yes, it can, but I don't have an easy reference available. Maybe someone else has code or a URL they can post. If not, remind me next week and I'll find one for you.

-PatP|||Thanks Pat!

No comments:

Post a Comment