IWETHEY v. 0.3.0 | TODO
1,095 registered users | 1 active user | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New General DB integration question
I am sure many of you had to deal with similar situations

Lets say, you were asked to do a system that collects data and generate data. In summary a new database based system for a middle sized company.

Now the company already have other systems one for the sales department, one for customer support, one for finance etc ...

Part of the data the new system needs (the generated data) depend on information that already exist in the old systems, actually some of them exist in each system, for example (the supliers table). And some exist in just one of the 3 systems (invoices only exist in the finance db).

Now another responsibility of the system is to make easier to create repports that depends on data, in all 3 system knowing that some are duplicated and some are required for the new functionality.

Now the question is, how to best integrate or merge the data from the 3 databases, knowing, that they dont all use the the dbms engine, some use oracle others use sqlserver. The new system must uses sql server (licensing issue).

Should the new functionality be built independently (i.e. have its own talbes of everything, vendors, customers etc ...)
And later find a way to build the reports that integrate all systems. And what would those tools be (BusinessOBjects XI, or SAP datawarehousing package, or what, is there something opensource for this)

Or should I find a way to integrate the old databases first, and build the new functiontinality on the output of the integration. And in this case how this integration be implementated.

Should I create a db that have views linked to those table of interest (I am not sure such a thing exist, but if access have dblink I think sql server should have something like it)

Or what else, what technologies are usually used. A problem I can't seem to fully graso is that the integration result much always be a correct mirror to the original data. So the view concept seem like a good idea, or are there hiddens flows in this

What other issues ppl forget about in such situation, for example, how to deal with the situation where the sales vendor list have more vendors than the finance vendors list. Which list should the taken.

What about the mapping, for exmaple the vendor code can be different in the finance and sales, plus the name can be spelled differenly. I am thinking of table that map codes something like

vendor_code_in_sales | vendor_code_in_finance | vendor_code_in_*

and enter those records in this talbe manually of course. But are there other ways.

If there are books about this, please recommend them, I saw this book:
[link|http://www.amazon.com/gp/product/0201844567/ref=sib_rdr_dp/103-3575556-1641469?%5Fencoding=UTF8&me=ATVPDKIKX0DER&no=283155&st=books&n=283155|Next Generation Application Integration: From Simple Information to Web Services
by David S. Linthicum], in the library near me, but its not really cheap, and I am not sure of its usefulness, do you think it can help me or do you recommend other books, that better deal with the pitfalls of integration, and not just explain the different technologies, or even better explain how the different technologies handle those pitfall. This book seem to do a good job demonstrating the availbe technologies, I for example, have no clue how xml is used in integration. But on the oppositeand I am more interested in datawarehousing like style of integration, and tha talbe of content he doesnt seem to cover this

In summary, the input (of the integration) is db tables (different engines), I imagine the output should be db tables too (in sql server), not some other vague concept, or any type of file or a report (a report is not a talbe).

Waiting for your advices
New First step is to realize this is not a technology project
You have multiple similar representations of the same real-world entities. Those representations are similar, but not identical. Until they are identical, they can not be used to generate unambiguously correct reports. Yes, you can get close, but you can not verify the correctness.

Once the people who matter -- ie: the ones who pay the bills -- are willing to accept that the data must be cleaned up first, then any one of several options for doing the reporting becomes clear. This exact project has been done over and over, and the people who don't have to make it work always think there is some magic wand you can wave over shit data that makes it all fit together. There isn't. You can't.




Now, once you've got the green light to fix the data, you need a way to keep multiple DB platforms in sync. If the only two are Oracle and SQL Server, you can use update triggers on all tables that need to be syncronized. These triggers will have to call routines -- probably written in C/C++ -- to do the synchronization.

I'm not going to suggest picking one authoritative customer table and deleting the other two. That would involve re-writing everything that uses it. That's a whole other project. Unless that is open for discussion.
===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
New Double plus good
Having played this game more then once I have to agree with Drew. First, track down the differences in the databases and data and figure out how to reconcile them at a buisness level. You will probably have to send some of those issues back up to your manager to setup cross department meetings to figure out how to reconcile them.

After that the technological problem is actually much simpler. Depending on how much data you need for your reports, how current they need to be, and how often they are generated, you have a bunch of options.

Jay
New Thanks :)
Just wanted to say thank you, for your advices and replies.

I think, I was really hoping, that there would be a magical tool (well not in a naiive way of course), since as drewk mentioned this system was done many times before.

So to sum things up.
1. Make sure all the databases have the same information
2. Use db triggers to synchronize, from that point forward
3. Build new functionality on top of that

Cool, and thanks again :)
New Note well
that making sure the databases have the 'same' information will not be easy. This is a political problem; who gets to decide the info on a customer... finance or sales?

I'd suggest that if you're going to sql server, that you might want to consider changing that a bit:

  • Meet with stakeholders over time to resolve information issues.

  • Create a new schema for a new'n'improved system that includes all data.

  • Harmonise data in the various databases. Create triggers that keep information harmonised following above schema. Run this way for a while to ensure that schema meets organisation's needs. Use this phase to refine schema according to users' needs.

  • Implement new and improved schema on central high powered db. I'd strongly consider postgres (free licensing, support available from developers for money, it runs on Windows too, has ports to many versions of unix so when growth is such that windows keels over you can scale to a better platform with little pain).

  • Move data into central db using above schema for organisation. Migrate clients to software to access this db instead of the group of old databases.

  • Keep both old and new systems synched for a while to ensure that no problems pop up that require reverting to old system

  • Decomission the old system.

Lots of folks here have more experience with this stuff than I do, I'm sure that they'll chime in if I've egregiously misrepresented/fucked up anywhere.
--\n-------------------------------------------------------------------\n* Jack Troughton                            jake at consultron.ca *\n* [link|http://consultron.ca|http://consultron.ca]                   [link|irc://irc.ecomstation.ca|irc://irc.ecomstation.ca] *\n* Kingston Ontario Canada               [link|news://news.consultron.ca|news://news.consultron.ca] *\n-------------------------------------------------------------------
New No... you did very well. One thing to add.
After: Decomission the old system

I'd re-deploy the old systems as report "processing" off-loaders and/or compute nodes.

Obviously they are pretty good machines, currently, seeing they run Oracle and MS-SQL. That way you could get more life out of the machines *AND* make sure the old system(s) go away...

Get rid of the old and augment the new at the same time. That way departments don't feel like they are being robbed, you could even dedicate the old machine to them for specific things, making them feel special.

All part of "Protecting the Company's Investment". Yeah... err... YEAH!
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New stovepiping is your friend
your new database should have datagathering pipes to the other applications. However Drew is correct, you have a strenuous data cleaning job before you do anything.
thanks,
Bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
     General DB integration question - (systems) - (6)
         First step is to realize this is not a technology project - (drewk) - (4)
             Double plus good - (JayMehaffey)
             Thanks :) - (systems) - (2)
                 Note well - (jake123) - (1)
                     No... you did very well. One thing to add. - (folkert)
         stovepiping is your friend - (boxley)

The Elvish prince was so powerful and legendary that his first name alone contained over twenty apostrophes.
86 ms