I am sure many of you had to deal with similar situations
Lets say, you were asked to do a system that collects data and generate data. In summary a new database based system for a middle sized company.
Now the company already have other systems one for the sales department, one for customer support, one for finance etc ...
Part of the data the new system needs (the generated data) depend on information that already exist in the old systems, actually some of them exist in each system, for example (the supliers table). And some exist in just one of the 3 systems (invoices only exist in the finance db).
Now another responsibility of the system is to make easier to create repports that depends on data, in all 3 system knowing that some are duplicated and some are required for the new functionality.
Now the question is, how to best integrate or merge the data from the 3 databases, knowing, that they dont all use the the dbms engine, some use oracle others use sqlserver. The new system must uses sql server (licensing issue).
Should the new functionality be built independently (i.e. have its own talbes of everything, vendors, customers etc ...)
And later find a way to build the reports that integrate all systems. And what would those tools be (BusinessOBjects XI, or SAP datawarehousing package, or what, is there something opensource for this)
Or should I find a way to integrate the old databases first, and build the new functiontinality on the output of the integration. And in this case how this integration be implementated.
Should I create a db that have views linked to those table of interest (I am not sure such a thing exist, but if access have dblink I think sql server should have something like it)
Or what else, what technologies are usually used. A problem I can't seem to fully graso is that the integration result much always be a correct mirror to the original data. So the view concept seem like a good idea, or are there hiddens flows in this
What other issues ppl forget about in such situation, for example, how to deal with the situation where the sales vendor list have more vendors than the finance vendors list. Which list should the taken.
What about the mapping, for exmaple the vendor code can be different in the finance and sales, plus the name can be spelled differenly. I am thinking of table that map codes something like
vendor_code_in_sales | vendor_code_in_finance | vendor_code_in_*
and enter those records in this talbe manually of course. But are there other ways.
If there are books about this, please recommend them, I saw this book:
[link|http://www.amazon.com/gp/product/0201844567/ref=sib_rdr_dp/103-3575556-1641469?%5Fencoding=UTF8&me=ATVPDKIKX0DER&no=283155&st=books&n=283155|Next Generation Application Integration: From Simple Information to Web Services
by David S. Linthicum], in the library near me, but its not really cheap, and I am not sure of its usefulness, do you think it can help me or do you recommend other books, that better deal with the pitfalls of integration, and not just explain the different technologies, or even better explain how the different technologies handle those pitfall. This book seem to do a good job demonstrating the availbe technologies, I for example, have no clue how xml is used in integration. But on the oppositeand I am more interested in datawarehousing like style of integration, and tha talbe of content he doesnt seem to cover this
In summary, the input (of the integration) is db tables (different engines), I imagine the output should be db tables too (in sql server), not some other vague concept, or any type of file or a report (a report is not a talbe).
Waiting for your advices