Post #19,951
12/2/01 9:37:16 AM
|
MySQL vs Oracle question
Oracle VS MySql for pure indexed query.
I've got a project that will cost about $80,000 in Oracle licensing, and we would like to avoid that.
Very simple.
2 tables.
Table 1) 50 million records. Name, address, city, state, zip, bit of demographics.
Table 2) 130 million records. Contact information. dates. etc.
Monthly reload of the data.
Table 1 will be queried on the following keys: Almost Unique Id, Zip Code, Name, Address, State.
Table 2 will be queried on "Almost Unique Id", after the person has figured it out via the Table 1 query.
Queries on Table 1 are expected to return fewer than 100 records, and typically 1 or 2.
Response time is CRITICAL. 2 second avg, 4 second max. There may be 200 queries going on at once, and I'd like to design to scale much higher if possible. The general user population is about 1000 people, but the activity is regional bursts.
Back end disk will be a Fibre Channel attached array. Raw data size is about 200GB, which will go to about 120GB in a variable-length record database.
There must be 2 systems for safety, so total disk will be about 1 TB.
Now the question: Do I need a Sun/Oracle front end, or can I use Intel Linux/MySQL or Sun/MySQL?
Has anyone had experience comparing the 2 for this type of application?
|
Post #19,954
12/2/01 11:45:06 AM
|
Other things to consider:
Interbase is free, and very good.
I don't know that MySQL or even Postgres are capable of scaling that far. Postgres has been tested as faster than MySQL in many instances, and it's an all-around better database than MySQL.
You might investigate DB/2 as well. It will certainly scale that high, but I don't know what the cost is.
The Linux IO will be much faster than Sun. We have seen this tested. It's actually quite amazing how much faster it is.
Regards,
-scott anderson
|
Post #19,956
12/2/01 12:08:04 PM
|
Re: Other things to consider:
> Interbase is free, and very good.
Interbase is "seen" as going away. Whether true or not, I can't split my attention any further. I've been directed to research MySQL, and if that can't handle it, PostgreSQL. I always felt PostgreSQL was more technically advanced than MySQL, while MySQL had the initial speed advantage. PostgreSQL might have caught up. MySQL is definitely simpler.
> I don't know that MySQL or even Postgres are capable of scaling that far. Postgres has been tested as being faster than MySQL in many instances, and it's an all around better database than MySQL.
I've loaded my initial 50MM record table into MySQL already, and am currently indexing the primary key. I'll let you know.
I needed the following in my table create to allow enough data:
TYPE=MyISAM RAID_TYPE=STRIPED RAID_CHUNKS=30 RAID_CHUNKSIZE=10000 MIN_ROWS=60000000 MAX_ROWS=500000000
It took about 4 hours of various trial and error failures.
On non-64-bit systems, the max file size is 4GB, even on Linux 2.4.x, which means you need to enable RAID in MySQL, which is not the default. And when enabled, you then need to give it an estimate of how many rows you will be using, to allow it to correctly allocate the internal table structures.
The binary MySQLs don't have RAID enabled, and to compile it you need gcc 2.95, which was not on my system at the time.
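Put together, the options above would appear at the end of the table create. A sketch of what that looks like (the table and column names here are invented; the RAID_* and row-count options are the old MyISAM ones just described, in the MySQL 3.23/4.0-era syntax):

```sql
-- Hypothetical layout; the real table has more demographic columns.
CREATE TABLE ind_1101 (
    almost_uid   CHAR(12)     NOT NULL,
    name_last    VARCHAR(30)  NOT NULL,
    name_full    VARCHAR(60)  NOT NULL,
    address      VARCHAR(60),
    city         VARCHAR(30),
    state        CHAR(2),
    zip          CHAR(5)
)
TYPE=MyISAM                -- old spelling of ENGINE=MyISAM
RAID_TYPE=STRIPED          -- stripe the .MYD data file across
RAID_CHUNKS=30             -- 30 subdirectories, sidestepping the 4GB limit
RAID_CHUNKSIZE=10000       -- stripe chunk size
MIN_ROWS=60000000          -- size hints so MySQL allocates internal
MAX_ROWS=500000000;        -- pointers big enough for 50M+ rows
```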
> You might investigate DB/2 as well. It will certainly scale that high, but I don't know what the cost is.
Big bucks. Might as well use Oracle.
> The Linux IO will be much faster than Sun. We have seen this tested. It's actually quite amazing how much faster it is.
Examples? I've seen the reverse, but then again, I can aggregate more/simply on my Sun 450s with Veritas. I haven't played with LVM to duplicate the layouts.
|
Post #19,973
12/2/01 3:34:21 PM
|
Interbase "seen" as going away...
It can't. It's an open source project now. It will always be as available as MySQL or Postgres.
As to postgres being more complex, I will most certainly disagree there. It's amazingly simple to set up and easy to use, especially if you are used to Oracle.
The Sun/Linux testing was done by another company we're dealing with, in single machines. The basis was servlet and JMS load testing. The IO coming out of the linux box was much higher.
Regards,
-scott anderson
|
Post #19,993
12/2/01 7:41:07 PM
|
Re: Interbase "seen" as going away...
> It can't. It's an open source project now. It will always be as available as MySQL or Postgres.
It takes a certain amount of critical mass to ensure the long term viability of any open source project. I certainly won't be bug fixing Interbase, and unless there is a critical reason to go with it as opposed to any other, I'd choose the other.
> As to postgres being more complex, I will most certainly disagree there. It's amazingly simple to set up and easy to use, especially if you are used to Oracle.
Just a simple issue of preference, I guess. I've done both. Since I lived in Oracle for many years, I can deal with its complexity without blinking. I can even explain it to others. On the other hand, stupid simple stuff escaped me with PostgreSQL, and I even have the Elephant book. I might be prejudiced against it, though, after speaking to the marketing slime at Great Bridge. I'm happy to see that tank, simply because they were bullshitting on the benchmarks. I met them at 'something'-XPO in NY last year. When I called them on it, they agreed that the marketing guys pushed a bit.
> The Sun/Linux testing was done by another company we're dealing with, in single machines. The basis was servlet and JMS load testing. The IO coming out of the Linux box was much higher.
Highly unlikely. I can consistently pump 180MB per second via 2 FC channel cards, and actually process the data as opposed to spinning on the IO. On the other hand, until very recently, Intel/AMD CPUs were MUCH faster than SPARC. So if you had a CPU intensive process (which includes Java interpretation), you'd bottleneck on the CPU before the IO, but if you were measuring the IO, it would "look" slower. Did you do simultaneous CPU measurements during these tests?
And are you sure the bottleneck wasn't in what was feeding you the data?
Also, since you were 'servlet' testing, were you network bound? When going Sun->Sun, I can move about 75MB (mega BYTE) per second via GB ethernet. When a Linux box gets in the mix (Dual PIII-866 I think) the performance drops to about 30MB per sec, and I see the Linux CPU spinning on the packets.
I'm comparing 3 CPU Sun 450s with Dual CPU Linux boxen here. In most cases, the Linux box is faster on CPU but slower on IO, so I end up buying a central 450 for the IO, and a stack of Linux boxes as compute servers.
I usually have a few performance windows up when I am working on a box: 'top', 'iostat', 'ddu_watch' (shows disk utilization, a simple df script). I can 'feel' when my disks start to get overloaded, and I have the same processes running on Intel/Linux and Sparc/Solaris, which means I usually spot bottlenecks very fast.
|
Post #20,145
12/3/01 8:18:29 PM
|
PostGreSQL setup/complexity
I was wrong, or it has gotten a lot better. Took the RPMs, followed the docs, and had the large table loading in about 10 minutes.
|
Post #20,150
12/3/01 8:58:00 PM
|
Yep.
I like Postgres a lot. I know it's not anywhere near what you'll be doing, but we've had nary a skip here on zIWT from the postgres stuff (once I decided to start vacuuming the database every once in a while :-).
Regards,
-scott anderson
|
Post #20,156
12/3/01 9:42:28 PM
|
Re: Yep.
I'm on my 1st vacuum after creating 6 indexes. How long should it take?
|
Post #20,166
12/3/01 11:07:27 PM
|
Shouldn't need one right away.
It's for recalculating index distributions and the like. Kind of a waste right after indexing... :-)
I wouldn't know how long that much data would take. The zIWT tables aren't all that large yet.
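For reference, the 7.1-era command that does both the space reclamation and the statistics pass looks something like this (table name is made up):

```sql
-- Reclaim dead space and refresh the planner's index statistics.
VACUUM ANALYZE ind_1101;
```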
Regards,
-scott anderson
|
Post #19,957
12/2/01 12:36:13 PM
|
Weird create index behaviour
It seemed to have copied the entire table (all files) into another set of files. Now it is doing read-intensive work.
Why? Snapshotting to ensure consistency? Must be bypassable!
|
Post #19,996
12/2/01 8:04:05 PM
|
index create
Needs to create a tmp table, then slices and dices until complete, then writes the index to disk? Maybe? Is Informix/Sybase more price compat than Oracle? Informix is faster than Oracle last time I checked. thanx, bill
tshirt front "born to die before I get old" tshirt back "fscked another one didnja?"
|
Post #20,002
12/2/01 9:09:04 PM
|
Re: index create
> needs to create a tmp table then slices and dices until complete, then writes the index to disk? maybe?
It seemed that it created an EXACT copy of the data files that it was supposed to index. It created them in the same dirs that the originals were in. It then ran out of tmp space in /tmp and the index failed. It then deleted all the copy files. Ya know, if I really cared, I could read the source. I don't care that much. Actually, I might, now that I think about it.
I've set the /tmp to be in the large partition, and I've pumped up all the various memory-related parameters; I've got 1GB in this box.
> Is informix/sybase more price compat than oracle? informix is faster than oracle last time I checked.
Nah, Informix of a certain class is faster, and it costs more. I had the Informix people pitching me last month. And the lesser Informix is REAL slow.
Sybase is seen as a corpse that hasn't figured it out yet. I wouldn't get a dime for trying.
No commercial alternatives allowed. If I can't figure it out via MySQL or PostGreSQL, it'll be Oracle.
But I really think that a dual CPU Linux box with MySQL will be plenty fast enough, once I've set it up right. I'm loading the data for the 4th time, after blowing it away trying various alternatives.
Since it loads in 1/2 hour, it is worth testing variations.
Does anyone know of a Solaris-like iostat for Linux? The one in the mpstat package is nasty. I want to see MB/s and service time, but this one makes me think in blocks or sectors. Of course, I COULD hack it.
|
Post #20,005
12/2/01 9:39:52 PM
|
Well, Sybase still has some money
since they're building a big new building on Hacienda Road in Dublin (just north of I-580, and just south of one of our customers -- and the Santa Rita jail!).
Then again, maybe it's really a plan to save money: rent might be cheaper in Dublin than Emeryville, and they might be moving everybody there.
BTW, commercial rents in this area are still around $2/sq ft/month -- and the sublease signs have been going up (e.g. on one Lam Research bldg).
Good luck with the project.
Tony
|
Post #20,011
12/2/01 10:27:47 PM
|
Re: index create
OK, here's the reason. MySQL originally created indexes via the 'ALTER TABLE ADD INDEX name (field1, field2, ...)' syntax; "CREATE INDEX ..." is a synonym. And ALL ALTER TABLEs do the following (from the MySQL manual):
ALTER TABLE works by making a temporary copy of the original table. The alteration is performed on the copy, then the original table is deleted and the new one is renamed. This is done in such a way that all updates are automatically redirected to the new table without any failed updates. While ALTER TABLE is executing, the original table is readable by other clients. Updates and writes to the table are stalled until the new table is ready.
Note that if you use any other option to ALTER TABLE than RENAME, MySQL will always create a temporary table, even if the data wouldn't strictly need to be copied (like when you change the name of a column). We plan to fix this in the future, but as one doesn't normally do ALTER TABLE that often this isn't that high on our TODO.
Yack shit. This piss-poor design might be reason enough not to use it. On the other hand, if I can create all indexes in one pass, it might be worth it.
|
Post #20,015
12/2/01 10:53:57 PM
|
1.5 hours to create index
On the almost unique id, as opposed to 13 minutes on Sun/Oracle.
Now I'm trying to create 7 indexes simultaneously.
I suspect I will find there is 1.25 hours overhead, and then each index will be 15 minutes. If that is true, and the query times are OK, we have a winner.
Actually, if EACH one takes 1.5 hours, and it runs overnight once a month, and it saves $80,000, we still have a winner.
|
Post #20,101
12/3/01 3:28:05 PM
|
That's why I'd second the Interbase recommendation
I don't think you need to split your research time any further. IMO your free options should be Postgres vs Interbase. MySQL has too many issues like this; cf. wharris' mention of accuracy (below) and the locking issues (although your description of an essentially read-only workload ameliorates that one somewhat). But these are symptomatic of the many design trade-offs that were made all in the name of speed and small footprint. Comparing Oracle and MySQL is, IMO, an apples-and-oranges comparison. IB, on the other hand, is much closer to Oracle in functionality and design.
"A stupid despot may constrain his slaves with iron chains; but a true politician binds them even more strongly by the chain of their own ideas; ...despair and time eat away the bonds of iron and steel, but they are powerless against the habitual union of ideas, they can only tighten it still more; and on the soft fibres of the brain is founded the unshakable base of the soundest of Empires."
Jacques Servan, 1767
|
Post #20,025
12/3/01 12:47:34 AM
|
Re: MySQL vs Oracle question
> I've got a project that will cost about $80,000 in Oracle licensing, and we would like to avoid that.
I can appreciate that.
> Now the question: Do I need a Sun/Oracle front end, or can I use Intel Linux/MySQL or Sun/MySQL?
Wrong question. I think MySQL can handle this volume, but the real question is: Is the data vital, does it *need* to be correct, or are you OK with occasional wrong answers? MySQL does not, currently, as I understand it, *guarantee* correctness of data updates, storage, and retrieval. If data is largely static and isn't updated often, MySQL may be OK.
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth
|
Post #20,057
12/3/01 7:09:48 AM
|
Readonly
100% batch load once a month.
|
Post #20,553
12/6/01 7:41:13 PM
|
Readonly? Hell, then, full speed ahead!
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth
|
Post #20,090
12/3/01 2:08:57 PM
|
14 hours later, still indexing
So I also installed PgSQL 7.1 and loaded the data. The load took 2 hours rather than 1/2 hour. I'll start indexing to see how bad it is there.
|
Post #20,144
12/3/01 8:15:59 PM
|
20 hours later, still indexing
I've got 1/2 the indexes on the PgSQL done. I'll try to let the MySQL complete, and then compare query speeds.
If the MySQL takes 24 hours on this box, I can double the CPU to it, if the query speeds make it worthwhile.
|
Post #20,165
12/3/01 11:03:46 PM
|
Wheeee - Done!!!
select name_full from ind_1101 where state = 'NJ' and name_last like 'ROOMBE%'
4 rows in set (0.57 sec)
1/2 second for a dual index lookup out of 50 million records.
I'll whip up a bench to simulate a couple hundred variations and exec them at the same time to make sure it scales.
PostgreSQL isn't done 'vacuuming' yet, so the comparison isn't ready, but it seems MUCH slower, and the indexes ARE analyzed at this point. About 30 seconds for the same query.
MySQL is the keeper. Double the hardware, and the load can be done in a single day. $3,000 in hardware saves $80,000 in Oracle.
I'll also modify the load:
create table
create indexes
disable indexes
load data
enable indexes
I suspect that will be faster.
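Assuming MyISAM, that sequence maps onto something like the sketch below (the table and file names are invented; ALTER TABLE ... DISABLE KEYS / ENABLE KEYS is MySQL's way of deferring non-unique index maintenance during a bulk load, in versions that have it):

```sql
-- Indexes were already declared in the CREATE TABLE, so just
-- switch off non-unique index maintenance for the bulk load.
ALTER TABLE ind_1101 DISABLE KEYS;

-- Hypothetical path and delimiter for the monthly dump.
LOAD DATA INFILE '/data/monthly_reload.txt'
    INTO TABLE ind_1101
    FIELDS TERMINATED BY '|';

-- Rebuild the deferred indexes in one bulk pass.
ALTER TABLE ind_1101 ENABLE KEYS;
```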
Also, MySQL took 1/2 the space for the tables compared to PostgreSQL.
|
Post #20,419
12/5/01 8:58:34 PM
|
New design
1) Split the data into 100 tables, access via 1st two characters of zip.
2) Tell MySQL to create indexes during table create. When the 1st load happens, it will do a quick build of indexes. This is a one-shot deal. Faster to reload and rebuild than to create indexes afterwards.
3) Custom code Perl API to determine correct table and lookup there.
Actual load and index took about 50 minutes. Perl mod took 10 minutes.
Queries run at between 5 and 30 a second, with occasional hesitations. A dual Xeon with 2GB of memory for cache should cure that. I'll find out on Friday. It's a winner.
|
Post #20,532
12/6/01 6:01:45 PM
|
Happy Ending!
Clever idea, too. Must be time to distribute it to a pro-Linux website or three. Bonus points if you can include how much it would have cost if it had to run on SQL Server :)
On and on and on and on, and on and on and on goes John.
|