IWETHEY v. 0.3.0 | TODO
1,095 registered users | 0 active users | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New MySQL vs Oracle question
Oracle VS MySql for pure indexed query.

I've got a project that will cost about $80,000
in Oracle licensing, and we would like to avoid that.

Very simple.

2 tables.

Table 1) 50 million records.
Name, address, city, state, zip, bit of demographics.

Table 2) 130 million records.
Contact information. dates. etc.

Monthly reload of the data.

Table 1 will be queries on the following keys:
Almost Unique Id
Zip Code
Name
Address
State


Table 2 will be queried on "Almost Unique Id", after the
person has figured it out via the Table 1 query.

Queries on Table 1 are expect to return less than 100 records,
and typically 1 or 2.


Response time is CRITICAL. 2 second avg, 4 second max.
There may be 200 queries going on at once, and I'd like
to design to scale much higher if possible. The general
user population is about 1000 people, but the activity is
regional bursts.

Back end disk will be fibrechannel attached array. Raw
data size is about 200GB, which will go to about 120GB in
a variable len record database.

There must be 2 systems for safety, so total disk will be
about 1 TB.

Now the question:
Do I need a Sun/Oracle front end, or can I use
Intel Linux/MySQL or Sun/MySQL?

Has anyone had experience comparing the 2 for this type
of application?
New Other things to consider:
Interbase is free, and very good.

I don't know that MySQL or even Postgres are capable of scaling that far. Postgres has been tested as being faster then MySQL in many instances, and it's an all around better database than MySQL.

You might investigate DB/2 as well. It will certainly scale that high, but I don't know what the cost is.

The Linux IO will be much faster than Sun. We have seen this tested. It's actually quite amazing how much faster it is.
Regards,

-scott anderson
New Re: Other things to consider:
> Interbase is free, and very good.

Interbase is "seen" as going away. Whether true
or not, I can't split my attention any further.
I've been directed to research MySQL, and if that can't
handle it, PostGreSQL. I always felt PostGreSQL was more
technically advanced than MySQL, while MySQL had the initial
speed advantage. PostGreSQL might have caught up. MySQL
is definately simpler.

> I don't know that MySQL or even Postgres are capable of
> scaling that far. Postgres has been tested as being faster
> then MySQL in many instances, and it's an all around better
> database than MySQL.

I've loaded my initial 50MM record table into MySQL already,
and am currently indexing the primary key. I'll let you
know.

I needed the following in my table create to allow enough
data:

TYPE=MyISAM
RAID_TYPE=STRIPED
RAID_CHUNKS=30
RAID_CHUNKSIZE=10000
MIN_ROWS=60000000
MAX_ROWS=500000000


It took about 4 hours of various trial and error failures.

On non-64 bit systems, the max file size is 4GB, even on
Linux 2.4.x, which means you need to enable RAID in MySQL,
which is not the default. An when enabled, you then need to give
it an estimate of how many rows you will be using, to allow
it to correctly allocated the internal table structures.

The binary MySQLs don't have RAID enabled, and to compile
it you need gcc 2.95, which was not on my system at the time.

> You might investigate DB/2 as well. It will certainly scale
> that high, but I don't know what the cost is.

Big bucks. Might as well use Oracle.

> The Linux IO will be much faster than Sun. We have seen
> this tested. It's actually quite amazing how much faster it is.

Examples? I've seen the reverse, but then again, I can aggregate
more/simply on my Sun 450s with Veritas. I haven't played with
LVM to duplicate the layouts.
New Interbase "seen" as going away...
It can't. It's an open source project now. It will always be as available as MySQL or Postgres.

As to postgres being more complex, I will most certainly disagree there. It's amazingly simple to set up and easy to use, especially if you are used to Oracle.

The Sun/Linux testing was done by another company we're dealing with, in single machines. The basis was servlet and JMS load testing. The IO coming out of the linux box was much higher.
Regards,

-scott anderson
New Re: Interbase "seen" as going away...
> It can't. It's an open source project now. It will always be as
> available as MySQL or Postgres.

It takes a certain amount of critical mass to ensure the long
term viability of any open source project. I certainly won't
be bug fixing Interbase, and unless there is a critical reason
to go with it as opposed to any other, I'd choose the other.

> As to postgres being more complex, I will most certainly disagree there.
> It's amazingly simple to set up and easy to use, especially
> if you are used to Oracle.

Just a simple issue of preference, I guess. I've done both.
Since I lived in Oracle for many years. I can deal with it's
complexity without blinking. I can even explain it to others.
On the other hand, stupid simple stuff escaped me with
PostGreSQL, and I even have the Elephant book. I might be
prejudiced against it, though, after speaking to the marketing
slime at Great Bridge. I'm happy to see that tank, simply
because they were bullshitting on the benchmarks. I met them
at 'something'-XPO in NY last year. When I called them on
it, they agreed that the marketing guys pushed a bit.


> The Sun/Linux testing was done by another company we're dealing
> with, in single machines.
> The basis was servlet and JMS load testing. The IO coming out of
> the linux box was much higher.

Highly unlikely. I can consistently pump 180MB per second via 2
FC channel cards, and actually process the data as opposed to
spinning on the IO. On the other hand, until very recently, Intel/AMD
CPUs were MUCH faster than SPARC. So if you had a CPU intensive process
(which includes Java interpretation), you'd bottleneck on the CPU
before the IO, but if you were measuring the IO, it would "look"
slower. Did you do simultaneous CPU measurements during these tests?

And are you sure the bottleneck wasn't in what was feeding you
the data?

Also, since you were 'servlet' testing, were you network bound?
When going Sun->Sun, I can move about 75MB (mega BYTE) per
second via GB ethernet. When a Linux box gets in the mix
(Dual PIII-866 I think) the performance drops to about 30MB
per sec, and I see the Linux CPU spinning on the packets.

I'm comparing 3 CPU Sun 450s with Dual CPU Linux boxen here.
In most cases, the Linux box is faster on CPU but slower on IO,
so I end up buying a central 450 for the IO, and a stack of
Linux boxes as compute servers.

I usually always have a few performance windows up when I am
working on a box. 'top', 'iostat', 'ddu_watch (shows disk utilization,
simple df script). I can 'feel' when my disks start to get overloaded,
and I have the same processed running on Intel/Linux and Sparc/Solaris,
which means I usually spot bottlenecks very fast.

New PostGreSQL setup/complexity
I was wrong, or it has gotten a lot better.
Took the RPMs, followed the docs, and had
the large table loading in about 10 minutes.
New Yep.
I like Postgres a lot. I know it's not anywhere near what you'll be doing, but we've had nary a skip here on zIWT from the postgres stuff (once I decided to start vacuuming the database every once in a while :-).
Regards,

-scott anderson
New Re: Yep.
I'm on my 1st vacuum after creating 6 indexes.
How long should it take?
New Shouldn't need one right away.
It's for recalculating index distributions and the like. Kind of a waste right after indexing... :-)

I wouldn't know how long that much data would take. The zIWT tables aren't all that large yet.
Regards,

-scott anderson
New Wierd create index behaviour
It seemed to have copied the entire table
(all files) into another set of files. Now
it is doing read-intensive work.

Why? Snapshotting to ensure consistency?
Must be bypassable!
New index create
needs to create a tmp table then slices and dices until complete, then writes the index to disk? maybe?
Is informix/sybase more price compat than oracle? informix is faster than oracle last time I checked.
thanx,
bill
tshirt front "born to die before I get old"
thshirt back "fscked another one didnja?"
New Re: index create
> needs to create a tmp table then slices and dices until complete, then writes > the index to disk? maybe?

It seemed that it created an EXACT copy of the data files that it
was supposed to index. It created them in the same
dirs that the originals were in. It then ran out of tmp space in /tmp
and the index failed. It then deleted all the copy files. Ya know, If I
really cared, I could read the source. I don't care that much.
Actually, I might, now that I think about it.

I've set the /tmp to be in the large partition, and I've pumped
up all the various memory related parameters, I've got 1GB in
this box.


> Is informix/sybase more price compat than oracle? informix is faster than
> oracle last time I

Nah, Informix of a certain class is faster, and it costs more.
I had the Informix people pitching me last month. And the lesser
Informix is REAL slow.

Sybase is seen as a corpse that hasn't figured it out yet. I
wouldn't get a dime for trying.

No commercial alternatives allowed. If I can't figure it
out via MySQL or PostGreSQL, it'll be Oracle.

But I really think that a dual CPU Linux box with MySQL will be plenty
fast enough, once I've set it up right. I'm loading the data for
the 4th time, after blowing it away trying varous alternatives.

Since it loads in 1/2 hour, it is worth testing variations.

Does anyone know of a Solaris like iostat for Linux. The
one in the mpstat package is nasty. I want to see MB/s
and Service time, but this one makes me think in blocks
or sectors. Of course, I COULD hack it.
New Well, Sybase still has some money
since they're building a big new building on Hacienda Road in Dublin (just north of I-580, and just south of one of our customers -- and the Santa Rita jail!).

Then again, maybe it's really a plan to save money: rent might be cheaper in Dublin than Emeryville, and they might be moving everybody there.

BTW, commercial rents in this area are still around $2/sq ft/month -- and the sublease signs have been going up (e.g. on one Lam Research bldg).

Good luck with project.

Tony
New Re: index create
OK, here's the reason. MySQL originally created indexes
via 'alter table add index name (field1,field2,...) syntax.
"create index ..." is a synonym. And ALL alter tables
do the following:
-----------------------------------------------------------

ALTER TABLE works by making a temporary copy of the original table. The alteration is performed on the copy, then
the original table is deleted and the new one is renamed. This is done in such a way that all updates are automatically
redirected to the new table without any failed updates. While ALTER TABLE is executing, the original table is readable
by other clients. Updates and writes to the table are stalled until the new table is ready.

Note that if you use any other option to ALTER TABLE than RENAME, MySQL will always create a temporary table, even
if the data wouldn't strictly need to be copied (like when you change the name of a column). We plan to fix this in the
future, but as one doesn't normally do ALTER TABLE that often this isn't that high on our TODO.
---------------------------------------------------------------------------

Yack shit. This piss poor design might be reason enough not use it.
On the other hand, if I can create all indexes on one pass, it might
be worth it.
New 1.5 hours to create index
On almost unique id.
As opposed to 13 minutes on Sun/Oracle.

Now I'm trying to create 7 indexes simultaneously.

I suspect I will find there is 1.25 hours overhead,
and then each index will be 15 minutes. If that
is true, and the query times are OK, we have
a winner.

Actually, if EACH one takes 1.5 hours, and it runs
overnight once a month, and it saves $80,000, we
still have a winner.
New That's why I'd second the Interbase recommendation
I don't think you need to split your research time any further. IMO your free options should be Postgres vs Interbase. MySQL has too many issues like this; cf. wharris' mention of accuracy (below) and the locking issues (although your description of essentially read-only ameliorates that one somewhat). But these are symptomatic of the many design trade-offs that were made all in the name of speed and small footprint. Comparing Oracle and MySQL is, IMO, an apples-and-oranges comparison. IB, on the other hand, is much closer to Oracle functionality and design.
---------------------------------
A stupid despot may constrain his slaves with iron chains; but a true politician binds them even more strongly by the chain of their own ideas;...despair and time eat away the bonds of iron and steel, but they are powerless against the habitual union of ideas, they can only tighten it still more; and on the soft fibres of the brain is founded the unshakable base of the soundest of Empires."

Jacques Servan, 1767
New Re: MySQL vs Oracle question
I've got a project that will cost about $80,000
in Oracle licensing, and we would like to avoid that.

I can appreciate that.
Now the question:
Do I need a Sun/Oracle front end, or can I use
Intel Linux/MySQL or Sun/MySQL?

Wrong question. I think MySQL can handle this volume, but the real question is: Is the data vital, does it *need* to be correct, or are you OK with occasional wrong answers?

MySQL does not, currently, as I understand it, *guarantee* correctness of data updates, storage, and retrieval. If data is largely static and isn't updated often, MySQL may be OK.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New Readonly
100% batch load once a month.
New Readonly? Hell, then, full speed ahead!
"Beware of bugs in the above code; I have only proved it correct, not tried it."
-- Donald Knuth
New 14 hours later, still indexing
So I also installed PgSQL 7.1, and loaded the data. Load
took 2 hours rather than 1/2 hour. I'll start indexing
to see how bad it is there.
New 20 hours later, still indexing
I've got 1/2 the indexes on the PgSQL done. I'll try
to let the MySQL complete, and then compare query speeds.

If the MySQL takes 24 hours on this box, I can double the
CPU to it, if the query speeds make it worthwhile.
New Wheeee - Done!!!
select name_full from ind_1101
where state = 'NJ' and name_last like 'ROOMBE%'

4 rows in set (0.57 sec)

1/2 second for a dual index lookup out of 50 million records.

I'll whip up a bench to simulate a couple hundred variations
an exec them same time to make sure it scales.

PostGreSQL isn't done 'vacuuming' yet, so the comparison isn't
ready, but it seems MUCH slower, and the indexes ARE analyzed
at this point. About 30 seconds for the same query.

MySQL is the keeper. Double the hardware, and the load can be
done in a single day. $3,000 in hardware saves $80,000 in Oracle.

I'll also modify the load:

create table
create indexes
disable indexes
load data
enable indexes

I suspect that will be faster.

Also, MySQL took 1/2 the space for the tables than PostGreSQL.
New New design
1) Split the data into 100 tables, access via 1st two
characters of zip.

2) Tell MySQL to create indexes during table create.
When the 1st load happens, it will do a quick build of
indexes. This is a one shot deal. Faster to reload and
rebuild, than to create index afterwards.

3) Custom code Perl API to determine correct table and
lookup there.

Actual load and index took about 50 minutes.
Perl mod took 10 minutes.

Queries run between 5 and 30 queries a second,
with occassional hesitations. Dual XEON with 2GB
of memory for cache should cure that. I'll find out
on Friday. It's a winner.

New Happy Ending!
Clever idea, too. Must be time to disstribute it to a pro-Linux website or three. Bonus points if you can include how much it would have cost if it had to run on SQL Server :)
On and on and on and on,
and on and on and on goes John.
     MySQL vs Oracle question - (broomberg) - (23)
         Other things to consider: - (admin) - (14)
             Re: Other things to consider: - (broomberg) - (6)
                 Interbase "seen" as going away... - (admin) - (5)
                     Re: Interbase "seen" as going away... - (broomberg)
                     PostGreSQL setup/complexity - (broomberg) - (3)
                         Yep. - (admin) - (2)
                             Re: Yep. - (broomberg) - (1)
                                 Shouldn't need one right away. - (admin)
             Wierd create index behaviour - (broomberg) - (6)
                 index create - (boxley) - (5)
                     Re: index create - (broomberg) - (4)
                         Well, Sybase still has some money - (tonytib)
                         Re: index create - (broomberg) - (2)
                             1.5 hours to create index - (broomberg)
                             That's why I'd second the Interbase recommendation - (tseliot)
         Re: MySQL vs Oracle question - (wharris2) - (2)
             Readonly - (broomberg) - (1)
                 Readonly? Hell, then, full speed ahead! -NT - (wharris2)
         14 hours later, still indexing - (broomberg) - (2)
             20 hours later, still indexing - (broomberg) - (1)
                 Wheeee - Done!!! - (broomberg)
         New design - (broomberg) - (1)
             Happy Ending! - (Meerkat)

A base libel on toxic waste dumps everywhere.
173 ms