Post #294,204
10/4/07 6:20:01 PM
|
Hey Greg! Database choice time
Here's your chance to sway me again.
Let's say I had around 100GB of raw data I wanted to load and query. About 550 fields, 2K records, tight data, few spaces.
There will be little updating (if any), and all updates will be controlled by me.
Initial usage will be very light, ie: controlled demo. But after that we might see 10 concurrent active users at any given moment. After that point, I'll be open to throw more hardware.
Almost all usage is straight selects to produce counts. I don't even need to think about client connectivity because it'll be a silly SQL script file disk drop and my process will execute, and then return a count in another file.
Initial environment is quad CPU vmware instance, 4GB or memory (I can give it more, I think), 2GHz CPUs, single SAS disk array.
The choice of database is all mine, as long as it's free. If we have to move to commercial, we will, (maybe).
So, I've got 2 weeks to slice and dice the data, load and test query.
Any particular databases / options I should be persuing?
|
Post #294,210
10/4/07 9:03:58 PM
|
PICK? /me dux
Quantum materiae materietur marmota monax si marmota monax materiam possit materiari? Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 51 years. meep
reach me at [link|mailto:bill.oxley@cox.net|mailto:bill.oxley@cox.net]
|
Post #294,220
10/4/07 9:47:10 PM
|
You can't duck low enough. </CLUB>
|
Post #294,239
10/5/07 2:22:09 AM
|
Firebird?
[link|http://www.firebirdsql.org/|http://www.firebirdsql.org/]
[link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad] (I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
|
Post #294,241
10/5/07 2:46:14 AM
10/5/07 2:51:03 AM
|
Well, I've got a new religion lately...
When it comes to good code and fast... MySQL. Try the pre v6 stuff first, its pure gold. Circular Replication (Multi-Master) is a good thing, its not DONE, but it is exceptionally usable. MySQL Replication (v5.0/5.1) forces binlog logging. This is GREAT stuff. One thing you can do is, force sync on a slave, take it down and back it up "as static data". This means you have a PERFECT snapshot. Then once done backing up... bring the slave back and the master it is attached to feeds the binlog to the slave and it catches up auto-magically. Also, Don't skip INNODB, use it. It (fully) passes ACID since v4.1.1. The new MySQL Proxy has the ability to send requests anywhere based on rules. Things like this: - If its a write send to a read-write circular replication host
- If it has a Cartesian join in it, send it to this POWER HOUSE slave
- If its a near straight data dump, goto this FAST DISK IO machine
- If you have 200 connections coming from this host right now, send the requests to ALL machines
- based on specific SQL options, Query types, keywords or even SQL query format... (how awesome is that?)
To name a few things, its wildly awesome. Just to give you an IDEA. I have 4 MySQL dedicated machines setup in Cluster mode (multi-master non-circular replication) in two pairs. Plus I have 3 slaves off one cluster for reporting and conversion from legacy system data. the 4 machines are 2 processor Quad Core Intel Xeon 2.66GHz processor with 8GB of memory, mirrored SATAII drives. These machine have been recently support as many as 800 concurrent users, with ease. Some of our application servers (ssh into these machine and run our software) that connect to our MySQL servers, regularly see load averages of 50+ and live to tell about it... A bit pokey at times but over all waiting is the hardest part. In comparison to our old DB (a B-Tree based product... UGH) on same hardware, we get more saturation or users per boxen. Now if we could fix the 17+ group RPC problems with NFSv3... then we'd be able to get even more users/clients on these machines. And to be honest, I came to my new place a PG bigot... its truly amazing at how robust and POWERFUL MySQL is now. It implements just about most things you could REALLY ask for. Triggers, Events, Stored Procs (well mostly anyway) and it does relational, enforcement, atomic updates (with v6), and a plethora of other things. So, in my opinion, humble or not, use MySQL v.latest.(stable or devel) or use pre v6 stuff, I've been putting the .devel stuff in good use. (Not my choice, the Company President is visionary and is actively USING features). It very very very scalable and tweakage is near infinite and transparent. Especially in clustering modes.
-- [link|mailto:greg@gregfolkert.net|greg], [link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey PGP key: 1024D/B524687C 2003-08-05 Fingerprint: E1D3 E3D7 5850 957E FED0 2B3A ED66 6971 B524 687C Alternate Fingerprint: 09F9 1102 9D74 E35B D841 56C5 6356 88C0 Alternate Fingerprint: 455F E104 22CA 29C4 933F 9505 2B79 2AB2
Edited by folkert
Oct. 5, 2007, 02:51:03 AM EDT
|
Post #294,244
10/5/07 2:56:36 AM
|
Hmm, are they repeating Oracle's error: TOO MUCH "tweakage"?
|
Post #294,248
10/5/07 7:02:13 AM
|
:-O
I was tempted to recommend MySQL based on the fact that what crazy described is actually quite a good fit for MyISAM's performance characteristics, as far as I understand them. And that I look after a ~100Gb MySQL replicated database.
One thing I've discovered about the replication is that quite happily, it is at SQL level. This means problem solving when it falls apart (which it does do from time to time) it is because of an SQL-level problem which only takes SQL skills to fix. Unfortunately, it is at the SQL level, which means replication basically re-parses and re-executes the update queries. Which also means that SQL errors can stop replication. But on the gripping hand, you can do wicked stuff like having different storage engines as an extreme case of different configurations...
Wade.
Is it enough to love Is it enough to breathe Somebody rip my heart out And leave me here to bleed
| | Is it enough to die Somebody save my life I'd rather be Anything but Ordinary Please
|
-- "Anything but Ordinary" by Avril Lavigne. | · my · · [link|http://staticsan.livejournal.com/|blog] · · [link|http://yceran.org/|website] · |
|
Post #294,298
10/6/07 12:01:27 AM
|
Oh and here is the MySQL Proxy page
[link|http://forge.mysql.com/wiki/MySQL_Proxy|http://forge.mysql.com/wiki/MySQL_Proxy]
It really is pretty cool.
-- [link|mailto:greg@gregfolkert.net|greg], [link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey PGP key: 1024D/B524687C 2003-08-05 Fingerprint: E1D3 E3D7 5850 957E FED0 2B3A ED66 6971 B524 687C Alternate Fingerprint: 09F9 1102 9D74 E35B D841 56C5 6356 88C0 Alternate Fingerprint: 455F E104 22CA 29C4 933F 9505 2B79 2AB2
|
Post #294,434
10/8/07 4:40:50 PM
|
Ok, playing with 5.0
160MM tiny records loaded in 4.5 minutes, but the index creation is taking FO-EVA!
Any tweaks you recommend to speed it up?
|
Post #294,439
10/8/07 11:09:32 PM
|
Some things are still slow in MySQL.
It's quicker to let it build the indices during import than to generate them later. Mostly this is because altering a table - including changing indices - involves copying the table. This is what takes time. This is why if you have a lot of changes to make to a table's structure, it is far and away fastest to do it in a single ALTER TABLE statement.
Wade.
Is it enough to love Is it enough to breathe Somebody rip my heart out And leave me here to bleed
| | Is it enough to die Somebody save my life I'd rather be Anything but Ordinary Please
|
-- "Anything but Ordinary" by Avril Lavigne. | · my · · [link|http://staticsan.livejournal.com/|blog] · · [link|http://yceran.org/|website] · |
|
Post #294,442
10/8/07 11:59:22 PM
|
Oh yeah - I forgot about that
I was wondering why I saw the temp table that seemed to mirror the data.
That's OK, I'll be rebuilding a few times as I test out various schema definitions.
That particular index build was hung when I ran out of disk in the logging partition. I don't need logs, so I turned that off.
It seemed to have taken 3 hours to build when I restarted. I've blindly created varchar field and need to turn a bunch into real numbers. Should get a bit quicker then as well.
|
Post #294,444
10/9/07 3:28:52 AM
|
I saw an Oracle guy do a Bit Of Learning...
He was very idealistic, tended to believe in The Power Of The Database kind of stuff. Didn't understand that we needed to look after some scaling concerns outside of MySQL because MySQL couldn't do them. Liked references and stored procedures, you probably get the idea.
And got caught by the same re-indexing trap you fell in. :-) It may indeed be faster in Oracle to enable the indexing after you load the table, but this is not so in MySQL.
Wade.
Is it enough to love Is it enough to breathe Somebody rip my heart out And leave me here to bleed
| | Is it enough to die Somebody save my life I'd rather be Anything but Ordinary Please
|
-- "Anything but Ordinary" by Avril Lavigne. | · my · · [link|http://staticsan.livejournal.com/|blog] · · [link|http://yceran.org/|website] · |
|
Post #294,452
10/9/07 9:20:16 AM
|
You teased me
I thought you were about to show me the magic of MySql Bitmapped indexes.
|
Post #294,485
10/9/07 4:58:50 PM
|
On the other hand, maybe those MySQL people
might be learning and copying from those Oracle people.
[link|http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html|http://dev.mysql.com...insert-speed.html]
With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes. Use the following procedure:
1. Optionally create the table with CREATE TABLE. 2. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command. 3. Use myisamchk --keys-used=0 -rq /path/to/db/tbl_name. This removes all use of indexes for the table. 4. Insert data into the table with LOAD DATA INFILE. This does not update any indexes and therefore is very fast. 5. If you intend only to read from the table in the future, use myisampack to compress it. See Section 13.1.3.3, “Compressed Table Characteristics”. 6. Re-create the indexes with myisamchk -rq /path/to/db/tbl_name. This creates the index tree in memory before writing it to disk, which is much faster that updating the index during LOAD DATA INFILE because it avoids lots of disk seeks. The resulting index tree is also perfectly balanced. 7. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
|
Post #294,496
10/9/07 9:52:48 PM
|
OT have you looked at hadoop?
[link|http://lucene.apache.org/hadoop/|http://lucene.apache.org/hadoop/] looks like very interesting tech for write once read lots high volume applications. A database that only contained a path to the information would be fairly quick. thanx, bill
Quantum materiae materietur marmota monax si marmota monax materiam possit materiari? Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 51 years. meep
reach me at [link|mailto:bill.oxley@cox.net|mailto:bill.oxley@cox.net]
|
Post #294,499
10/9/07 10:35:25 PM
|
Nope
My type of queries are counts across a hundred million records. Hadoop would bounce that around a few machines.
|
Post #294,515
10/10/07 8:37:47 AM
|
/me rubs eyes
did I just read "hadpoop"?
Smile, Amy
|
Post #294,530
10/10/07 4:48:24 PM
|
You might have read it, but it's not what it said.
|
Post #294,532
10/10/07 5:15:03 PM
|
I'm aware of that now
but this morning I was getting ready to leave for a colonoscopy/endoscopy. I think it affected my eyes ;-)
Smile, Amy
|
Post #294,534
10/10/07 5:26:34 PM
|
Ah. Forgot to get your head out, hadya?
;-)
Hope you had fun. Heh. ;-)
[link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad] (I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
|
Post #294,498
10/9/07 10:21:48 PM
|
Interesting.
It's a bit of extra work, though. And wouldn't be useful for an InnoDB table.
I've seen a few areas where MySQL programmers need to spend some serious time at the codebase. Re-indexing is one of them, although I suspect a lot of the issues are to do with the way it handles table alterations, which are likely due to the engine system. (ALTER TABLE is not transactionable in any engine, and blocks all other activity until it completes.)
Wade.
Is it enough to love Is it enough to breathe Somebody rip my heart out And leave me here to bleed
| | Is it enough to die Somebody save my life I'd rather be Anything but Ordinary Please
|
-- "Anything but Ordinary" by Avril Lavigne. | · my · · [link|http://staticsan.livejournal.com/|blog] · · [link|http://yceran.org/|website] · |
|
Post #294,500
10/9/07 10:38:18 PM
|
"It's a bit of extra work" - not that much
My style of usage is monthly builds with extensive queries in between.
A simply script can handle it all.
|
Post #294,509
10/10/07 7:22:01 AM
|
Ah. I see. Wouldn't suit my use. :-)
Is it enough to love Is it enough to breathe Somebody rip my heart out And leave me here to bleed
| | Is it enough to die Somebody save my life I'd rather be Anything but Ordinary Please
|
-- "Anything but Ordinary" by Avril Lavigne. | · my · · [link|http://staticsan.livejournal.com/|blog] · · [link|http://yceran.org/|website] · |
|
Post #294,520
10/10/07 12:56:07 PM
|
You play with partitioning yet?
I want to partition on state.
I thought I'd use lists, but if there is a miss on an invalid state, it throws the record away.
I was trying to do it on the hash(ascii(state)) to at list break it down by 1st char, but it refused to let me use that.
Any ideas?
|
Post #294,522
10/10/07 1:14:55 PM
|
Nevermind - needed key()
|
Post #294,524
10/10/07 1:37:10 PM
|
Unusable
Load spread drops to about 1/10th of a regular table.
I'll put this one aside for a while.
|
Post #294,268
10/5/07 9:51:00 AM
|
Rare updates, mostly reads?
What about Berkeley?
Alternatively just do a bake-off.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #294,291
10/5/07 8:28:30 PM
|
Still need SQL access
But yeah, I'll be loading and baking for the next couple of weeks.
|
Post #294,306
10/6/07 11:35:12 AM
|
there is a sql like addon for berkley I dont care for it tho
Quantum materiae materietur marmota monax si marmota monax materiam possit materiari? Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 51 years. meep
reach me at [link|mailto:bill.oxley@cox.net|mailto:bill.oxley@cox.net]
|
Post #294,313
10/6/07 1:46:11 PM
|
"Guide to Oracle Berkeley DB for SQL Developers":
By Margo Seltzer, [link|http://www.oracle.com/technology/pub/articles/seltzer-berkeleydb-sql.html|Guide to Oracle Berkeley DB for SQL Developers] at www.oracle.com.
(Heh; I'd forgot w^Hthey bought that outfit, wossname [something like "SmileyCat"?], too.)
[link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad] (I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
|
Post #294,321
10/6/07 3:18:43 PM
|
I'm sorry, you must have thought I wote (new thread)
Created as new thread #294320 titled [link|/forums/render/content/show?contentid=294320|I'm sorry, you must have thought I wote]
|
Post #294,813
10/18/07 10:01:39 PM
|
Here we go again
MySQL could not handle simple correlated sub-queries. Let me rephrase that. MySQL SUCKS as handling simple correlated sub-queries. Turned what should be a 3 second query into 3 minutes.
So I tried postgresql. And found it was generally faster in all my particular query cases, and handled the subqueries without problem.
One difference in MySQL's favor was size of data loaded. MySQL is much more efficient, taking far less space on disk. But I don't care about that.
But, on yet another hand, postgresql is STILL too slow.
I can thankfully state that M$ is simply not an option, but anything that runs on Linux (and I can afford, the cheaper the better) can be. Including (drum roll please) SAP-DB, Informix (it went open source, right? Hmm, gotta start googling) and pretty much anything else I can come up with a Perl interface to talk to.
Oh, how I miss Oracle. But even if I spend a couple of months working around not having Oracle, it'll still be FAR cheaper than paying for Oracle.
|
Post #294,815
10/18/07 10:49:57 PM
|
Did you use what version of MySQL.
I told you to use 5.1.
There are specific enhancements in MySQL in 5.1
And exactly what kinds of setting were you using?
If you have lotsa memory... let it use it.
If you have drives that blaze... tweak it stop batching/buffering, if you have both... tell it to use more.
And if you didn;t try 5.1, you missed much.
-- [link|mailto:greg@gregfolkert.net|greg], [link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey PGP key: 1024D/B524687C 2003-08-05 Fingerprint: E1D3 E3D7 5850 957E FED0 2B3A ED66 6971 B524 687C Alternate Fingerprint: 09F9 1102 9D74 E35B D841 56C5 6356 88C0 Alternate Fingerprint: 455F E104 22CA 29C4 933F 9505 2B79 2AB2
|
Post #294,817
10/18/07 11:24:35 PM
10/18/07 11:29:40 PM
|
No you didn't
You said:
Try the pre v6 stuff first, its pure gold.
and
use MySQL v.latest.(stable or devel) or use pre v6 stuff
I did V5.0 1st. When I tried 5.1, I got into some lib dependancy issues. It's certainly not in stable, which is a bit scary.
If you are telling me there is significant speeds on query, then I'll try again. But if you are referring to all the new whizbang features (and yes, there are LOTS) that I don't care about, then I'll pass.
Edited by crazy
Oct. 18, 2007, 11:29:40 PM EDT
|
Post #294,818
10/18/07 11:35:25 PM
|
This is the 5.1 list of enchancements
[link|http://dev.mysql.com/doc/refman/5.1/en/mysql-nutshell.html|http://dev.mysql.com...sql-nutshell.html]
The only thing referenced is the partitioning and that doesn't help the initial poor SQL optimization for the straight selects of smallish tables.
I think I'm expecting too much since I have my Oracle experience.
|
Post #294,820
10/19/07 12:01:14 AM
|
MySQL has lots of ah incompleteness in it's optimizer.
It's mostly recent stuff, so they clearly have more work to do there.
Sub-selects is one area like that. I remember my web developers practically drooling over sub-selects when we finally upgraded the production databases. Turns out they are way too slow for many things. If you can restructure it as an Inner Join (even with a temporary table), it will be a lot faster. Like hundreds of times faster.
Wade.
Is it enough to love Is it enough to breathe Somebody rip my heart out And leave me here to bleed
| | Is it enough to die Somebody save my life I'd rather be Anything but Ordinary Please
|
-- "Anything but Ordinary" by Avril Lavigne. | · my · · [link|http://staticsan.livejournal.com/|blog] · · [link|http://yceran.org/|website] · |
|
Post #294,822
10/19/07 12:25:56 AM
|
I have a tough time restructuring
We are creating a generic table driven GUI (Web 2.0 Goodness) front end. The tables can vary tremendously on the backend. So multi stage selects with temp table are right out, since we would have define some type of templating language to drive the query creation, which I want to avoid.
|
Post #294,844
10/19/07 9:19:44 AM
|
Fair enough.
Is it enough to love Is it enough to breathe Somebody rip my heart out And leave me here to bleed
| | Is it enough to die Somebody save my life I'd rather be Anything but Ordinary Please
|
-- "Anything but Ordinary" by Avril Lavigne. | · my · · [link|http://staticsan.livejournal.com/|blog] · · [link|http://yceran.org/|website] · |
|
Post #294,830
10/19/07 6:36:52 AM
|
Sigh... Didn't look too deeply into Firebird, didja?
[link|http://www.janus-software.com/fb_fyracle.html|Fyracle: Oracle-mode Firebird].
There, that's the last I'll care about your problems, you fucking ingrate.
[link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad] (I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
|
Post #294,832
10/19/07 7:50:50 AM
|
Sigh - This time it's an option
So yes, I'll be happy to. Thanks for mentioning it.
|
Post #294,866
10/19/07 6:05:16 PM
|
Ehh, you never CARED!
If you cared, you'd tell me what the bulk loading tool for Firebird is. I don't see one in the docs or the utilities that I have.
|