IWETHEY v. 0.3.0 | TODO
1,095 registered users | 0 active users | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New Hey Greg! Database choice time
Here's your chance to sway me again.

Let's say I had around 100GB of raw data I wanted to load and query. About 550 fields, 2K records, tight data, few spaces.

There will be little updating (if any), and all updates will be controlled by me.

Initial usage will be very light, ie: controlled demo. But after that we might see 10 concurrent active users at any given moment. After that point, I'll be open to throw more hardware.

Almost all usage is straight selects to produce counts. I don't even need to think about client connectivity because it'll be a silly SQL script file disk drop and my process will execute, and then return a count in another file.

Initial environment is quad CPU vmware instance, 4GB or memory (I can give it more, I think), 2GHz CPUs, single SAS disk array.

The choice of database is all mine, as long as it's free. If we have to move to commercial, we will, (maybe).

So, I've got 2 weeks to slice and dice the data, load and test query.

Any particular databases / options I should be persuing?
New PICK? /me dux
Quantum materiae materietur marmota monax si marmota monax materiam possit materiari?
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 51 years. meep

reach me at [link|mailto:bill.oxley@cox.net|mailto:bill.oxley@cox.net]
New You can't duck low enough. </CLUB>
New Firebird?
[link|http://www.firebirdsql.org/|http://www.firebirdsql.org/]


   [link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad]
(I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
New Well, I've got a new religion lately...
When it comes to good code and fast...

MySQL. Try the pre v6 stuff first, its pure gold.

Circular Replication (Multi-Master) is a good thing, its not DONE, but it is exceptionally usable.

MySQL Replication (v5.0/5.1) forces binlog logging. This is GREAT stuff. One thing you can do is, force sync on a slave, take it down and back it up "as static data". This means you have a PERFECT snapshot. Then once done backing up... bring the slave back and the master it is attached to feeds the binlog to the slave and it catches up auto-magically.

Also, Don't skip INNODB, use it. It (fully) passes ACID since v4.1.1.

The new MySQL Proxy has the ability to send requests anywhere based on rules.

Things like this:
  • If its a write send to a read-write circular replication host
  • If it has a Cartesian join in it, send it to this POWER HOUSE slave
  • If its a near straight data dump, goto this FAST DISK IO machine
  • If you have 200 connections coming from this host right now, send the requests to ALL machines
  • based on specific SQL options, Query types, keywords or even SQL query format... (how awesome is that?)
To name a few things, its wildly awesome.

Just to give you an IDEA.

I have 4 MySQL dedicated machines setup in Cluster mode (multi-master non-circular replication) in two pairs. Plus I have 3 slaves off one cluster for reporting and conversion from legacy system data.

the 4 machines are 2 processor Quad Core Intel Xeon 2.66GHz processor with 8GB of memory, mirrored SATAII drives. These machine have been recently support as many as 800 concurrent users, with ease. Some of our application servers (ssh into these machine and run our software) that connect to our MySQL servers, regularly see load averages of 50+ and live to tell about it... A bit pokey at times but over all waiting is the hardest part.

In comparison to our old DB (a B-Tree based product... UGH) on same hardware, we get more saturation or users per boxen.

Now if we could fix the 17+ group RPC problems with NFSv3... then we'd be able to get even more users/clients on these machines.


And to be honest, I came to my new place a PG bigot... its truly amazing at how robust and POWERFUL MySQL is now. It implements just about most things you could REALLY ask for. Triggers, Events, Stored Procs (well mostly anyway) and it does relational, enforcement, atomic updates (with v6), and a plethora of other things.

So, in my opinion, humble or not, use MySQL v.latest.(stable or devel) or use pre v6 stuff, I've been putting the .devel stuff in good use. (Not my choice, the Company President is visionary and is actively USING features). It very very very scalable and tweakage is near infinite and transparent. Especially in clustering modes.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
PGP key: 1024D/B524687C 2003-08-05
Fingerprint: E1D3 E3D7 5850 957E FED0  2B3A ED66 6971 B524 687C
Alternate Fingerprint: 09F9 1102 9D74  E35B D841 56C5 6356 88C0
Alternate Fingerprint: 455F E104 22CA  29C4 933F 9505 2B79 2AB2
Expand Edited by folkert Oct. 5, 2007, 02:51:03 AM EDT
New Hmm, are they repeating Oracle's error: TOO MUCH "tweakage"?
New :-O
I was tempted to recommend MySQL based on the fact that what crazy described is actually quite a good fit for MyISAM's performance characteristics, as far as I understand them. And that I look after a ~100Gb MySQL replicated database.

One thing I've discovered about the replication is that quite happily, it is at SQL level. This means problem solving when it falls apart (which it does do from time to time) it is because of an SQL-level problem which only takes SQL skills to fix. Unfortunately, it is at the SQL level, which means replication basically re-parses and re-executes the update queries. Which also means that SQL errors can stop replication. But on the gripping hand, you can do wicked stuff like having different storage engines as an extreme case of different configurations...

Wade.


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New Oh and here is the MySQL Proxy page
[link|http://forge.mysql.com/wiki/MySQL_Proxy|http://forge.mysql.com/wiki/MySQL_Proxy]

It really is pretty cool.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
PGP key: 1024D/B524687C 2003-08-05
Fingerprint: E1D3 E3D7 5850 957E FED0  2B3A ED66 6971 B524 687C
Alternate Fingerprint: 09F9 1102 9D74  E35B D841 56C5 6356 88C0
Alternate Fingerprint: 455F E104 22CA  29C4 933F 9505 2B79 2AB2
New Ok, playing with 5.0
160MM tiny records loaded in 4.5 minutes, but the index creation is taking FO-EVA!

Any tweaks you recommend to speed it up?
New Some things are still slow in MySQL.
It's quicker to let it build the indices during import than to generate them later. Mostly this is because altering a table - including changing indices - involves copying the table. This is what takes time. This is why if you have a lot of changes to make to a table's structure, it is far and away fastest to do it in a single ALTER TABLE statement.

Wade.


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New Oh yeah - I forgot about that
I was wondering why I saw the temp table that seemed to mirror the data.

That's OK, I'll be rebuilding a few times as I test out various schema definitions.

That particular index build was hung when I ran out of disk in the logging partition. I don't need logs, so I turned that off.

It seemed to have taken 3 hours to build when I restarted. I've blindly created varchar field and need to turn a bunch into real numbers. Should get a bit quicker then as well.
New I saw an Oracle guy do a Bit Of Learning...
He was very idealistic, tended to believe in The Power Of The Database kind of stuff. Didn't understand that we needed to look after some scaling concerns outside of MySQL because MySQL couldn't do them. Liked references and stored procedures, you probably get the idea.

And got caught by the same re-indexing trap you fell in. :-) It may indeed be faster in Oracle to enable the indexing after you load the table, but this is not so in MySQL.

Wade.


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New You teased me
I thought you were about to show me the magic of MySql Bitmapped indexes.
New On the other hand, maybe those MySQL people
might be learning and copying from those Oracle people.

[link|http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html|http://dev.mysql.com...insert-speed.html]

With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes. Use the following procedure:

1. Optionally create the table with CREATE TABLE.
2. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
3. Use myisamchk --keys-used=0 -rq /path/to/db/tbl_name. This removes all use of indexes for the table.
4. Insert data into the table with LOAD DATA INFILE. This does not update any indexes and therefore is very fast.
5. If you intend only to read from the table in the future, use myisampack to compress it. See Section 13.1.3.3, “Compressed Table Characteristics”.
6. Re-create the indexes with myisamchk -rq /path/to/db/tbl_name. This creates the index tree in memory before writing it to disk, which is much faster that updating the index during LOAD DATA INFILE because it avoids lots of disk seeks. The resulting index tree is also perfectly balanced.
7. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
New OT have you looked at hadoop?
[link|http://lucene.apache.org/hadoop/|http://lucene.apache.org/hadoop/] looks like very interesting tech for write once read lots high volume applications. A database that only contained a path to the information would be fairly quick.
thanx,
bill
Quantum materiae materietur marmota monax si marmota monax materiam possit materiari?
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 51 years. meep

reach me at [link|mailto:bill.oxley@cox.net|mailto:bill.oxley@cox.net]
New Nope
My type of queries are counts across a hundred million records.
Hadoop would bounce that around a few machines.
New /me rubs eyes
did I just read "hadpoop"?
Smile,
Amy
New You might have read it, but it's not what it said.
New I'm aware of that now
but this morning I was getting ready to leave for a colonoscopy/endoscopy. I think it affected my eyes ;-)
Smile,
Amy
New Ah. Forgot to get your head out, hadya?
;-)

Hope you had fun. Heh. ;-)


   [link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad]
(I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
New Interesting.
It's a bit of extra work, though. And wouldn't be useful for an InnoDB table.

I've seen a few areas where MySQL programmers need to spend some serious time at the codebase. Re-indexing is one of them, although I suspect a lot of the issues are to do with the way it handles table alterations, which are likely due to the engine system. (ALTER TABLE is not transactionable in any engine, and blocks all other activity until it completes.)

Wade.


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New "It's a bit of extra work" - not that much
My style of usage is monthly builds with extensive queries in between.

A simply script can handle it all.

New Ah. I see. Wouldn't suit my use. :-)


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New You play with partitioning yet?
I want to partition on state.

I thought I'd use lists, but if there is a miss on an invalid state, it throws the record away.

I was trying to do it on the hash(ascii(state)) to at list break it down by 1st char, but it refused to let me use that.

Any ideas?
New Nevermind - needed key()
New Unusable
Load spread drops to about 1/10th of a regular table.

I'll put this one aside for a while.
New Rare updates, mostly reads?
What about Berkeley?

Alternatively just do a bake-off.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Still need SQL access
But yeah, I'll be loading and baking for the next couple of weeks.
New there is a sql like addon for berkley I dont care for it tho
Quantum materiae materietur marmota monax si marmota monax materiam possit materiari?
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 51 years. meep

reach me at [link|mailto:bill.oxley@cox.net|mailto:bill.oxley@cox.net]
New "Guide to Oracle Berkeley DB for SQL Developers":
By Margo Seltzer, [link|http://www.oracle.com/technology/pub/articles/seltzer-berkeleydb-sql.html|Guide to Oracle Berkeley DB for SQL Developers] at www.oracle.com.

(Heh; I'd forgot w^Hthey bought that outfit, wossname [something like "SmileyCat"?], too.)


   [link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad]
(I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
New I'm sorry, you must have thought I wote (new thread)
Created as new thread #294320 titled [link|/forums/render/content/show?contentid=294320|I'm sorry, you must have thought I wote]
New Here we go again
MySQL could not handle simple correlated sub-queries.
Let me rephrase that.
MySQL SUCKS as handling simple correlated sub-queries. Turned what should be a 3 second query into 3 minutes.

So I tried postgresql.
And found it was generally faster in all my particular query cases, and handled the subqueries without problem.

One difference in MySQL's favor was size of data loaded. MySQL is much more efficient, taking far less space on disk. But I don't care about that.

But, on yet another hand, postgresql is STILL too slow.

I can thankfully state that M$ is simply not an option, but anything that runs on Linux (and I can afford, the cheaper the better) can be. Including (drum roll please) SAP-DB, Informix (it went open source, right? Hmm, gotta start googling) and pretty much anything else I can come up with a Perl interface to talk to.

Oh, how I miss Oracle. But even if I spend a couple of months working around not having Oracle, it'll still be FAR cheaper than paying for Oracle.
New Did you use what version of MySQL.
I told you to use 5.1.

There are specific enhancements in MySQL in 5.1

And exactly what kinds of setting were you using?

If you have lotsa memory... let it use it.

If you have drives that blaze... tweak it stop batching/buffering, if you have both... tell it to use more.

And if you didn;t try 5.1, you missed much.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
PGP key: 1024D/B524687C 2003-08-05
Fingerprint: E1D3 E3D7 5850 957E FED0  2B3A ED66 6971 B524 687C
Alternate Fingerprint: 09F9 1102 9D74  E35B D841 56C5 6356 88C0
Alternate Fingerprint: 455F E104 22CA  29C4 933F 9505 2B79 2AB2
New No you didn't
You said:

Try the pre v6 stuff first, its pure gold.

and

use MySQL v.latest.(stable or devel) or use pre v6 stuff

I did V5.0 1st.
When I tried 5.1, I got into some lib dependancy issues. It's certainly not in stable, which is a bit scary.

If you are telling me there is significant speeds on query, then I'll try again. But if you are referring to all the new whizbang features (and yes, there are LOTS) that I don't care about, then I'll pass.
Expand Edited by crazy Oct. 18, 2007, 11:29:40 PM EDT
New This is the 5.1 list of enchancements
[link|http://dev.mysql.com/doc/refman/5.1/en/mysql-nutshell.html|http://dev.mysql.com...sql-nutshell.html]

The only thing referenced is the partitioning and that doesn't help the initial poor SQL optimization for the straight selects of smallish tables.

I think I'm expecting too much since I have my Oracle experience.
New MySQL has lots of ah incompleteness in it's optimizer.
It's mostly recent stuff, so they clearly have more work to do there.

Sub-selects is one area like that. I remember my web developers practically drooling over sub-selects when we finally upgraded the production databases. Turns out they are way too slow for many things. If you can restructure it as an Inner Join (even with a temporary table), it will be a lot faster. Like hundreds of times faster.

Wade.


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New I have a tough time restructuring
We are creating a generic table driven GUI (Web 2.0 Goodness) front end. The tables can vary tremendously on the backend. So multi stage selects with temp table are right out, since we would have define some type of templating language to drive the query creation, which I want to avoid.
New Fair enough.


Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please



-- "Anything but Ordinary" by Avril Lavigne.

· my ·
· [link|http://staticsan.livejournal.com/|blog] ·
· [link|http://yceran.org/|website] ·

New Sigh... Didn't look too deeply into Firebird, didja?
[link|http://www.janus-software.com/fb_fyracle.html|Fyracle: Oracle-mode Firebird].

There, that's the last I'll care about your problems, you fucking ingrate.


   [link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad]
(I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Ah, the Germans: Masters of Convoluted Simplification. — [link|http://www.thetruthaboutcars.com/?p=1603|Jehovah]
New Sigh - This time it's an option
So yes, I'll be happy to.
Thanks for mentioning it.
New Ehh, you never CARED!
If you cared, you'd tell me what the bulk loading tool for Firebird is. I don't see one in the docs or the utilities that I have.
     Hey Greg! Database choice time - (crazy) - (40)
         PICK? /me dux -NT - (boxley) - (1)
             You can't duck low enough. </CLUB> -NT - (crazy)
         Firebird? - (CRConrad)
         Well, I've got a new religion lately... - (folkert) - (21)
             Hmm, are they repeating Oracle's error: TOO MUCH "tweakage"? -NT - (CRConrad)
             :-O - (static)
             Oh and here is the MySQL Proxy page - (folkert)
             Ok, playing with 5.0 - (crazy) - (14)
                 Some things are still slow in MySQL. - (static) - (13)
                     Oh yeah - I forgot about that - (crazy) - (12)
                         I saw an Oracle guy do a Bit Of Learning... - (static) - (11)
                             You teased me - (crazy)
                             On the other hand, maybe those MySQL people - (crazy) - (9)
                                 OT have you looked at hadoop? - (boxley) - (5)
                                     Nope - (crazy)
                                     /me rubs eyes - (imqwerky) - (3)
                                         You might have read it, but it's not what it said. -NT - (CRConrad) - (2)
                                             I'm aware of that now - (imqwerky) - (1)
                                                 Ah. Forgot to get your head out, hadya? - (CRConrad)
                                 Interesting. - (static) - (2)
                                     "It's a bit of extra work" - not that much - (crazy) - (1)
                                         Ah. I see. Wouldn't suit my use. :-) -NT - (static)
             You play with partitioning yet? - (crazy) - (2)
                 Nevermind - needed key() -NT - (crazy) - (1)
                     Unusable - (crazy)
         Rare updates, mostly reads? - (admin) - (4)
             Still need SQL access - (crazy) - (3)
                 there is a sql like addon for berkley I dont care for it tho -NT - (boxley)
                 "Guide to Oracle Berkeley DB for SQL Developers": - (CRConrad) - (1)
                     I'm sorry, you must have thought I wote (new thread) - (crazy)
         Here we go again - (crazy) - (9)
             Did you use what version of MySQL. - (folkert) - (2)
                 No you didn't - (crazy) - (1)
                     This is the 5.1 list of enchancements - (crazy)
             MySQL has lots of ah incompleteness in it's optimizer. - (static) - (2)
                 I have a tough time restructuring - (crazy) - (1)
                     Fair enough. -NT - (static)
             Sigh... Didn't look too deeply into Firebird, didja? - (CRConrad) - (2)
                 Sigh - This time it's an option - (crazy)
                 Ehh, you never CARED! - (crazy)

China?
142 ms