
Welcome to IWETHEY!

New Scalability layer ...
I think I can guess what that would mean, but it sounds like it's fixing the wrong problem. But rather than discuss my assumption ... what's it actually do?

Kip Hawley is still an idiot.


Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
New It's fairly straightforward.
The basic assumption you start with when you write a website in PHP that talks to MySQL is that there's one webserver and one database server. Unfortunately, that doesn't scale (ask LiveJournal or Flickr or Slashdot...). The very common teaching of doing SQL very close to the presentation logic reinforces that assumption.

Leaping over a whole mess of how we get there from here, moving to multiple webservers and multiple redundant database storage servers requires coding the website a very different way. If you don't (or can't!) teach the PHP code how to fetch the data from the right databases or how to behave correctly when multiple webservers can serve the user over the course of a few minutes, you need to abstract that away. Thus, a scalability layer.

The way mine has developed is that it presents objects to the application code. There is a core object that looks after all the getting and setting and can cater for many things the application developer probably hasn't thought of, like some light caching. The application doesn't have to care how the objects are stored or where they're stored: just that they are. They might not even be in a real database. Adding new objects is little more than setting up a few parameters and subclassing in a particular way.

Trouble is, it means that the power of SQL is muted. Most access is done via very simple SQL statements that are programmatically generated. Complex JOINs are fewer and put in special corners of objects. Ditto for stored procedures. Clustering and data partitioning tend to be organised above the SQL and outside the databases, which can already do some of this themselves. But working above the SQL suggests possibilities for interesting flexibilities.
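
To make the object idea concrete, here is a minimal sketch of what such a layer could look like. It is written in Python with SQLite rather than PHP and MySQL purely so that it runs on its own, and it is not the actual code described above. The names `StoredObject`, `User`, `load` and `save` are all hypothetical, and `INSERT OR REPLACE` is SQLite syntax.

```python
import sqlite3


class StoredObject:
    """Hypothetical core object: handles getting, setting, saving and a light cache."""

    table = None      # set by each subclass
    fields = ()       # column names other than the primary key "id"
    _cache = {}       # light per-process cache keyed by (table, id)

    def __init__(self, conn, obj_id, **values):
        self.conn = conn
        self.id = obj_id
        self.values = {f: values.get(f) for f in self.fields}

    @classmethod
    def load(cls, conn, obj_id):
        key = (cls.table, obj_id)
        if key in cls._cache:
            return cls._cache[key]
        # The SQL is generated from the subclass parameters, never hand-written.
        sql = f"SELECT {', '.join(cls.fields)} FROM {cls.table} WHERE id = ?"
        row = conn.execute(sql, (obj_id,)).fetchone()
        if row is None:
            return None
        obj = cls(conn, obj_id, **dict(zip(cls.fields, row)))
        cls._cache[key] = obj
        return obj

    def get(self, field):
        return self.values[field]

    def set(self, field, value):
        if field not in self.fields:
            raise KeyError(field)
        self.values[field] = value

    def save(self):
        cols = ", ".join(("id",) + tuple(self.fields))
        marks = ", ".join("?" for _ in range(len(self.fields) + 1))
        # INSERT OR REPLACE is SQLite-specific; another backend would differ.
        sql = f"INSERT OR REPLACE INTO {self.table} ({cols}) VALUES ({marks})"
        self.conn.execute(sql, (self.id, *[self.values[f] for f in self.fields]))
        self.conn.commit()
        self._cache[(self.table, self.id)] = self


class User(StoredObject):
    # Adding a new object type is just a few parameters on a subclass.
    table = "users"
    fields = ("name", "email")
```

Application code would only call `User.load(conn, 1)`, `get`, `set` and `save`. Where the rows actually live is decided inside the base class, which is the point at which server selection or partitioning could be hidden.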

I feel like I'm attempting to get through some jungle down a hard-to-find track and I have very little idea who is ahead of me. Yet what I'm trying seems to make sense and seems to be working.

"Don't give up!"
New What's that buzzing sound?
Ah, it's the buzzwords. What you called a "scalability layer" sounds like what I'd call a DB abstraction layer. Are you using MySQL clustering? If so, you can handle 90% of what you're trying to do with a few lines in your DB connection class and proper configuration of a load balancer in front of the web servers.

I used the PEAR DB class with a local wrapper that pointed to a config file and wrapped a couple of functions. The config file had the DB names, usernames and passwords for each environment, so I could move code from dev -> test -> QA -> prod without having to change anything. The wrapped functions examined the query being executed and, if it was a select, it passed the query to the replicated slaves. Only update queries were passed to the master.
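
A rough sketch of that kind of wrapper, in Python rather than PHP and without the PEAR DB class, might look like the following. The `CONFIG` layout, the host names, the `RoutingDB` class and the `use_master` flag are all assumptions made for illustration. The `connect` argument is any callable that turns a host name into an object with an `execute(sql, params)` method.

```python
import random

# Hypothetical per-environment settings; hosts are placeholders.
CONFIG = {
    "dev": {"master": "dev-db", "slaves": ["dev-db"]},
    "prod": {"master": "db-master", "slaves": ["db-slave1", "db-slave2"]},
}


class RoutingDB:
    """Sends SELECTs to a replicated slave and everything else to the master."""

    def __init__(self, env, connect):
        settings = CONFIG[env]
        self.master = connect(settings["master"])
        self.slaves = [connect(host) for host in settings["slaves"]]

    @staticmethod
    def is_read(sql):
        # Deliberately naive: only looks at the first keyword of the query.
        return sql.lstrip().upper().startswith("SELECT")

    def query(self, sql, params=(), use_master=False):
        if use_master or not self.is_read(sql):
            target = self.master
        else:
            target = random.choice(self.slaves)
        return target.execute(sql, params)
```

Because the environment name is the only thing that changes between dev and prod, the calling code stays identical when it is promoted. The keyword check is the weak spot: anything that reads but must see fresh data has to pass `use_master=True` explicitly.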

This requires only minor changes in coding style:
  • If you want to chain multiple queries together and issue them with one command, you have to set an optional parameter to specify whether to use the master or a slave.

  • Since you can't rely on staying on the same webserver, if you want to use PHP session handling you have to implement it yourself in the DB.

  • When updating under heavy load, you can be directed to a page that will query a slave before the update has propagated. This happened rarely enough in my circumstances that we addressed it on a page-by-page basis. There were a few places where it was more common.

  • If you don't want sessions to potentially be tracked across multiple logfiles, you have to have all webservers point to a common log server instead of using local logging.
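
As an illustration of the session point in the list above, here is a minimal database-backed session store. It is a Python and SQLite sketch, not PHP session handling, and the `sessions` table layout, the `DBSessionStore` name and the 30-minute default lifetime are assumptions.

```python
import json
import sqlite3
import time


class DBSessionStore:
    """Keeps session data in a shared table so any webserver can read it."""

    def __init__(self, conn, lifetime=1800):
        self.conn = conn
        self.lifetime = lifetime  # seconds a session stays valid after a write
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "id TEXT PRIMARY KEY, data TEXT NOT NULL, expires REAL NOT NULL)"
        )

    def write(self, session_id, data, now=None):
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data, expires) VALUES (?, ?, ?)",
            (session_id, json.dumps(data), now + self.lifetime),
        )
        self.conn.commit()

    def read(self, session_id, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ? AND expires > ?",
            (session_id, now),
        ).fetchone()
        # An unknown or expired session reads as empty.
        return json.loads(row[0]) if row else {}

    def gc(self, now=None):
        now = time.time() if now is None else now
        self.conn.execute("DELETE FROM sessions WHERE expires <= ?", (now,))
        self.conn.commit()
```

With replicated databases, session reads are a likely candidate for always going to the master, since a session written on one request is usually read back on the very next one.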

Personally I found the multiple DBs easier to deal with than the problem of sessions jumping from one webserver to another. I still believe we just never got the load balancer configured properly. It seems like you should be able to specify that any connections from the same address within $x minutes should be routed to the same server as the last time.
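
The routing rule I have in mind can be sketched in a few lines. This is plain Python showing the logic only, not the configuration of any real load balancer; `StickyBalancer`, the server names and the ten-minute window are made up for the example.

```python
import itertools
import time


class StickyBalancer:
    """Round-robin for new clients, same server for clients seen recently."""

    def __init__(self, servers, window_seconds=600):
        self.servers = list(servers)
        self.window = window_seconds
        self._next = itertools.cycle(self.servers)
        self._seen = {}  # client address -> (server, time of last request)

    def route(self, client_addr, now=None):
        now = time.time() if now is None else now
        entry = self._seen.get(client_addr)
        if entry is not None and now - entry[1] <= self.window:
            server = entry[0]          # seen within the window: stay put
        else:
            server = next(self._next)  # new or expired: take the next server
        self._seen[client_addr] = (server, now)
        return server
```

Keying on the source address has a known weakness: many users behind one proxy or NAT all land on the same webserver, and a user whose address changes mid-session loses the affinity.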

Kip Hawley is still an idiot.


Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
New Not so fast, bubba boy.
There's a database handler, true, and it does all the switching between servers that are in a replicated arrangement. It's not quite as full-featured as any of the PEAR classes, but then, it's not tied to their method of doing things either (there *are* disadvantages of copying each other's API calls...).

It is on top of the DB abstraction layer that the objects sit. They are the ones that hide the data fetches and saves. Both of them have been developed to have a low memory footprint - there is a very important call that uses an unbuffered query for that very reason, as well as lots of PHP references. Sometimes I hear the strain on the language...

We look at MySQL clustering from time to time but we can't afford machines that have 80GB of RAM. If they exist. And we're looking at how to do a dispersed database, too, where the data migrates between application instances according to certain rules (like where the end-user is). MySQL doesn't know how to do that.

[link|http://www.danga.com/words/2005_oscon/oscon-2005.sxi|http://www.danga.com...on/oscon-2005.sxi] is a presentation created by Brad Fitzpatrick of LiveJournal. He describes LJ's journey from a single server to a large number of servers all doing bits-n-pieces with no single point of failure.

Our system is currently like his slide 19. *I'm* the one thinking furthest ahead - further than my boss, who is currently taking his first steps beyond the 'buy a bigger server for the database' mode. And none of my web developers are that far up. They are still thinking in terms of one webserver talking to one database. The load balancer and the database handler and a few other things make magic so that they can keep programming like that, but they're resisting the learning. So my approach is a little different to LiveJournal's: I'm beginning to abstract away the scalability tricks. Thus a 'scalability layer'. (The term also helps put in people's minds the idea that it's more than a simple database API. :-)

"Don't give up!"
New A question you might ask yourself
and I'm asking more and more is "do you really need a database"? Could you do as well with a well thought out directory structure and files? More and more I'm finding the answer is "probably".

Of course, I've become a bit more adventurous since working at big river. There are precious few conventional databases to be found there, but lots of clever data stores.

[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

New It is a promising thought, I'll admit.
I've suggested something like that to my IT colleagues, but I haven't so much as hinted at it to the web developers yet; they'll challenge it unless it can be given to them working and better than what is in place now. And I really need them using the API before I can do anything like that. At the moment, there is still a lot of SQL done in the website...

I do empathise. I see signs of 'it's a database - that must be the way to do it' :-/

"Don't give up!"
New Just gone through that
I've automated some inbound email->order entry into one of our systems.
I'm not supposed to create an order for certain types of products when the inventory is low; I'm supposed to hold the order for the next day's processing and try again.

So of course, hold order == database item. Or so I thought.

Then I realized the inbound email had to be stored in its entirety, which really means an index value of inbound sequence number, and then a blob. I needed to parse it anew each time since the parsing took into account external translation tables, which could change each run, which means there is no win to storing the post-parsed message.

Things are never on hold for very long, and the quantity of held orders is low, at most in the hundreds.

So, I simply created a hold file for each message and feed them back into my process every day. The file name is the inbound timestamp, which ensures FIFO processing.
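
In outline, the hold-file approach looks something like this. It is a Python sketch, not the real system: the function names, the `.eml` suffix and the `try_order` callback are hypothetical, and it assumes timestamps are fixed-width strings (for example `20070101T093000`) that sort correctly as text and never collide.

```python
import os


def hold_message(hold_dir, timestamp, raw_email):
    """Store the raw message untouched; the file name is its inbound timestamp."""
    os.makedirs(hold_dir, exist_ok=True)
    path = os.path.join(hold_dir, f"{timestamp}.eml")
    with open(path, "wb") as fh:
        fh.write(raw_email)
    return path


def process_held(hold_dir, try_order):
    """Re-feed held messages oldest first; keep those that still cannot be ordered."""
    released = []
    # Sorting the names sorts by inbound timestamp, which gives FIFO order.
    for name in sorted(os.listdir(hold_dir)):
        path = os.path.join(hold_dir, name)
        with open(path, "rb") as fh:
            raw = fh.read()
        # try_order parses the message afresh and returns True once the order is created.
        if try_order(raw):
            os.remove(path)
            released.append(name)
    return released
```

Running `process_held` once a day against the hold directory is the whole retry mechanism. Nothing parsed is ever stored, so a change to the translation tables takes effect on the next run.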
New Are we going back to frame-style usage?
What you're talking about sounds like the way I understand mainframe "databases" work. I may be making connections that aren't there, but it sounds like there was a style that worked well on the frame. Then we developed relational databases. They were a convenient abstraction for the programmers. They worked well for smaller problems and smaller datasets, and ran on midranges and workstations.

Then as the available horsepower grew some of the old frame jobs were re-written to use relational DBs on fast workstations. (Yes, Barry, this means you.) Now it sounds like the problems we're trying to solve -- the size of data and the number of transactions -- are exceeding what we can do well with relational DBs. So now people are starting to re-invent the style of creating their own domain-specific database.

Is this completely wrong?

Kip Hawley is still an idiot.


Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
New Interesting take on things.
Has some merit, I'll admit. I thought I liked the idea of having one MySQL database tuned for some of my data and another tuned differently for another lot of data. You're suggesting that flatfiles might be better for some other types of data and perhaps some other, highly custom arrangement better for another sub-domain.

But then, you couldn't do the sort of ad hoc stuff on mainframe datasets that we do now on many relational DBs. You had to program something and wait for it to finish, which might take several minutes, several hours, or several days.

I think the role of 'database programmer' is being re-invented.

"Don't give up!"
New Best tool for the job and all that
People treat SQL databases as golden bullets for their data storage needs, rarely taking into account the downsides. Time to take a step back and figure out what you really need to accomplish, and THEN determine the best method of storage and access.
New Yah
Too many people just think "I need to store a lot of data on disk - I guess I need a database". They don't realize that databases COST you something too.

If it's just about persistence - probably you can do better with plain old files. OTOH, if you really do have a lot of different ways of looking at your data, then a database is just the thing.

I'm actually moving an app from an oodb, which is mostly only good at persistence, to postgres because I have ever expanding reporting requirements.

[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

New Power of abstractions vs. absolute performance
OODBMSs are supposed to offer better abstractions to make programming easier, but I haven't heard anyone say the performance is acceptable. Eventually the hardware will pick up enough that today's experiments will become practical. But by then we'll be trying to do more, with more data.

Maybe we should keep the greybeards around so that when we hit the wall of what we can do with current hardware, they can remind us how they used to do things.

Kip Hawley is still an idiot.


Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
     Okay, I'll relent. (edited/updated) - (folkert) - (23)
         Take a compiler course - (jake123)
         My favorite books - (ChrisR) - (18)
             New compiler books needed? - (tonytib) - (17)
                 A new dragon book was put out this year - (ChrisR)
                 I'd like one about building scalability layers. - (static) - (14)
                     Scalability layer ... - (drewk) - (11)
                         It's fairly straightforward. - (static) - (10)
                             What's that buzzing sound? - (drewk) - (9)
                                 Not so fast, bubba boy. - (static) - (8)
                                     A question you might ask yourself - (tuberculosis) - (7)
                                         It is a promising thought, I'll admit. - (static)
                                         Just gone through that - (crazy)
                                         Are we going back to frame-style usage? - (drewk) - (4)
                                             Interesting take on things. - (static)
                                             Best tool for the job and all that - (crazy) - (2)
                                                 Yah - (tuberculosis) - (1)
                                                     Power of abstractions vs. absolute performance - (drewk)
                     Taking SQL away - (tuberculosis) - (1)
                         Interesting. - (static)
                 Related but a bit OT question - (tjsinclair)
         What about embedded development? - (tonytib) - (2)
             Would any of these - (tuberculosis) - (1)
                 Possibly, but would require more hardware - (tonytib)
