
New Programming and design experiment
Here's a theoretical situation to discuss:

You have a web site that supports about 50,000 users, of which perhaps 5,000 are logged in and doing things at any particular time.

The application consists of a set of data approximately 10M in size (100,000 100B elements), with linked data subsets each approximately 10K in size for 1/10th of the elements (another 100M). Assume that user data is another 50M.

Users will be making changes to essentially random elements in a shared manner. Many of the changes will kick off timed events that can occur up to 24 hours later. All of the data subsets also have continuous modifications (say, a delta per hour) that only need to be calculated when viewed by a user or accessed by a timed event.

How would you design this? Assume the front-end is delivered by a scripting language + framework (PHP, Perl, Python). Assume standard PC server hardware.

Now let's say that your application can have half a dozen independent instances of the same general 110M data structure, and additionally can have an undetermined number of smaller instances with correspondingly smaller data sets. Users will be able to open sub-accounts into any of the instances.

What changes?

Yes, this is for a project I've been kicking around. I've got my own ideas (based in part on some conversations I've had with Todd), but I was curious to see what other people could come up with as well.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Yes, or Icon, Wade... ;-)
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New I do most of my programming in PHP these days.
But that's because I get paid for it. :-)

Wade.
"Don't give up!"
New This sounds familiar.
I look after a scenario of similar design, but with the key numbers (simultaneous users, total data size) wildly different.

My first thought would be a half-dozen load-balanced webservers to get the simultaneous access up at the front. I'd also implement some solid structures in the framework so that anything messy about having multiple web servers is abstracted away and solved once. Also, this would be a good place to do some intelligent short-term caching, possibly including memcached.

Next I'd be asking more detail about what the data access pattern is like. Significantly, what is the read/write ratio like? And how is the dataspace organised? If the read/write ratio is pretty good, then you could get away with a master database on a big box with several slaves on other equally big boxes. DB abstraction layers can be taught to go to a random slave for reading and still stuff writes to the master. Transactions, if you need them, will have to be thought through.
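
As a rough sketch of that split using the plain mysql extension (host names and credentials made up):

    <?php
    // Reads can go to any slave; writes must hit the master.
    $master = mysql_connect('db-master', 'app', 'secret');
    $slaves = array(
        mysql_connect('db-slave1', 'app', 'secret'),
        mysql_connect('db-slave2', 'app', 'secret'),
    );

    function db_query($sql)
    {
        global $master, $slaves;
        $link = preg_match('/^\s*SELECT\b/i', $sql)
            ? $slaves[array_rand($slaves)]   // random slave for reads
            : $master;                       // master for everything else
        return mysql_query($sql, $link);
    }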

If the read/write is more like 1:1, then I would look at what LiveJournal have done: partition their data across multiple databases. That way some users hit one database most of the time and others hit another most of the time.

I think this approach could be extended to support a distributed application like you suggest. I think you'd end up with a lot of duplicated data between all instances, though. Creating new entries will be tricky; one trick that I used successfully in a previous job is to devote part of any uniqueID (like the high half) to an 'instance' number. That ensured there would be no conflicts when the data was copied or moved. It means you might have to abandon any sequencing the database would do for you, but transactions and stored procedures will help.
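
A sketch of that uniqueID trick, assuming 64-bit PHP integers (the exact field split is arbitrary):

    <?php
    // High 16 bits: instance number. Low 48 bits: per-instance sequence.
    function make_id($instance, $localSeq)
    {
        return ($instance << 48) | $localSeq;
    }

    function id_instance($id)
    {
        return $id >> 48;   // recover the instance an ID was minted on
    }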

Wade.
"Don't give up!"
New Re: This sounds familiar.
My first thought would be a half-dozen load-balanced webservers to get the simultaneous access up at the front. I'd also implement some solid structures in the framework so that anything messy about having multiple web servers is abstracted away and solved once. Also, this would be a good place to do some intelligent short-term caching, possibly including memcached.
The problem with shared-instance caching is that the data can be very volatile.

Next I'd be asking more detail about what the data access pattern is like. Significantly, what is the read/write ratio like? And how is the dataspace organised? If the read/write ratio is pretty good, then you could get away with a master database on a big box with several slaves on other equally big boxes. DB abstraction layers can be taught to go to a random slave for reading and still stuff writes to the master. Transactions, if you need them, will have to be thought through.
Assume equal reads and writes.

If the read/write is more like 1:1, then I would look at what LiveJournal have done: partition their data across multiple databases. That way some users hit one database most of the time and others hit another most of the time.
Instances can't be partitioned, since they are shared by all users. Instances could be wholly on separate partitions, however.

I think this approach could be extended to support a distributed application like you suggest. I think you'd end up with a lot of duplicated data between all instances, though. Creating new entries will be tricky; one trick that I used successfully in a previous job is to devote part of any uniqueID (like the high half) to an 'instance' number. That ensured there would be no conflicts when the data was copied or moved. It means you might have to abandon any sequencing the database would do for you, but transactions and stored procedures will help.
Again, the trick with duplicated data will be invalidating cached data or propagating changes.

Thanks for the input. I'm actually more interested in sparking discussion, and I thought this was a good example to use since it's different than things like forum software and the like.

Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New We're back to a scalability layer.
The problem with shared-instance caching is that the data can be very volatile.


This is what usually happens when I say 'caching' in this context. :-)

The sort of caching I usually have in mind in this context is the kind that, assuming a typical PHP page, stops the page hitting the database multiple times for the same piece of data. Lifetime is typically seconds and validity is local to the page.
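
A minimal sketch of what I mean (the loader callback is hypothetical):

    <?php
    // Lives only for this request: the second lookup of a key never
    // touches the database again.
    function cached_fetch($key, $loader)
    {
        static $cache = array();
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = call_user_func($loader, $key);
        }
        return $cache[$key];
    }

    // e.g. $user = cached_fetch('user:42', 'load_user_row');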

Shared-instance caching of data is a different game. You don't cache volatile data with that; you cache essentially static data. The active user list is a good example, provided users don't get added or deleted all that often. Most apps have a range of data like that: a few to a few dozen entries that don't change for days. Even if it's not much, I imagine putting it in memcached is cheaper than constantly fishing it off a database.
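
For example, with pecl/memcache, where load_active_users() stands in for the real database fetch:

    <?php
    $mc = new Memcache();
    $mc->connect('localhost', 11211);

    $users = $mc->get('active_users');
    if ($users === false) {
        $users = load_active_users();               // the expensive query
        $mc->set('active_users', $users, 0, 3600);  // good for an hour
    }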

On the other hand, putting PHP session information into memcached apparently works very well. I'm just about to try that.
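
With pecl/memcache that's meant to be just two settings; a sketch, assuming memcached on localhost:

    <?php
    // Can also go in php.ini; must run before session_start().
    ini_set('session.save_handler', 'memcache');
    ini_set('session.save_path', 'tcp://localhost:11211');
    session_start();   // session data now lives in memcached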

Assume equal reads and writes.


Ouch. That makes replication much less attractive.

Instances can't be partitioned, since they are shared by all users. Instances could be wholly on separate partitions, however.


I think I need to be careful with my terminology. LJ horizontally partitioned their user database because they never joined one user's data with another user's data. Then they have a cluster database which is used to indicate which cluster a particular user's data is on.

Is your data structure like that? The support system I work with could probably do that because individual tickets stand alone.


Thanks for the input. I'm actually more interested in sparking discussion, and I thought this was a good example to use since it's different than things like forum software and the like.


Indeed. And I really welcome the opportunity.

I personally have been wrestling with the concepts of a scalability/database-abstraction layer and what that means in practice. I posted about it elsewhere, but my main concern with the direction I'm heading is that it seems to take SQL away from the application layer, which my developers aren't showing any signs of wanting to understand anyway. Yet I can see benefits, because I could put different objects in different databases, or even not in a database at all, and the application code doesn't need to know or care.

Wade.

Postscript: I have an opportunity to help optimize a quite different application that is thrashing its database. The owner of the instance has a replication setup but no knowledge of how to make the app talk to the slave for reads :-/. Unfortunately, the database handler is the ADODB one from PEAR. In other words, the *whole thing* assumes there is one database and only one database. Knobs. This is (one reason) why I don't like the PHP database library addons. And modifying it will be a non-trivial task. *sigh*

Unfortunately, the one I've written here that *does* know how to use multiple databases intelligently is technically owned by my employer...
"Don't give up!"
New You can do that with PEAR
And still keep it away from the programmers. Create a class that extends DB.php. Wrap the connection method with your own, which checks if you're reading or updating. If you're reading, do it from one of the slaves. Best way is to put a load balancer in front of the slaves so you can dynamically add/remove slaves without interrupting the app. Next best is round-robin or random select within your class.
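
Something like this sketch, with PEAR DB underneath (the class name and DSNs are made up):

    <?php
    require_once 'DB.php';

    class BalancedDB
    {
        var $master;
        var $slaves = array();
        var $next = 0;   // round-robin cursor

        function BalancedDB($masterDsn, $slaveDsns)
        {
            $this->master = DB::connect($masterDsn);
            foreach ($slaveDsns as $dsn) {
                $this->slaves[] = DB::connect($dsn);
            }
        }

        function query($sql)
        {
            if (preg_match('/^\s*(SELECT|SHOW)\b/i', $sql)) {
                // Round-robin; a load balancer in front would replace this.
                $slave = $this->slaves[$this->next++ % count($this->slaves)];
                return $slave->query($sql);
            }
            return $this->master->query($sql);   // writes go to the master
        }
    }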




Wait, you said ADODB. I used DB, not DB_ado, but I assume the above still applies.
===

Kip Hawley is still an idiot.

===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
New ADODB has a reputation.
Mainly that it's a heavy layer for what it does.

You're right: deriving a new class and using that instead makes sense. That's probably what I'll do for that other project.

Wade.
"Don't give up!"
New Re: We're back to a scalability layer.
Is your data structure like that? The support system I work with could probably do that because individual tickets stand alone.
There may be some opportunities, but they are probably limited.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New A few ideas
For the first one, I'm thinking of just reading everything into memory and lazily writing changes. The lazy writer would probably be a second process to allow easy partitioning.
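
In rough shape, with queue_push()/queue_pop() standing in for whatever IPC gets used:

    <?php
    // Front end: mutate the in-memory instance, queue the write.
    function change_element(&$instance, $id, $change)
    {
        $instance['elements'][$id] = array_merge($instance['elements'][$id], $change);
        queue_push(array('id' => $id, 'data' => $instance['elements'][$id]));
    }

    // Lazy writer (the second process): drain the queue to persistent storage.
    while (true) {
        $item = queue_pop();                            // blocks until work arrives
        write_to_storage($item['id'], $item['data']);   // hypothetical persistence
    }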

The second bit adds the wrinkles.

A few things I've kicked around are:

1) separate processes for each instance, similar to the architecture above. The small, multiple instances throw a wrench into that setup, though. Perhaps multiple processes, each handling some subset of the instances, determined somewhat dynamically by load. There could be a lot of the smaller instances, and some of them could be as large as one of the main instances.

2) separate processes on the web tier servicing all instances, calling into a central shared cache by instance. I run into the problem of locking changes on the cache, however.

3) punt and just chuck it into a database. Worry about scaling later. ;-)

Another worry is the shared user information.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New How critical is update ordering?
On a web app, user A gets a page, then user B gets a page, then user B submits a change, then user A submits a change. Unless you're doing some craziness that'll really kill performance, last in wins.

If that's acceptable, is it because it's unlikely two users would be changing the same data concurrently? If so, you could fake multi-master replication in the DB layer. Have all updates handled through stored procedures, and have every update proc fire the update to multiple other DBs after it commits. Do your framework properly and programmers don't need to think about it.
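
The propagation half might look something like this at the framework level (names made up; queueing and retry omitted):

    <?php
    // Commit locally first, then replay the same update on the peers.
    // Last in wins; a production version would queue and retry failures.
    function replicated_update($local, $peers, $sql)
    {
        if (!mysql_query($sql, $local)) {
            return false;              // local commit failed; don't propagate
        }
        foreach ($peers as $peer) {
            mysql_query($sql, $peer);
        }
        return true;
    }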

There are probably numerous reasons this won't work, but it's all I can come up with that I haven't already seen done.

[edit]

Bah, just saw that consistency is key. Can't rely on "usually not a problem" in those cases. But would it be true often enough, and would the performance boost be great enough, to make it worth trying?
===

Kip Hawley is still an idiot.

===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
Edited by drewk Nov. 27, 2006, 11:39:34 PM EST
New Critical.
If B causes some change that invalidates A's change, we need to know.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New That's could be really hard.
It involves setting up a locking mechanism from (nearly) top to bottom in the webapp. I looked at doing that months ago for my support app; mental exercises quickly showed how difficult it would be before I even got to writing any code. The problem, of course, is the statelessness of web pages.

However, maybe you could do something clever with journalling. Thinking out loud here: each edit page would be given a timestamp of the data state. A subsequent post, probably 90 seconds later, would check for any logged changes between that timestamp and now. If there are any, it would either see if it's changing what's already been changed (how good is your logging? :-) or throw back a message to the display layer with the new state.
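
A sketch of that check, with change_log as a hypothetical journal table:

    <?php
    // The edit form carries the timestamp it was rendered from; on post,
    // look for journalled changes between then and now.
    function conflicting_changes($link, $elementId, $renderedAt)
    {
        $sql = sprintf(
            "SELECT COUNT(*) FROM change_log
              WHERE element_id = %d AND changed_at > '%s'",
            $elementId,
            mysql_real_escape_string($renderedAt, $link)
        );
        $row = mysql_fetch_row(mysql_query($sql, $link));
        return $row[0] > 0;   // true: bounce the new state back to the display layer
    }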

Wade.
"Don't give up!"
New One reason why I lean towards all-in-memory
with lazy-writing of changes.

All a particular operation need do is check the in-memory instance to make sure the operation is (still) legal. If someone tries to modify an element that has changed to disallow that modification since the first user retrieved it on their page, sending back some sort of "you can't do that and here's why" message is sufficient.
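
Roughly, with the instance structure and helper names invented:

    <?php
    function apply_change(&$instance, $elementId, $change)
    {
        $element =& $instance['elements'][$elementId];
        if (!change_still_legal($element, $change)) {   // hypothetical rule check
            return "You can't do that: the element changed since you loaded it.";
        }
        $element = array_merge($element, $change);   // mutate the in-memory copy
        queue_lazy_write($elementId, $element);      // hand off to the lazy writer
        return true;
    }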
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Doable from a database as well...
...just stamp read time when the page is retrieved. Then check that against a last update time when the page is submitted back.
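
The check can even be folded into the UPDATE itself; a sketch with invented table and column names:

    <?php
    // Succeeds only if nothing touched the row after the page was rendered.
    function optimistic_update($link, $id, $newValue, $readAt)
    {
        $sql = sprintf(
            "UPDATE elements SET value = '%s', updated_at = NOW()
              WHERE id = %d AND updated_at <= '%s'",
            mysql_real_escape_string($newValue, $link),
            $id,
            mysql_real_escape_string($readAt, $link)
        );
        mysql_query($sql, $link);
        return mysql_affected_rows($link) == 1;   // false: someone got in first
    }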
Edited by ChrisR Nov. 28, 2006, 10:31:08 AM EST
New I didn't say it wasn't doable.
The cost of that roundtrip to the database can be prohibitive under high load.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Consistency, Availability, Reliability
Pick any 2.

We can go from there.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New Define your terms.
Or define reliability at least.

I'd be inclined to favor the first two. Consistency is ultra-important. Server lag is a reasonably close second. How important reliability is depends on what you mean to give up; that kind of choice is usually more of a continuum than a boolean.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New mmm kay
They are all a continuum, but generally you can only get two at a time.

Consistency means that whatever data you have is never stale. It is never wrong. You relax this by saying it is never more than one version out of date or it is never more than 2 minutes old.

Availability means you always have data at your fingertips. You never have to wait long for it. Fetches return immediately with what you expect them to. The data may or may not be current. You relax this by allowing things like "Data is locked by other user - please come back tomorrow" states.

Reliability or Scalability implies that the system is always up: it can scale with just the addition of hardware and is capable of withstanding component/peer failure. Relaxing this means doing something like using a central lock server to manage data consistency, which introduces a single point of failure that can take down your system. Doing that will also cause you to relax availability.

Here's a great talk about this stuff by Werner Vogels. [link|http://www.itconversations.com/shows/detail459.html|http://www.itconvers...ws/detail459.html]





[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New That's what I thought, for the most part.
So consistency, then availability, then reliability.

One process per instance, reading all of the data into RAM, with lazy writes to persistent storage, fits the above, I think. Enough of the scalability can still be there for the purposes of the example problem we're discussing, since each instance gets its own process.

Werner's blog won't load right now; I'll check it out again later.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Or how about we do this:
Let's talk about the solutions three ways, one for each leg of the tripod being dropped.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Have you looked at the O'Reilly database war stories?
[link|http://www.google.com/search?hs=Bow&hl=en&lr=&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=site%3Aoreilly.com+%22database+war+stories%22&btnG=Search|Google Search for database war stories] where most of the databases aren't. There might be some ideas in there; O'Reilly talks to Flickr, craigslist, Google File System, etc.

--Tony
New Yes, I've seen those before.
The problem with war stories is that they are often applicable only to a small range of problems.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New I'm really starting to think there's something to my idea
The one about a return to mainframe-style programming.
The problem with war stories is that they are often applicable only to a small range of problems.
And frame apps -- and associated data storage architectures -- were typically written to solve a small range of problems. "Small" frequently meaning "one".
===

Kip Hawley is still an idiot.

===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
New Isn't that what SQLite is sorta trying to solve?
[link|http://www.sqlite.org|www.sqlite.org]

Wade.
"Don't give up!"
New It's a solution to a specific problem
If small and fast is more important than rich feature support (eg: foreign key constraints, some triggers, some join syntax, etc.), and if you can do all permissions through filesystem configuration and application code, then you can use it. It's great for embedded, standalone and single-user systems.
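
For the standalone case it's about this simple, via PDO (the path is made up):

    <?php
    // The whole database is one file; permissions are whatever the
    // filesystem says they are.
    $db = new PDO('sqlite:/var/app/notes.db');
    $db->exec('CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)');
    $stmt = $db->prepare('INSERT INTO notes (body) VALUES (?)');
    $stmt->execute(array('first note'));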
===

Kip Hawley is still an idiot.

===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
New XBase, of course.
Duh.

Anything else is OOut of the questiOOn.


Peter
[link|http://www.no2id.net/|Don't Let The Terrorists Win]
[link|http://www.kuro5hin.org|There is no K5 Cabal]
[link|http://guildenstern.dyndns.org|Home]
Use P2P for legitimate purposes!
[link|http://kevan.org/brain.cgi?pwhysall|A better terminal emulator]
[link|http://darwinia.co.uk/|[image|http://i66.photobucket.com/albums/h262/pwhysall/Misc/saveus.png|0|Darwinia||]]
New I heard that
________________
oop.ismad.com