Step away from the gun...

Post #344,043 by malraux 6/22/11 6:36:50 PM Reply	Step away from the gun... I have some pretty extreme experience with this (email me for details). The upshot is that business logic in stored procedures should only be done when absolutely required. Triggers are horrible as well. You may say to yourself, "but Scott, it's just this simple little thing, and triggers make it so much easier". That way lies madness. Trust me. Also, as soon as you start putting logic in databases your scalability is almost immediately limited to vertical scaling, unless you are very very careful. Most developers are not that careful. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,050 by drook 6/22/11 10:23:05 PM Reply	Wow, not the response I was expecting This might be an issue of definitions. I like the definitions in this article -- http://database-prog...g-definition.html -- of 1st and 2nd order business logic. It's the stuff he describes as 2nd order that I like to put into the DB. Things like, if there is a procedure that calculates the price for a product based on cost, overhead, contract price, etc., then that proc should be called (by a trigger) when an order is created or updated. It's not "business logic" in the sense of flow control or anything like that. It's more about "rich" data validation and constraints. As for "most developers are not that careful", I haven't noticed that having careless developers write bad code is significantly better than having them write bad stored procedures. -- Drew
Post #344,060 by malraux 6/22/11 11:30:59 PM Reply	Why that should not be in the DB: Before I start, some background: I've written entire rules-based automated travel booking systems in stored procedures. I've worked at places that have over a million lines of PL/SQL code. The very forums we're using incorporate stored procedures to manage the complex insertion logic of a new post using nested set trees. I'm not trying to pull argument from authority, but just understand that I've seen a few patterns and anti-patterns, and I've done things with databases that would give Bryce wet dreams for a month. Also, a caveat: I work on big, complex systems. Some of the things I'm going to mention may not apply to smaller systems. However, every system starts small. It doesn't take a lot of effort to future-proof while it's small and if you do you may reap a lot of benefit later on, or indeed be able to do things that would be nearly impossible from an organizational or resource standpoint once the system gets big. There are probably even more arguments for my position but I'm not spending a lot of time on this, believe it or not: 1) Maintenance 2) Traceability 3) Accessibility 4) Testing 5) Scalability 6) Portability 7) Functionality Maintenance: Presumably you are using a different language, such as PHP, to do what you're calling 3rd and 4th order business logic. As soon as you add a second language you're multiplying your maintenance costs. You need a new test framework, you need to add more functionality to your build system, you need resources that can program in PL/SQL or TransactSQL or whatever, you need more debuggers and tools, and so on. Multiple programming environments increase your technical and cognitive surface. Traceability: Where is the logic executing? Because you've added a whole new environment that can run logic, but not all the logic -- remember that 3rd and 4th order logic is somewhere else -- you don't immediately know where a particular piece of logic may be located. Let's say you have that bit that calculates "the price for a product based on cost, overhead, contract price, etc." in a stored procedure. Now, one of your clients requires that you calculate their price based on the results of a web service call. You had better not be trying to do that from within a stored procedure (I will kill you for every time you mention UTL_HTTP, fair warning), so now you need to move that logic out of the stored procedure and into your app tier. Maybe you have other logic for other clients as well... do you move everyone's out now? Or do you maintain price calculation logic in two different places? The next time you need to make a change, how do you know the price calculation logic is in PHP, and the state tax logic is still in a stored procedure? Or what if the pricing logic needs to look up values, or it now needs to be modifiable by an administrative person or user? If you've got logic in a trigger that is accessing other tables you're asking for trouble. Because triggers are essentially invisible you can end up with chains of them doing multiple things, all within your transaction space. How do you really know what's happening on that insert? Another variation on this point is separation of concerns: when you start using stored procedures and triggers the line between data and logic becomes blurred. Accessibility: Some databases, like Oracle, let you do an awful lot in their stored procedures. Invariably this functionality is slower, clunkier, and almost guaranteed to cause performance issues as the system grows. Or take logging. You're going to want to have logging in any procedural code you write. I've written a syslog client for PL/SQL just to get a logging solution that would perform well enough to not cause the database heartburn under load. Not recommended. Testing: Unit tests need to be fast. Very fast. They should run every time you check code into your repository, and they need to be self-contained. If you have logic in a database system, now you need to have a running database to run your tests, which means that tests for this logic automatically get moved to the component (integration) testing phase of your development pipeline. Errors get caught later as a result. And since your code resides in a trigger, you have to scrub your data before and after testing, all of which complicates your test harness. Scalability: Once you dip into a database's transactional context your costs of execution go up by an order of magnitude at least. If you can do your calculations, constraints, validations, etc. before you have to invoke a remote call with a transaction context and persistence, you're going to scale better. Leave that logic out in the app tier and it can be parallelized, optimized. A database is persistent storage. Treat it like a hard drive, and don't go there unless you absolutely have to. The most performant systems do their work in memory (this can be done in a high availability fashion if required) and write the persistence out lazily. Portability: IWETHEY is a case in point. I'd love to test it on MySQL, but I can't because the post creation logic is implemented in a stored procedure. If it were out in my app tier then switching to a different persistent store would be as simple as changing my JDBC driver and loading the table schema elsewhere. Or let's say that you decide a few years into the life of the application that parts of it would be better off in a columnar-oriented database like Vertica. Or maybe in something completely different like BigTable. If you are only persisting data with your persistent data store, you can easily swap that piece out. Functionality: Rules engines. Put your validation, calculation, and constraint logic in a rules engine and you can improve your functionality (and maintenance). Your admins can change the rules without needing a programmer, either directly via the rules themselves or with an administrative GUI. Or not... either way the functionality available to a general purpose language, running outside of a database's execution context, is much greater than the limitations imposed by a special purpose stored procedure language. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,075 by Another Scott 6/23/11 6:47:10 AM Reply	Very interesting. Thanks for the detalied explanation.
Post #344,062 by static 6/22/11 11:34:05 PM Reply	I wasn't expecting that trend, either. I know when I was trying to do scale in PHP on MySQL, it was so much easier to do it all myself in PHP. Numerous people kept finding "interesting" technologies that "should have" worked, but the solutions were always at or below the SQL. This never made sense to me: the knowledge required for sharding and partitioning vanished as soon as you hit the SQL. Similarly, I often found that handling referential integrity for a one-app database was handled better in the app, not the DB. Why? Again, because the app knows why the reference failed, the DB only knows that it failed. And handling recovery still requires the app to know what's going on. Again, for a PHP app, it was so much easier to do it in the app and not tell the DB. Now that we're doing Java against the DB, I've been co-erced into doing referential integrity in the DB. The main reason is that Hibernate and JDBC seem to expect it. :-/ Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,064 by malraux 6/22/11 11:41:29 PM Reply	Re: I wasn't expecting that trend, either. That second point about the app knowing why and the DB only knowing that it happened is a great one. Referential integrity in the database is a good idea even if you have it in the app, from both a belt and suspenders point of view as well as supporting automated tools. If you've ever tried to reverse engineer a database with hundreds of tables and no RI you'll know what I mean. Referential integrity is important meta data as well as a means of data validation. I don't work on one-app databases that often so my experience and preferences are heavily slanted towards risk management and maintenance in large, complex environments. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,068 by drook 6/23/11 5:17:43 AM Reply	I think I know what the problem is I'm looking at a ton of shit code, and it's duplicating functionality all over the place that does the same thing multiple shitty ways. I know that it can be done via stored procedures, which would keep developers from having to touch it. "But who would write the procs?" you say. Yeah ... good question. How about this, then, since it's my other big gripe. We've got two different apps that have home-grown grid controls. Both are based on pulling back large datasets, everything available to a user in a specific context. Then the app code handles paging, sorting and filtering. I've worked on systems with more data and more users and we had consistently faster page loads than I see on this system. We did all the paging/sorting/filtering in SQL. Is this a case where performance sucks because we shouldn't be trying to do that, or just another case that the current implementation sucks? -- Drew
Post #344,074 by static 6/23/11 5:51:15 AM Reply	Paging is tricky. I've found you need to do two things: push to the database what is easier to do in the database (filtering and sorting, usually), and don't let the DB or ORM layers buffer if you don't need it. My expertise here is web servers, specifically PHP on MySQL. MySQL does filtering, sorting and paging faster and more efficiently than in the PHP, so I expose these controls through the object interface. They get converted into the right SQL clauses as required. I also use an unbuffered query whenever I can get away with it and make it easy for the app to pull one row at a time (a generator, basically). I've seen this reduce memory usage by orders of magnitude and speed to increase substantially. I think this approach is kind of halfway between your two extremes. Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,089 by malraux 6/23/11 11:54:11 AM Reply	Re: I think I know what the problem is Paging, sorting, and filtering are what relational databases are there for in general, beyond simple persistent storage. Because you're using a portable API (SQL), the actual code controlling what the sort is, what the filter is, etc. is kept out in the app tier so most if not all of my issues with business logic don't apply. There's an issue with scaling if you're not using read-only shards. Depending on the use cases you'll also want to investigate caching. Also depending on the use cases (do they involve extremely complex filtering or reduction logic) you might be better off moving out of the database. The key is to keep it modularized to watch your separation of concerns. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,090 by drook 6/23/11 12:40:59 PM Reply	Grid views, basically Customer (or vendor) gets a view of all current orders. They want sorting and filtering by clicking on column headings. It's trivially easy to create a control in Visual Studio that does a single query, then keeps everything in memory doing sorts, filters and paging on that data set. Whether that's a good thing to do is the question. We just had a meeting on this and two of us insisted that we should be hitting the DB for every sort and filter operation. I can live with keeping it in memory for paging, because it avoids the pathological case: User is on page 2, a record is at the top of page 3. Before they click "next" someone clears an item on page 1 or 2. Now I'm on page 3 and I never saw that record, which is now on page 2. Maybe that's not worth having a different mechanism for paging than for sorting/filtering. Maybe I'm just trying to find a middle ground between what we have and what I think we need, so there's less resistance to doing it. -- Drew
Post #344,092 by S1mon_Jester 6/23/11 12:52:32 PM Reply	Re: Grid views -- thoughts My answer is "it depends". What's the size of the data? Is it enough to go to the browser and allow it to store/sort it (usually not). Do you have enough memory to store it at the web server layer? (Sometimes yes, sometimes no) Do you have an N tier? An Application Server between the Web Server and the DB that can store it there? Finally, then look at storing it in the DB. Is the data static or changing? Dynamic data might force you to keep a cursor open (Not good). Static (or near static) and you can pull down the set of rows you need by providing a start/stop/ordering-filtering conditions.
Post #344,073 by static 6/23/11 5:44:21 AM Reply	And ORMs have gotten better, too. When I first encountered the concept, the app in question had two completely bespoke ORMs, side-by-side (yeah, PHP is often like that). Next job, same situation. Only the ORM was basically non-existent. So I got to write one... I learnt a lot about scaling that time. We probably could have done RI in the database; I chose not to because my lead programmer had a habit of falling into "I have a hammer" thinking. :-/ He did less damage when I corralled him with my APIs. I don't work there anymore. I think, too, that implementing RI in MySQL has gotten easier over major versions. ISTR that this wasn't the case a decade ago when I first tried it. That helps a lot when the app is still really looking after it. And I like challenging the assumptions of Java programmers I get to work with. :-) And I agree that referential integrity is definitely useful in a DB that runs behind multiple apps at once. Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,084 by jay 6/23/11 10:04:17 AM Reply	DB does have one key advantage here Similarly, I often found that handling referential integrity for a one-app database was handled better in the app, not the DB. Why? Again, because the app knows why the reference failed, the DB only knows that it failed. And handling recovery still requires the app to know what's going on. Again, for a PHP app, it was so much easier to do it in the app and not tell the DB. The key advantage of doing this in the DB is that the DB can enforce it. When your working on a big project this is very valuable. Over the years I've spent a lot of time finding and fixing code that is almost but not quite properly transactional. Projects that enforce RI at the DB level generate fewer such problems. It's rather like an enforced coding standard. If the system is small enough that the programmer can keep the entire thing in mind then you don't need elaborate protocol for code safety and clarity. The bigger the project the more important it is that the project have an organized system and design. Jay
Post #344,091 by drook 6/23/11 12:41:34 PM Reply	This -- Drew
Post #344,094 by crazy 6/23/11 2:57:46 PM Reply	As scott said Do it in both. DB to make sure all non app accesses don't fuck it up, App so you can give the user real error messages. Rather than: "Insert Transaction Failed" An app can say: "Insert failed. Found duplicate record (same full name at address). Contact Betty in accounting to see if they owe us money before taking the order"
Post #344,129 by static 6/24/11 12:11:36 AM Reply	Yes, this is where my preference is going now. I would still focus on getting it right in the app first. That is very likely my over-compensating for developers who put it in the DB and don't even think to teach the App what to do when it blows up! :-) Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,143 by crazy 6/24/11 1:04:48 PM Reply	There comes the moment when the constraints get dropped Either on purpose (bulk loads drag on too long or some such) or by accident (DBA doing a reorg moves everything to a different system/tablespace and a few lines of the constraint build script fail and the error gets missed since it is a simple many lines script). Be ready.
Post #344,190 by static 6/25/11 2:15:16 AM Reply	That's exactly why I want the App to know what to do! :-) Static Scribblings http://staticsan.blogspot.com/

Welcome to IWETHEY!