Question on DB "best practices"

Post #344,034 by drook 6/22/11 3:39:27 PM Reply	Question on DB "best practices" I know I'm right, but the DBA is adamant. Here's the question: Is it ever appropriate to execute a delete and a select in a single stored procedure? More generally, is it appropriate to implement business logic in the DB (via stored procedures and/or triggers) or should the DB be treated as a dumb -- but highly-optimized -- data store? Assuming the response isn't overwhelmingly against my position on this, I'll need some references to point to to make the case. -- Drew
Post #344,035 by folkert 6/22/11 3:48:23 PM Reply	We don't even... We don't even use relational parts of "relational" DBs. All business logic is in C/Perl/Shell Absolutely ZERO functionality stored in the DB. It makes it MUCH more portable and MUCH faster to optimize the tables, if you were to delete a ton of records or bulk add 1M records. Indexes are the only thing in the DB besides the data. I used to think Oracle and Stored Procedures were the Shizzit... until I had to move a DB from Oracle to another DB. Then and ONLY then will you learn the Folly of Stored Procedures for Business logic...
Post #344,036 by jay 6/22/11 4:13:43 PM Reply	Design coin flip You can get more speed by pushing logic into the DB and putting relational and fundamental bits in the DB makes then easier to isolate and secure, but your application becomes bound to the DB and logic in the DB tends to be more obscure and harder to debug. From a purely design aesthetic position it's a bad idea, but in most situations, the practical benefits outweigh the problems. Can't say that I have any particular references for this point of view though. Jay
Post #344,039 by boxley 6/22/11 5:12:52 PM Reply	dumb datastore is easier to debug imho Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free American and do not reflect the opinions of any person or company that I have had professional relations with in the past 55 years. meep
Post #344,043 by malraux 6/22/11 6:36:50 PM Reply	Step away from the gun... I have some pretty extreme experience with this (email me for details). The upshot is that business logic in stored procedures should only be done when absolutely required. Triggers are horrible as well. You may say to yourself, "but Scott, it's just this simple little thing, and triggers make it so much easier". That way lies madness. Trust me. Also, as soon as you start putting logic in databases your scalability is almost immediately limited to vertical scaling, unless you are very very careful. Most developers are not that careful. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,050 by drook 6/22/11 10:23:05 PM Reply	Wow, not the response I was expecting This might be an issue of definitions. I like the definitions in this article -- http://database-prog...g-definition.html -- of 1st and 2nd order business logic. It's the stuff he describes as 2nd order that I like to put into the DB. Things like, if there is a procedure that calculates the price for a product based on cost, overhead, contract price, etc., then that proc should be called (by a trigger) when an order is created or updated. It's not "business logic" in the sense of flow control or anything like that. It's more about "rich" data validation and constraints. As for "most developers are not that careful", I haven't noticed that having careless developers write bad code is significantly better than having them write bad stored procedures. -- Drew
Post #344,060 by malraux 6/22/11 11:30:59 PM Reply	Why that should not be in the DB: Before I start, some background: I've written entire rules-based automated travel booking systems in stored procedures. I've worked at places that have over a million lines of PL/SQL code. The very forums we're using incorporate stored procedures to manage the complex insertion logic of a new post using nested set trees. I'm not trying to pull argument from authority, but just understand that I've seen a few patterns and anti-patterns, and I've done things with databases that would give Bryce wet dreams for a month. Also, a caveat: I work on big, complex systems. Some of the things I'm going to mention may not apply to smaller systems. However, every system starts small. It doesn't take a lot of effort to future-proof while it's small and if you do you may reap a lot of benefit later on, or indeed be able to do things that would be nearly impossible from an organizational or resource standpoint once the system gets big. There are probably even more arguments for my position but I'm not spending a lot of time on this, believe it or not: 1) Maintenance 2) Traceability 3) Accessibility 4) Testing 5) Scalability 6) Portability 7) Functionality Maintenance: Presumably you are using a different language, such as PHP, to do what you're calling 3rd and 4th order business logic. As soon as you add a second language you're multiplying your maintenance costs. You need a new test framework, you need to add more functionality to your build system, you need resources that can program in PL/SQL or TransactSQL or whatever, you need more debuggers and tools, and so on. Multiple programming environments increase your technical and cognitive surface. Traceability: Where is the logic executing? Because you've added a whole new environment that can run logic, but not all the logic -- remember that 3rd and 4th order logic is somewhere else -- you don't immediately know where a particular piece of logic may be located. Let's say you have that bit that calculates "the price for a product based on cost, overhead, contract price, etc." in a stored procedure. Now, one of your clients requires that you calculate their price based on the results of a web service call. You had better not be trying to do that from within a stored procedure (I will kill you for every time you mention UTL_HTTP, fair warning), so now you need to move that logic out of the stored procedure and into your app tier. Maybe you have other logic for other clients as well... do you move everyone's out now? Or do you maintain price calculation logic in two different places? The next time you need to make a change, how do you know the price calculation logic is in PHP, and the state tax logic is still in a stored procedure? Or what if the pricing logic needs to look up values, or it now needs to be modifiable by an administrative person or user? If you've got logic in a trigger that is accessing other tables you're asking for trouble. Because triggers are essentially invisible you can end up with chains of them doing multiple things, all within your transaction space. How do you really know what's happening on that insert? Another variation on this point is separation of concerns: when you start using stored procedures and triggers the line between data and logic becomes blurred. Accessibility: Some databases, like Oracle, let you do an awful lot in their stored procedures. Invariably this functionality is slower, clunkier, and almost guaranteed to cause performance issues as the system grows. Or take logging. You're going to want to have logging in any procedural code you write. I've written a syslog client for PL/SQL just to get a logging solution that would perform well enough to not cause the database heartburn under load. Not recommended. Testing: Unit tests need to be fast. Very fast. They should run every time you check code into your repository, and they need to be self-contained. If you have logic in a database system, now you need to have a running database to run your tests, which means that tests for this logic automatically get moved to the component (integration) testing phase of your development pipeline. Errors get caught later as a result. And since your code resides in a trigger, you have to scrub your data before and after testing, all of which complicates your test harness. Scalability: Once you dip into a database's transactional context your costs of execution go up by an order of magnitude at least. If you can do your calculations, constraints, validations, etc. before you have to invoke a remote call with a transaction context and persistence, you're going to scale better. Leave that logic out in the app tier and it can be parallelized, optimized. A database is persistent storage. Treat it like a hard drive, and don't go there unless you absolutely have to. The most performant systems do their work in memory (this can be done in a high availability fashion if required) and write the persistence out lazily. Portability: IWETHEY is a case in point. I'd love to test it on MySQL, but I can't because the post creation logic is implemented in a stored procedure. If it were out in my app tier then switching to a different persistent store would be as simple as changing my JDBC driver and loading the table schema elsewhere. Or let's say that you decide a few years into the life of the application that parts of it would be better off in a columnar-oriented database like Vertica. Or maybe in something completely different like BigTable. If you are only persisting data with your persistent data store, you can easily swap that piece out. Functionality: Rules engines. Put your validation, calculation, and constraint logic in a rules engine and you can improve your functionality (and maintenance). Your admins can change the rules without needing a programmer, either directly via the rules themselves or with an administrative GUI. Or not... either way the functionality available to a general purpose language, running outside of a database's execution context, is much greater than the limitations imposed by a special purpose stored procedure language. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,075 by Another Scott 6/23/11 6:47:10 AM Reply	Very interesting. Thanks for the detalied explanation.
Post #344,062 by static 6/22/11 11:34:05 PM Reply	I wasn't expecting that trend, either. I know when I was trying to do scale in PHP on MySQL, it was so much easier to do it all myself in PHP. Numerous people kept finding "interesting" technologies that "should have" worked, but the solutions were always at or below the SQL. This never made sense to me: the knowledge required for sharding and partitioning vanished as soon as you hit the SQL. Similarly, I often found that handling referential integrity for a one-app database was handled better in the app, not the DB. Why? Again, because the app knows why the reference failed, the DB only knows that it failed. And handling recovery still requires the app to know what's going on. Again, for a PHP app, it was so much easier to do it in the app and not tell the DB. Now that we're doing Java against the DB, I've been co-erced into doing referential integrity in the DB. The main reason is that Hibernate and JDBC seem to expect it. :-/ Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,064 by malraux 6/22/11 11:41:29 PM Reply	Re: I wasn't expecting that trend, either. That second point about the app knowing why and the DB only knowing that it happened is a great one. Referential integrity in the database is a good idea even if you have it in the app, from both a belt and suspenders point of view as well as supporting automated tools. If you've ever tried to reverse engineer a database with hundreds of tables and no RI you'll know what I mean. Referential integrity is important meta data as well as a means of data validation. I don't work on one-app databases that often so my experience and preferences are heavily slanted towards risk management and maintenance in large, complex environments. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,068 by drook 6/23/11 5:17:43 AM Reply	I think I know what the problem is I'm looking at a ton of shit code, and it's duplicating functionality all over the place that does the same thing multiple shitty ways. I know that it can be done via stored procedures, which would keep developers from having to touch it. "But who would write the procs?" you say. Yeah ... good question. How about this, then, since it's my other big gripe. We've got two different apps that have home-grown grid controls. Both are based on pulling back large datasets, everything available to a user in a specific context. Then the app code handles paging, sorting and filtering. I've worked on systems with more data and more users and we had consistently faster page loads than I see on this system. We did all the paging/sorting/filtering in SQL. Is this a case where performance sucks because we shouldn't be trying to do that, or just another case that the current implementation sucks? -- Drew
Post #344,074 by static 6/23/11 5:51:15 AM Reply	Paging is tricky. I've found you need to do two things: push to the database what is easier to do in the database (filtering and sorting, usually), and don't let the DB or ORM layers buffer if you don't need it. My expertise here is web servers, specifically PHP on MySQL. MySQL does filtering, sorting and paging faster and more efficiently than in the PHP, so I expose these controls through the object interface. They get converted into the right SQL clauses as required. I also use an unbuffered query whenever I can get away with it and make it easy for the app to pull one row at a time (a generator, basically). I've seen this reduce memory usage by orders of magnitude and speed to increase substantially. I think this approach is kind of halfway between your two extremes. Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,089 by malraux 6/23/11 11:54:11 AM Reply	Re: I think I know what the problem is Paging, sorting, and filtering are what relational databases are there for in general, beyond simple persistent storage. Because you're using a portable API (SQL), the actual code controlling what the sort is, what the filter is, etc. is kept out in the app tier so most if not all of my issues with business logic don't apply. There's an issue with scaling if you're not using read-only shards. Depending on the use cases you'll also want to investigate caching. Also depending on the use cases (do they involve extremely complex filtering or reduction logic) you might be better off moving out of the database. The key is to keep it modularized to watch your separation of concerns. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,090 by drook 6/23/11 12:40:59 PM Reply	Grid views, basically Customer (or vendor) gets a view of all current orders. They want sorting and filtering by clicking on column headings. It's trivially easy to create a control in Visual Studio that does a single query, then keeps everything in memory doing sorts, filters and paging on that data set. Whether that's a good thing to do is the question. We just had a meeting on this and two of us insisted that we should be hitting the DB for every sort and filter operation. I can live with keeping it in memory for paging, because it avoids the pathological case: User is on page 2, a record is at the top of page 3. Before they click "next" someone clears an item on page 1 or 2. Now I'm on page 3 and I never saw that record, which is now on page 2. Maybe that's not worth having a different mechanism for paging than for sorting/filtering. Maybe I'm just trying to find a middle ground between what we have and what I think we need, so there's less resistance to doing it. -- Drew
Post #344,092 by S1mon_Jester 6/23/11 12:52:32 PM Reply	Re: Grid views -- thoughts My answer is "it depends". What's the size of the data? Is it enough to go to the browser and allow it to store/sort it (usually not). Do you have enough memory to store it at the web server layer? (Sometimes yes, sometimes no) Do you have an N tier? An Application Server between the Web Server and the DB that can store it there? Finally, then look at storing it in the DB. Is the data static or changing? Dynamic data might force you to keep a cursor open (Not good). Static (or near static) and you can pull down the set of rows you need by providing a start/stop/ordering-filtering conditions.
Post #344,073 by static 6/23/11 5:44:21 AM Reply	And ORMs have gotten better, too. When I first encountered the concept, the app in question had two completely bespoke ORMs, side-by-side (yeah, PHP is often like that). Next job, same situation. Only the ORM was basically non-existent. So I got to write one... I learnt a lot about scaling that time. We probably could have done RI in the database; I chose not to because my lead programmer had a habit of falling into "I have a hammer" thinking. :-/ He did less damage when I corralled him with my APIs. I don't work there anymore. I think, too, that implementing RI in MySQL has gotten easier over major versions. ISTR that this wasn't the case a decade ago when I first tried it. That helps a lot when the app is still really looking after it. And I like challenging the assumptions of Java programmers I get to work with. :-) And I agree that referential integrity is definitely useful in a DB that runs behind multiple apps at once. Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,084 by jay 6/23/11 10:04:17 AM Reply	DB does have one key advantage here Similarly, I often found that handling referential integrity for a one-app database was handled better in the app, not the DB. Why? Again, because the app knows why the reference failed, the DB only knows that it failed. And handling recovery still requires the app to know what's going on. Again, for a PHP app, it was so much easier to do it in the app and not tell the DB. The key advantage of doing this in the DB is that the DB can enforce it. When your working on a big project this is very valuable. Over the years I've spent a lot of time finding and fixing code that is almost but not quite properly transactional. Projects that enforce RI at the DB level generate fewer such problems. It's rather like an enforced coding standard. If the system is small enough that the programmer can keep the entire thing in mind then you don't need elaborate protocol for code safety and clarity. The bigger the project the more important it is that the project have an organized system and design. Jay
Post #344,091 by drook 6/23/11 12:41:34 PM Reply	This -- Drew
Post #344,094 by crazy 6/23/11 2:57:46 PM Reply	As scott said Do it in both. DB to make sure all non app accesses don't fuck it up, App so you can give the user real error messages. Rather than: "Insert Transaction Failed" An app can say: "Insert failed. Found duplicate record (same full name at address). Contact Betty in accounting to see if they owe us money before taking the order"
Post #344,129 by static 6/24/11 12:11:36 AM Reply	Yes, this is where my preference is going now. I would still focus on getting it right in the app first. That is very likely my over-compensating for developers who put it in the DB and don't even think to teach the App what to do when it blows up! :-) Wade. Static Scribblings http://staticsan.blogspot.com/
Post #344,143 by crazy 6/24/11 1:04:48 PM Reply	There comes the moment when the constraints get dropped Either on purpose (bulk loads drag on too long or some such) or by accident (DBA doing a reorg moves everything to a different system/tablespace and a few lines of the constraint build script fail and the error gets missed since it is a simple many lines script). Be ready.
Post #344,190 by static 6/25/11 2:15:16 AM Reply	That's exactly why I want the App to know what to do! :-) Static Scribblings http://staticsan.blogspot.com/
Post #344,116 by lincoln 6/23/11 8:10:58 PM Reply	Well, it depends on what you're trying to do. Do you want to have your stored procedures to be nothing more than a single SELECT, UPDATE or DELETE statement? You can if you want to. They're big on that here at my current employer. But then, you might as well just execute the SQL command from your application and not have SPs at all. The flip side is companies that want the business logic in SPs. For instance, I worked with a guy who interviewed with AIG years ago and he said that back then they had SPs that ran from 5,000 to 10,000 line long. In those SPs it absolutely makes sense to have the SQL code compiled and stored in the database. "Chicago to my mind was the only place to be. ... I above all liked the city because it was filled with people all a-bustle, and the clatter of hooves and carriages, and with delivery wagons and drays and peddlers and the boom and clank of freight trains. And when those black clouds came sailing in from the west, pouring thunderstorms upon us so that you couldn't hear the cries or curses of humankind, I liked that best of all. Chicago could stand up to the worst God had to offer. I understood why it was built--a place for trade, of course, with railroads and ships and so on, but mostly to give all of us a magnitude of defiance that is not provided by one house on the plains. And the plains is where those storms come from." -- E.L. Doctorow
Post #344,123 by malraux 6/23/11 10:06:20 PM Reply	5,000 to 10,000 lines long? Amateurs. The biggest one I've seen was 15K. Once you have that much PL/SQL you really get to experience at a visceral level all of the issues I mentioned in the other post. Regards, -scott Welcome to Rivendell, Mr. Anderson.
Post #344,144 by crazy 6/24/11 1:07:06 PM Reply	I still win 32K lines in a single C header file. To be included in every program compiled. On a slowwwwww machine. No, I didn't write it. Left the jobs within 2 weeks of being told I had to work for the guy who wrote it.
Post #344,145 by Another Scott 6/24/11 1:17:55 PM Reply	Let me guess - No comments, too? ;-)
Post #344,177 by crazy 6/24/11 8:08:50 PM Reply	Don't recall If there was, they were stupider than the code.
Post #344,231 by folkert 6/25/11 8:12:24 PM Reply	I remember looking at it... When I did something for you... you had to show it to me... No there were no comment and its was serious Cut and Paste.
Post #344,229 by malraux 6/25/11 6:42:10 PM 6/25/11 6:42:35 PM Reply	Hmmm. Until you have some database code that, for every call, creates TCP sockets, makes external procedure calls (socket to process to C library, woo), builds up and tears down all of its package state (DBMS_SESSION.RESET_PACKAGE), logs megabytes of text per second, simulates synchronous calls using AQ with a 30 second timeout, stores data using comma delimited strings, XML, and UDTs, and generates HTML using string catenation in PLSQL... UNTIL then no, no you do not actually "win". And no comments, natch. Regards, -scott Welcome to Rivendell, Mr. Anderson. Edited by malraux June 25, 2011, 06:42:35 PM EDT Expand All History
Post #344,230 by Another Scott 6/25/11 7:37:19 PM Reply	:-)
Post #344,217 by lincoln 6/25/11 4:04:44 PM Reply	they were written over a dozen years ago so they've probably grown since then ;-) "Chicago to my mind was the only place to be. ... I above all liked the city because it was filled with people all a-bustle, and the clatter of hooves and carriages, and with delivery wagons and drays and peddlers and the boom and clank of freight trains. And when those black clouds came sailing in from the west, pouring thunderstorms upon us so that you couldn't hear the cries or curses of humankind, I liked that best of all. Chicago could stand up to the worst God had to offer. I understood why it was built--a place for trade, of course, with railroads and ships and so on, but mostly to give all of us a magnitude of defiance that is not provided by one house on the plains. And the plains is where those storms come from." -- E.L. Doctorow
Post #344,148 by drook 6/24/11 1:46:34 PM Reply	Amateur Even your simple select/update/delete statements should be stored procedures because then they'll be compiled and you'll avoid the overhead of query plan calculation on each call. Or so I've been told. :-/ -- Drew

Welcome to IWETHEY!