Importance of developer data – level 200

Most of my experience has been with systems that require a relational database to house most of the data.  Some systems have used a combination of relational databases, files, and external sources.  Regardless of the data store, there are different ways to set up the data stores in a developer environment that affect the ease and speed of development.  For clarity, I am talking about the database that is used by every developer on the development workstation (not a development team staging server)  I’ll start with the worst way.

Use last month’s production backup (BAD)
Assume for a moment that there is not a single corrupt field in your production data.  Even so, the data set can be so large that it is tough to see clearly what is going on while developing locally.  While working out problems in the program’s behavior, it’s necessary to peek into tables to see what data is there, and with a full production set, the developer has to formulate queries merely for narrowing the scope of the data to view.  Using a full production set bogs down development with unnecessary overhead.  Finally, it will be very hard to create repeatable tests since the developer is constantly changing the data.

Create a development database with fake data and put it on a shared server(BAD)
The “shared server” part of this one is the killer.  Imagine running a scenario using this database, and the data changes.  With a shared database, this is what happens.  Multiple users changes the data in many different ways.  A high degree of communication overhead and waiting is required to make this work.  The fake data will need to be refreshed periodically, so it becomes necessary to stop the team from using the database during the data reload.

Use a local database with nothing in it(BAD)
This is probably an obvious one since the system probably won’t function without a proper data set.  If it will, security or configuration problems might lurk in the shadows (if you have security or configuration tables). 

Use a local database created with the local build and populated with representative data that is enough to support every use case in the application(GOOD)
In your local build script, have a target that creates the database locally from scratch.  Have a routine that pumps data into this local database.  In this data set, include enough data so that every use case in the application can be exercised.  As functionality grows, add to this data set proportionally.  Structure your automated build in such a way that it is easy to:

  1. Blow away the database and recreate it from scratch.
  2. Blow away the database and refresh it at any time.
  3. Create multiple local databases in the same way with different names.

Your automated database tests are going to be blowing away and reading data in the normal course of execution, so it’s important to be able to refresh the database at any time.  Using this technique, you will be able to run the system locally at any time and use it in a representative manner (and debug if necessary).

WHAT ABOUT A TESTING DATABASE?
That’s a different animal.  You want the testing database to make bugs obvious.  Another tricky thing is that there isn’t just one testing database, so that’s a topic for another time.