use continuous integration, but what about the customers?
We continue to integrate more and more of Extreme Programming into our development process. One of the practices we’ve found particularly empowering is continuous integration, which for us means:
- don’t branch your code
- only commit things that work
- in order to do a bigger project, break it into incremental pieces so it works the whole time
We’ve also added a requisite code review by another team member for anyone committing to our repository.
By working this way:
- we avoid time-consuming integrations
- we avoid a long QA process
- we avoid “throw the first try away” instances (sometimes this is still the best way to go, but it’s no longer the default)
- Most importantly, we force design to occur earlier rather than later.
All of this has improved the overall quality of the commits to the codebase and the thought that goes into them. We have fewer patches, and they are more likely to fix the intended problem. In every respect, the change has been welcomed by the developers, the maintenance team, and most of all the customers.
Unfortunately, we’ve been unable to get to a fully continuous deployment process. The quality and smoothness of the deployment hasn’t reached a point where we can simply apply the latest changes from our repository to our production servers. The problems we’ve found are:
- changes from the development team are very well tested and likely working, but we’re not completely sure
- necessary changes to the system outside of the code:
  - configuration file changes
  - database schema changes
  - additional daemons/services
In order to alleviate these problems, we still apply patches to our production code. One might call that a branch, but I prefer to call it a projection of our codebase, since it doesn’t evolve on its own; it only receives a subset of the commits to our repository. How can we bridge this gap? Is that the job of the QA department?

February 13th, 2006 at 7:17 pm
Thought I’d chime in here as well. =)
> We’ve also added a requisite code review by another team member for anyone committing to our repository.
That’s an awesome policy. Are you using CVS/Subversion? Any Trac-esque changeset viewer?
As a coder I really like the fact that someone else looks at your work.
When I’m just writing code for myself, I tend to leave a lot of less-than-clean bits in it. Sometimes I go back and clean them up, but not always.
If I know for sure someone else will be reviewing it, I’ll make the extra effort to keep it really clean, standards-conforming and all that.
Why, though, is it almost uniformly true that when you look back at code you wrote 1-2 years ago, you wonder what you were thinking?!?! Is that just me?! Maybe 10 years from now (hopefully I won’t still be coding by then), I’ll look back on code from a year earlier and find it 100% top notch. Ya, right.
February 13th, 2006 at 9:48 pm
We do use Subversion at this point. We were using CVS until the end of last year when we decided to transition.
The code review process combined with the concept of continuous integration has proven really powerful. We’ve been able to do some of those “major refactorings” that one always does when hindsight kicks in and you realize there was a better way to solve the problem at hand (sometimes 1-2 years later, sometimes 2 weeks later). By forcing these refactorings into smaller steps, with unit-testing in between, they’ve gone from the most error-prone activity in coding to almost guaranteed success.
On the flip side, I’m noticing less disgust with code written by me long ago. First, because I’m solving similar problems repeatedly (web application design, scalability, access control, MVC separation, etc.), the solutions are iteratively better. Second, by doing solid OO design and practicing refactoring, it’s much cheaper and easier to tweak something I wrote a while ago into whatever I need today. The real difficulty is going back to work with code written for an older version of a language. For example, working in the OO model of PHP4 is truly an exercise in frustration after working with PHP5.
February 20th, 2006 at 10:13 pm
[...] But, as I posed the question in my previous article, we have a bit of distance to cover. [...]
February 23rd, 2006 at 10:06 pm
Configuration files and database changes *are* programmable and testable.
For database changes, use mysqldump to dump the current production database structure to a file. In another file, add some INSERT statements that represent a good set of test cases.
Now, write a SQL conversion script that migrates the current schema to the development schema.
Put both of these scripts in the setup() portion of your test suite. Use the test versions of your configuration files to run them against a dedicated dev server or a local MySQL instance.
Once the change is released, take a fresh mysqldump of the structure as the new baseline.
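A minimal sketch of that test setup, assuming Python’s built-in `sqlite3` module as a stand-in for a MySQL dev instance so it runs anywhere (SQLite’s SQL dialect differs from MySQL’s). `BASELINE_SCHEMA`, `FIXTURES`, and `MIGRATION` are hypothetical placeholders for the mysqldump structure file, the INSERT fixtures, and the conversion script; the `email` column is an invented example change.

```python
import sqlite3
import unittest

# Hypothetical stand-in for the structure-only dump of production.
BASELINE_SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
"""

# Hypothetical test-case rows, kept in their own file in source control.
FIXTURES = """
INSERT INTO users (id, name) VALUES (1, 'alice');
INSERT INTO users (id, name) VALUES (2, 'bob');
"""

# Hypothetical conversion script: production schema -> development schema.
MIGRATION = """
ALTER TABLE users ADD COLUMN email TEXT NOT NULL DEFAULT '';
"""


class MigrationTest(unittest.TestCase):
    def setUp(self):
        # A throwaway in-memory database plays the role of the dev server:
        # load the production structure, the fixtures, then the migration.
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(BASELINE_SCHEMA)
        self.db.executescript(FIXTURES)
        self.db.executescript(MIGRATION)

    def tearDown(self):
        self.db.close()

    def test_existing_rows_survive(self):
        count = self.db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        self.assertEqual(count, 2)

    def test_new_column_exists(self):
        columns = [row[1] for row in self.db.execute("PRAGMA table_info(users)")]
        self.assertIn("email", columns)
```

Run it with `python -m unittest`. Each test gets a fresh in-memory database, so a failed migration never leaves a shared dev server half-converted.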
As for configuration files, write a test script that asserts the validity of a configuration setup. Run that as you deploy.
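One way such a check could look, assuming INI-style configuration files parsed with Python’s `configparser`. The `REQUIRED` sections and keys are hypothetical and would be replaced by whatever the application actually needs.

```python
import configparser

# Hypothetical required settings; the real list depends on the application.
REQUIRED = {
    "database": ["host", "name", "user"],
    "app": ["base_url"],
}


def validate_config(text):
    """Return a list of problems in an INI-style config; an empty list means valid."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    for section, keys in REQUIRED.items():
        if not parser.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            # A missing key and an empty value are treated the same way.
            value = parser.get(section, key, fallback="").strip()
            if not value:
                problems.append(f"missing or empty {section}.{key}")
    return problems
```

A deploy script could call `validate_config()` on each config file and abort the deployment whenever the returned list is non-empty.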
===
For continuous integration, the goal is to make check-ins simple, small, and stable. To that end, I like to run test coverage reports against the codebase. (Not sure if this exists for PHP4/5, but I would think it has to by now.)
Then allow the coder to forgo the code review if her changes meet a coverage threshold (say, 80%).
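A sketch of that threshold rule, assuming the coverage tool can report which line numbers a commit changed and which ones the tests executed. The function names and the 0.80 default are illustrative, not part of any particular tool, and measuring only the changed lines is an added assumption: it keeps a well-tested existing codebase from hiding an untested new patch.

```python
from typing import Iterable, Optional


def changed_line_coverage(changed: Iterable[int], executed: Iterable[int]) -> Optional[float]:
    """Fraction of a commit's changed executable lines that the tests ran, or None."""
    changed = set(changed)
    if not changed:
        return None
    return len(changed & set(executed)) / len(changed)


def review_required(changed, executed, threshold=0.80) -> bool:
    """Hypothetical policy: skip manual review only at or above the threshold."""
    coverage = changed_line_coverage(changed, executed)
    # A commit with no measurable lines (e.g. only config files) is still reviewed.
    return coverage is None or coverage < threshold
```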
Test-first development goes a long, *long* way to making code reviews mighty quick!
February 23rd, 2006 at 10:22 pm
This is very similar to something we’ve discussed. Here are some considerations that we’ve run into that make the automatic database schema updates impractical for an application like ours:
1. Many database tables are simply too large for schema updates without either scheduled downtime (of many hours) or fun tricks involving MySQL replication. Those tricks avoid significant downtime, but they are almost impossible to automate.
2. Database changes are far more of a commitment than changes to the codebase. Code is easy to manipulate and modify, but a database schema requires more review and consideration. Especially when creating new tables, we take great care to ensure that the right data is stored efficiently and that its growth and access times will stay manageable into the future. Changes to existing tables undergo even further scrutiny because of #1 above.
–
As for testing using the schema from production, we’ve got this going already! Everyone has a test database on which they run unit and functional tests, both to make sure they haven’t introduced regressions and as part of their development process.
February 24th, 2006 at 9:31 am
(1) Ok, then you need two schemas or table namespaces for your giant tables. In the configuration files, you can switch between them.
Of course, that requires lots of db space or perhaps two production db servers that you can switch between. Similarly, you could use two code deployment locations to switch between and manage using Apache VHosts or some such.
Probably the trickiest part is coordinating with your mail servers and data changes in the deployment interim. But I think mysql has a log feature that captures all changes to a SQL script, right?
Your migration script:
1. Replicate giant_table_1 to giant_table_2 (or vice versa next time)
2. Start logging DB changes to giant_table_1
3. Apply DB migration script to giant_table_2
4. Run test suite with configs pointing to giant_table_2
5. Freeze writes to giant_table_1
6. Import the interim data to interim_table_1
7. Apply DB migration again, interim_table_1 -> interim_table_2
8. INSERT interim_table_2 -> giant_table_2
9. Install new code and configs
10. Restart procs (or whatever)
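The steps above can be sketched as a single script. This is a toy under several assumptions: SQLite (via Python’s `sqlite3`) instead of MySQL, a trigger instead of MySQL’s change log for capturing interim writes (it only records INSERTs), a plain dict standing in for the configuration file that switches between table namespaces, and a hypothetical `email` column as the migration. The `live_writes` argument simulates traffic hitting the old table mid-migration; the write freeze in step 5 is only marked by a comment, since a single connection has no concurrent writers.

```python
import sqlite3

# Hypothetical production structure and migration.
BASE_COLUMNS = "id INTEGER PRIMARY KEY, name TEXT NOT NULL"
MIGRATION = "ADD COLUMN email TEXT NOT NULL DEFAULT ''"


def other(table):
    """Alternate between the _1 and _2 copies on successive migrations."""
    return table[:-1] + ("2" if table.endswith("1") else "1")


def create_like_base(db, table):
    db.execute(f"DROP TABLE IF EXISTS {table}")  # discard any stale copy
    db.execute(f"CREATE TABLE {table} ({BASE_COLUMNS})")


def migrate(db, table):
    db.execute(f"ALTER TABLE {table} {MIGRATION}")


def columns(db, table):
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]


def run_migration(db, config, live_writes):
    old = config["users_table"]
    new = other(old)

    # 1. Replicate the live table into the shadow copy.
    create_like_base(db, new)
    db.execute(f"INSERT INTO {new} (id, name) SELECT id, name FROM {old}")

    # 2. Start logging changes to the live table (inserts only, via a trigger).
    create_like_base(db, "interim_table_1")
    db.execute(
        f"CREATE TRIGGER log_interim AFTER INSERT ON {old} BEGIN "
        "INSERT INTO interim_table_1 (id, name) VALUES (NEW.id, NEW.name); END"
    )

    # 3. Migrate the shadow copy while the old table keeps serving traffic.
    migrate(db, new)

    # 4. Stand-in for running the test suite against the migrated copy.
    if "email" not in columns(db, new):
        raise RuntimeError(f"migration check failed on {new}")

    # Simulated live traffic arriving during the migration.
    db.executemany(f"INSERT INTO {old} (id, name) VALUES (?, ?)", live_writes)

    # 5 and 6. Freeze writes (the app would go read-only here) and stop logging;
    # the trigger has already collected the interim rows in interim_table_1.
    db.execute("DROP TRIGGER log_interim")

    # 7. Apply the same migration to a copy of the interim rows.
    create_like_base(db, "interim_table_2")
    db.execute("INSERT INTO interim_table_2 (id, name) SELECT id, name FROM interim_table_1")
    migrate(db, "interim_table_2")

    # 8. Both tables went through the same migration, so their columns line up.
    db.execute(f"INSERT INTO {new} SELECT * FROM interim_table_2")
    db.execute("DROP TABLE interim_table_1")
    db.execute("DROP TABLE interim_table_2")

    # 9 and 10. Point the configuration at the new table; restarting the
    # application processes is left to the real deployment script.
    config["users_table"] = new
    db.commit()
```

Because every step is an ordinary function call, the whole sequence can run in a test suite against a scratch database before it ever touches production.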
The point is, I bet the migration *is* programmable and testable. You may need to program in shell script. You may need to use wget, mysql scripts, awk, and grep to verify functional changes. But these are doable. And source-controllable.
(2) Agreed.
February 24th, 2006 at 10:12 am
One last thought: deployment is *hard*! I shouldn’t trivialize the difficulty.
Integration is hard, so we strive to integrate continuously. Deployment is hard too, so we should strive to deploy continuously.
Probably the best way to do that is to make the deployment script part of the continuous integration.
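A minimal sketch of deployment as the last step of the CI run: steps run in order, and the deploy step is only reached if every earlier step passed. The step names are hypothetical, and `command_step` is an illustrative helper that wraps any external script via `subprocess.run`.

```python
import subprocess
from typing import Callable, List, Tuple

# A step is a name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def command_step(argv: List[str]) -> Callable[[], bool]:
    """Wrap an external command (test runner, migration check, deploy script)."""
    return lambda: subprocess.run(argv).returncode == 0


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns whether every step passed and the names of the steps that ran.
    Because deployment is just the final step, it only happens on a green build.
    """
    ran = []
    for name, step in steps:
        ran.append(name)
        if not step():
            return False, ran
    return True, ran
```

A pipeline might then be `run_pipeline([("unit tests", command_step(["./run_tests.sh"])), ("deploy", command_step(["./deploy.sh"]))])`, where both script names are placeholders for the team’s own scripts.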
But again, I feel your pain. It’s not easy.