Thursday, November 29, 2012

Database Design: Part III

As promised, I'm back.  Part II covered scheduling and estimating our sample project.  What I didn't cover is the final part of this project, the meat of this example where the customer keeps their account information. I'm going to expand the overall web map to add in the new edit account interface:

The assumption here is that the "Select Customer" web page is the same page used for both purposes.  Nothing in this diagram indicates that it is the same web page and it can be built as two distinct pages, but any opportunity to reduce code should be looked at.

Now the schedule and estimate need to be updated to account for these two interfaces and I'll leave that for another day (as well as the actual web page design for the "edit account data" web page).  What I want to do right now is show that there is more than one way to skin a web, I mean build a web site.

For instance: Should we use a select web page, or just include a drop-down at the top of the edit page.  If a drop-down is embedded on the same page as the data is displayed, then a decision must be made over what to do if there is no data in the system and what to display in the details before an account is actually selected.  So we just display the first persons data?  Maybe we should stuff a "-- Please Select a Customer -- " item in the drop down that indicates that no data is displayed until the user selects an actual user from the list. 

What about the customer edit page?  If we use a drop-down list on this same page for the customer name, then how do we handle changing the customer's name?  What happens if the customer list grows to thousands of customers?  Duplicate names possible?

This type of pontification should occur any time you create a design.  You should make a list of notes for everything that comes to mind while you are designing and when you review a design.  When you are done with your design, you should have a huge list of questions that need answering.  Answer these questions before starting any code.  Have your best programmer look over your design and make him/her ask questions.  Run it past marketing/management, see if they have questions.  Get answers before starting to code.

Next, you need to ask yourself questions about your database design.  Do you want to have records deleted from the database when the customer is deleted?  I'm assuming no.  So if we want to maintain some sort of history, do we just flip a flag in the customer table and filter out those records in the main screens and design new screens to view history information?  Or do we create history tables and copy those records to the history table before deleting them from the primary tables.  In the first example, the data in each table will continue to grow forever.  In the second example (using history tables), there is a possibility of data loss unless transactions are used.

For any data entry web page, we need some sort of data validation.  How to handle duplicate data input.  Maybe the customer number needs to be assigned by the computer when the user is created the first time.  Then this number should never be changed.  That would prevent the possibility of duplicate numbers.  If there are multiple web users entering data into this database, then the unique customer number should be generated by the database and not the program.

Do the web pages look aesthetically appealing?  Is that important, or is this just a utility used by your company's employees?  What about access to the website?  Is this going to be used by customers, or only by the company?

As you can see the whole process of designing software can take multiple iterations.  The bigger the software package, the more iterations you're likely to encounter.  Rushing into the coding phase of any software project will typically end in disaster, or at least regrets about something that could have been done better if only the right questions were asked.

One other point here is that a software package can be "grown".  If you choose to grow an application, you must first decide what the most important features are going to be, then build those first.  Multiple iterations can be used to expand the features of your software using this technique.  If you've never heard of this technique I would refer you to SCRUM extreme programming.  Here's the wiki page and the graphic in this page pretty much sums up what I just described.

That's all for now.  I'll get expand on this subject in the future.

Doing Research

As an IT manager it's my job to stay on top of things.  It's impossible to know everything about the computer field, so pick your battles.  I usually research subjects that I believe will improve my company standing among our competitors.  Of course, I keep tabs on my competitors, everybody should.  Otherwise, one day, you wake up and your customers are running for the door and you suddenly find out that your competitor has launched a killer product based on a technology you never heard of and it has a coolness factor that attracts your customers like flies.  Oh yeah, anybody that has been in the IT business for long has experienced that sinking feeling before.  That feeling, when you suddenly realize your company is at least 6 months behind your competition.

To avoid this problem, I like to keep tabs on my competition and do a lot of research in my field.  What am I talking about?  One example is knowing what is out there.  My current company surveys roofs, collects raw data, then produces budgets and construction plans.  That's the short-short version of what we do.  In reality, there are a hundred little pieces that are performed from step 1 to step N.  One of the "features" of our software that our competition is yet to catch up on, is the dynamic building viewer interface.  Our customers can log into our website and view their roof information visually.  The roof areas can be color coded according to data points that they choose.  Similar to what ARC View does.  The way we do it is we parse AutoCAD data into our database and form the correct polygons, text, symbols, etc. and they we use this data to reform the output in PDF format (for reports) and using a viewer.  In the past Adobe provided an SVG viewer plug-in and we ran with that for a couple years (until Adobe announced they were ceasing support for the plug-in and FireFox still did not have SVG capabilities).  So we stumbled onto Microsoft Silverlight.  We've been using Silverlight for about 4 or 5 years now and we've been lamenting the problem of plug-ins not working on the iPad and iPhone.  Which leads me back to my original title: "Doing Research."

I initially looked at HTML5 canvas.  I started a test projects to see what features can be ported from our Silverlight code into the new canvas.  When I discovered that Silverlight produced a static output (i.e. the lines, ellipses, squares and polygons don't exist after they are drawn into the buffer) I knew I had a problem.  Then I re-visited SVG, which to my delight is available on Chrome, IE 9, FireFox, Opera and Safari (not to mention Safari for iPad and iPhone).  There is an additional benefit, no plug-in required.  So it's back to SVG!

Another example is the ability to import and export data into our database.  Excel is one of the obvious choices and we use the Apose product for excel input and output (this is a natural choice since we are using C#).  One of our newest enhancements is regarding our bid forms.  We have customizable bid forms for roofing contractors to print out.  Typically, a contractor will receive a user id and password from our company when they are invited to bid.  They log into our system and type their bid prices into the project that we have setup.  Then they submit their bid (which locks out their ability to change anything in that project from then on) and they can generate a bid form in PDF format.  Aspose also produces a product called Aspose Words.  This is an MS Word import export library that can also produce PDF output.  So we have created a WSYWIG interface allowing our designers to customize the formatting of our bid form and use special tags to insert real data into the text.  The roofing contractor clicks a link that runs the Aspose Words library to generate a PDF output of the data that was previously uploaded by the designer using MS Word.  If you're interested in these third-party libraries you can go to for more information.  They make Java components as well as .Net components.

Another product I'm researching is iText.  This is a PDF library.  Our reports are written directly in C#.  I've used a lot of report generators in the past, many different versions of Crystal Reports and I have to admit that they are lacking.  I see the appeal, but my department tends to write very complex reports and it seems to go smoother using a PDF library that we can just paint onto rather than trying to get around a rigid report generator.  We are currently using PDFLib, which has served us well in the past, but iText has some features that make things easier (like mixed styles in one paragraph).  iText can be found here

This is but a small sample of the type of research I do on a weekly basis.  Sometimes I'll stumble across something that I'll assign one of my programmers to research.  Typically, I'll make them write some simple demo program to see how feasible the product is and how we can incorporate it into our product.  Sometimes weaknesses will present themselves.  Sometimes I am sufficiently amazed at the technology to suddenly change direction of our product (which normally occurs after I "sell" the idea to marketing/management).

So never stop researching.  Don't get left behind by your competitor!

Book Recommendation

If you're looking for a good book on dealing with legacy code, "Working Effectively with Legacy Code" is my recommendation.  This book is organized in a way that allows you to identify the problem you are having in converting your legacy code into unit testable object oriented code.  Each type of difficulty has it's own section and you can skip around, or as I've done, read straight through.

The entire theme of this book is that legacy code must be modified or refactored to work with unit tests.  The reason for incorporating unit tests is to make sure the code is verifiable when completed.  Verifiable code is easier to modify later because the unit tests can be re-executed after the modifications to make sure the original functionality still works as expected. 

The author starts with a legacy code change algorithm (copied without permission):

1. Identify change points.
2. Find test points.
3. Break dependencies.
4. Write tests.
5. Make changes and refactor.

Then he proceeds to provide chapters for performing these tasks in situations that a programmer might run into if they are attempting to modify legacy code.  An example is: Chapter 9, I Can't Get This Class into a Test Harness, or Chapter 23, How Do I Know I'm Not Breaking Anything?  Those two examples are part of breaking dependencies.

If you're interested in this book, I would recommend going to amazon and thumbing through their "look inside" feature (click here).

Sunday, November 25, 2012

Database Design: Part II

This is the second part of my software design article.  I mentioned in my last post that I'd do some coding.  Before I get to that part, I need to explain a little more about "why" we should build software in the first place.  There are only two reasons why a business should build software:

1. It adds value to the company product.
2. It makes the company more efficient.

There really is no other reason to build software.  Now you're probably thinking, "Gee, a program I built last year makes Jill's job easier."  Easier is nice and it can be quanified as making the company more efficient, if it reduced the amount of time it takes Jill to perform a task, or it allows Jill to perform more work in the same amount of time. 

All right, so this is all Econ 101 stuff, but I thought I'd point it out to remind everybody why software is built in the first place.

Next, I want to remind everyone that building software costs money.  It doesn't matter if the person building the software is already working at the company, or if they are full-time saleried employees, it costs money to pay that person to write code.  So there is a cost associated with every piece of software built.  Also Econ 101 stuff.

In the first part of this blog post I dug right into a software design.  Technically, this design will only be performed if someone in charge of the company approves of it.  If this is the idea of an IT person, then that person needs to get "buy-in" from management.  Then the detailed design needs to be completed, and finally an estimate and maybe a schedule must be created.  Keep in mind that management is going to want to know how many programmers will be dedicated, what hardware and software needs to be purchased, how many total man-hours are going to be expended and when will it be completed.  This will boil down to a total cost in dollars and a delivery date that can determine if this project will be delivered in time to be of use to the company.

OK, so how do we develop an estimate?  I've read a lot of books on estimating software and some books get into intricate details of estimating how long it takes a programmer to write software.  My experience is it depends on how well you can break the problem into definable chunks.  If the chunks are small enough, then a rough guess is good enough for each chunk and any error in estimating should come out in the wash.  Also, it just takes years of practice.  Knowing the ability of your programmers is good and understanding which algorithms are going to be complicated and time consuming and which ones are going to be easy is just good ole experience. 

So let's do an estimate for the parts we've already designed.  I know that this project is trivial and it could be built by any seasoned programmer in a day or so, but I'm going to "pretend" like this is a big production and we are going to employ some resources to building the software.

There's the estimate.  In reality, as I said earlier, it would probably take a couple hours to throw something trivial like this together, but I've made it into a big production and rounded each task up to the nearest hour.  In more complex applications you can break down each web page or screen into multiple tasks.  You may also notice a line for integration testing.  When everything is complete, you'll need some extra testing time for testing how the web pages interact with each other.  If you prefer, you can add in beta testing.  How much beta testing you perform will depend on how complex the application is and how high you want the quality to be.  If you're using test driven development, then each web page should be mostly tested before they are integrated together.

Anyway, now we have an estimate.  Next, we need a delivery date.  In this trivial example, we estimated 13 hours total, so that equates to two work days.  Or one work day for two people, assuming it can be coordinated between two people.

This project will not require additional hardware (I'm making this assumption) or software.  If you were working for a startup company, and they don't have any computers yet, you'll have to estimate the cost of a web server, the cost of the developer's work station(s) and the development software that will be used (i.e. visual studio or equivalent).

There is still one piece missing.  Your schedule.  If you're going to use a gaant chart, you'll need to decide which tasks can be performed in parallel and which tasks are critical.  In this project I'll assume that the main web page must be completed before any other page can be created.  In reality, any web page can be created in a vacuum and then they can be connected at the end.  But that would be a boring gaant chart and I wouldn't be able to show an example.  I'm also going to assume one programmer will spend two days on this project.

OK, now we have a simple gaant chart.  If you have Microsoft Project 2010, you can copy data from excel into Project and link the tasks together.  If you're using viso or another project management tool, you might need to re-type your tasks into the software.  Either way, this tool is what management will want to see.  It shows that this project will start on the 26th and end on the 27th of November.  It shows that the critical path includes every task in the project (some projects can have multiple paths and the longest path will be the critical path).  This chart shows that the previous task must be completed before the next can be started.  If there were two people working on this project, we might restructure this project to look something like this:

I have used P1 and P2 to designate programmer 1 and programmer 2.  Microsoft Project has a more sophisticated resource management capability, but as I've mentioned before, this is a trivial example. 

If you look closely at the second gaant chart, something interesting has happened here.  It will still take two days to complete this project.  But, only programmer 1 will be working on the final task on day 2 (and technically, he/she will only need half a day to do it).  This is why a CPM schedule can be helpful in determining schedules.

That's all for now.  Next I'll get into some details of what questions you should be asking as you build an application like this.

Saturday, November 24, 2012

Database Design

I'm going to design a database as an example.  This is a very tiny project, but I'm going to use it as an example of what to do and what not to do when designing database connected software.  On with the story problem...

You work for a new bank.  Your job is to write a program that tracks saving accounts for any number of customers.  I'm going to make this painfully simple and ignore stuff like interest and user authentication.  So don't expect to use this in the real world.

The first step is to decide what data to store.  Let's focus on the customer information.  For this example, we'll design one table with the customer name and a customer number: 

You can use any field name conventions you want, I just gave each field the smallest name possible so the queries are smaller.  At this point you might be wondering what the "pk" field is used for.  If you attended any database design course, you'll recognize the primary key field right off the bat.  The purpose is to designate a field in the table that is guaranteed to be unique, no matter what.  This can come in handy when we want to delete just one record since this field will not depend on user input and will never by viewed by the end user.  Only the software will use this field.  This will also be the field that will be used to connect to other tables.

Next we'll need a table to store the checking and saving account information.  I'm going to make this really simple, but it will not be very useful in the real-world.  Later, I'll expand this database to add features to make it more useful.  For now, we're going to need an account table:

OK, this table contains a primary key too.  This primary key is unique for this table only and can be used for the same purpose as the primary key in the customer table (to refer to or delete a unique record).  The "customer_key" field is what will be linked back to the customer table's primary key.  This is called the "foreign key".  I usually index this key because queries are going to join these two tables together using this key.  One example query will be:

    SUM(amount) AS account_balance
    account.customer_key = AND = 'joe'

This query will get the balance of the account for Joe.  Now we can design the integrity constraints for this database.  Obviously, we're going to join these two tables together at some point.  We also don't want the possibility that there are account records existing for a customer that is deleted from the database.  So we'll need a one-to-many relationship between the primary key of the customer table and the foreign key of the account table:

As you can see from this diagram, each customer primary key (pk) can have zero or more associated records in the account table.  The associated records will have a matching number in the "customer_key" field.  In addition to putting a constraint on these tables I would recommend adding an index to the customer_key foreign key in the account table.

Now it's time to build some software.

One of the major considerations when designing an input screen for this database is that the customer must exist before you can create account records.  So the first screen will need to be a customer select screen before entering the customer account information.  Before a customer select screen can be tested, we'll need some data to display.  So it makes sense to design the customer add and edit screen first.

The main screen will look like this (I'm designing this as a web site):

The screen to add or edit a customer record will look like this:

When we edit a customer record, a customer select record screen must be used to get to the edit screen shown above:

OK, so now I'll diagram a map of the web site as it exists now:

Now you can see how these pages are all connected together.  At this point, there is enough design information to build enough of the web site to be able to maintain user accounts (minus the capability to delete an account).  These web pages can be used to populate the customer table only.  Constraints on the database, at this point, will not affect any software that you write (assuming you auto-generate the unique primary key for the customer table).

I'm going to end this blog post for now, but in the next exercise, I'm going to do some coding, and then I'm going to move on to the account portion of this project.  When this project is complete, I'm going to ask a few questions that you should be asking yourself when you design software.  I'm going to basically, critique my own system and show that there are many considerations that can be taken when designing even the simplest program.

Designing Database Driven Applications: The basics

I'm going to talk about the basics of designing an application that connects to a database.  I'll assume you already have an interface idea and some notes on what to store in the database.  I'm also going to assume this is your first experience (i.e. for those who just came out of college or this is a senior class project of some sort). 

The first step is to design the tables that will store your data.  You'll need to draw an Entity Relationship Diagram (I'm betting you thought you could just forget that information after your database class final... LOL!)  Once you're confident that you have all the fields and tables designed you'll need to decide how the integrity constraints are going to be setup.  This is more important than the actual data fields.  If you forget a data field and have to add it later, it will not affect your code as much as a mistake in connecting two tables together.  So focus your efforts on getting the constraints correct!

After your relational integrity constraints are designed, you can create a test database.  Create all your tables and make sure you create all the constraints.  You can add indexes at this time, or leave them off until the end.  You'll need to analyze your data once you're done with your software in order to determine if you need additional indexes anyway.  Normally, I'll index all foreign keys and any fields that are specifically used for sorting (like fields I named "sequence" or "order", etc.).

Now you can design and build the software.

OK, why did I stress this ordering of events?  My experience is that programmers will make mistakes (yup, it happens).  The best way to make sure that the final software will not crash because of an integrity error, is to create a database with the constraints first.  Also, it's easier to change the program while writing the initial copy, than to add the constrains after all the software is written.  Which will require you to redesign the software to make it work with the added constraints.

Make sure you have all the foreign key constraints you need before finishing your project.  If you miss one, it will show up later when you query your data and records drop out, or you total records and end up with a larger number of records than you see on your program's list (these are known as orphaned records).

I'm talking about this subject because I have experience with it.  I inherited a database containing more than 400 tables designed by a person with no formal database experience.  This database did not have any integrity constraints.  When I ask the "programmer" why he didn't put any constraints on his database, he replied that he tried, but it just broke his software, so he left them out.  Apparently, he saw no reason for constraints, I slapped my head and decided that I better check the data.  It didn't take long to discover all types of data integrity problems.  Attempting to add constraints and fix the software at that point was an uphill battle and the entire system was rebuilt from the ground up.  I wouldn't recommend rebuilding an application because of missing constraints, but the lack of knowledge in database design also translated into a lack of knowledge in software design.  I only wish that he could have spent the time to get the database part right so I could have redesigned a front-end to connect to the existing database and not worry about constant database problems showing up as bad customer reports.

Maybe I'll do a subject on basic database design and stress the important stuff that is not taught in colleges or tech schools.

Microsoft Surface

So Microsoft has a new toy (announced back on October 26th).  It's called the surface (see here for more info).  As an IT manager, it's my job to find out about any potential paradigm shifting devices on the market.  Sometimes it's a joy, sometimes it's a headache.  I haven't decided which type of device the surface is (joy or headache).  I have determined the following basic information about the product that makes me scratch my head:

There are two models, one with Windows RT and one with Windows 8.  Initially, I thought that they were both the same hardware that could have two different OS's installed.  Nope.  They are different CPU's and the RT model cannot run Windows 8 programs.  The Windows 8 version will have slightly different dimensions (It will be thicker and heavier) and will not be available until after Christmas.

For specifications on the Windows 8 version I will refer you to the CNET article here.  This article mentions that it will ship early 2013.

My initial reaction that Microsoft was going to produce a tablet got me a bit excited.  I assumed that the reason Windows 8 had that added touch device interface was to compete on the smart phone market.  But a touch pad that can run windows apps?  Why that was like a thin laptop.  As soon as I found out that there was going to be a version called RT, I was a bit confused.  I suspect that confusion is what will happen to a lot of people who expect the new tablet to run windows apps and find out it doesn't.  At least, not the one they are selling before Christmas.  I understand the whole marketing angle behind getting it out the door before Christmas.  My issue is the fact that somewhere down the road there will be a mixture of these two tablets floating around and many people will not realize the difference between the tablet that they own and the other tablet.

To compound matters, as a software designer, I must make a choice going forward.  Do I create my apps to run on the iPad because it's currently number 1 and take the chance that it'll remain the market dominant tablet?  Do I write software to run on Droid devices and incur the pain of software that must adapt itself to a large variety of hardware with different features?  Or do I write my software to run on the surface, and which surface?

One other note: The RT version runs with apps from the Microsoft Windows Store.  I suspect the selection is currently small.  Maybe people will port their windows applications to RT and make them available in the Microsoft Windows Store.  Only time will tell.

That's my two cents.  I hope people take the time to research what they are buying.  Impulse buyers beware!