(click on the image to see other Mini-tutorials)
This project will explore, among other things, getting data into and out of a SQL Database. In fact, we’ll explore a number of ways of doing so. To get started, however, we need to create a database that is sufficient for the project and sufficiently illustrative of working with relational databases, without being overly complex.
Fortunately, the project requirements are of about the right complexity. Returning to the spec, (such as it is) we can see that we need to capture the following information each time an entry is made in a participating blog:
- The name of the blog
- The title of the entry
- The URL of the entry
- Info about the blogger (first name, last name, email & phone)
It would also be good to know when the blog entry was first created and when it was last modified.
Flat or Normalized?
You certainly could create a flat file that has all the information we need:
The advantage to this approach is that it is quick, easy, simple to understand. Putting in a few records, however, quickly reveals why this kind of flat-file database is now reserved for non-programmers creating simple projects such as an inventory of their music:
Even with just a few records, you can see that a great deal of data is duplicated in every record,
This duplication can make the database very hard to maintain and subject to corruption. To prevent this, databases are ‘normalized’ – a complex subject that boils down to eliminating the duplication by factoring out common elements into separate tables and then reaching into those tables by using unique values (foreign keys). Thus, without bogging ourselves down in db theory, we can quickly redesign this flat database into three tables:
In this normalized database, each entry has only the information that is unique to the particular BlogEntry:
- Date Created
- Date Modified
- Short Description
The entry also has the ID of the Blogger who wrote the entry and the ID of the Blog that the entry belongs to.
A second table holds the information about each Blog, and a third table holds the information about each Blogger. Thus, a given Bloggers first and last name, alias (email address at Microsoft.com) and phone number are entered only once for each blogger, and referenced in each entry by ID (known as a foreign key).
The diagram also shows that the two foreign-key relationships are named
These simple names indicate that there is a foreign key relationship from BlogEntries to Blogs and another from BlogEntries to Bloggers. The data is now much cleaner:
Notice that the BlogEntries table now does not duplicate the first and last name and phone number for each Blogger, but rather just refers to that Blogger’s ID. For example, entry #3
was written by Blogger 4 (see ID in the middle image) and was placed in Blog 3 (see ID in the top image)
Creating A Join Query (T-SQL)
You can easily write a query that recreates the complete set of information by “joining” the tables:
select BlogName, FirstName, LastName, Title, Description from BlogEntries be
join Bloggers bb on be.Blogger = bb.ID
join Blogs b on be.Blog = b.ID
This statement says to select the listed fields (it isn’t necessary to identify the table explicitly as there is no duplication). It then tells SqlServer where to find them in the “from” clause. The first table is BlogEntries which we assign a short-name of be.
We then we join to that tables the Bloggers table, but we match up the rows in BlogEntries with the Rows in Blogger where the value in BlogEtries.Blogger is equal to the ID in the table Bloggers.
You can imagine that SQL Server thus looks at the two tables and joins the matching rows, making a new (temporary) wider table:
It then looks at the Blog table and joins the Blog value to the ID in that table:
Giving us the ability to create a virtual flat table without the duplicated values.
|Advanced||Previous: Two Levels|