Jacksonville Developers User Group


Eugene Chuvyrov

Hello, world! and DLINQ explorations...

In my first ever blog post, I want to go ahead and get the mandatory milestone of any introductory programming course out of the way.  So, here it goes:

Hello, Blog World!

With that monumental accomplishment out of the way, I can focus on the subject of my recent mini-presentation to the Jax Architects users group, namely DLINQ and its potential impact on .Net application architectures.  The basic idea of my presentation, perhaps a bit stretched, was that with DLINQ and anonymous types in C# 3.0 (the language version that ships with .Net 3.5) it is easier and more appropriate to work with denormalized database tables in scenarios where high performance and availability are the top priority.  But let's start at the beginning and see how we can arrive at this conclusion...

LINQ (Language INtegrated Query) is the newest feature of the .Net 3.5 Framework.  .Net Framework 3.5 is available as a separate download or as a part of Visual Studio 2008 (in Beta 2 as of this writing).  Without rehashing Microsoft's documentation, LINQ simplifies working with data, making querying and filtering lists of data a snap.  LINQ to SQL (also known as DLINQ) is a subset of LINQ that allows easy access to and manipulation of data inside SQL Server databases (only SQL Server databases, however, as of VS 2008 Beta 2).  Although many people refer to DLINQ as an implementation of Object-Relational Mapping (ORM) for .Net, I would say that a true Object-Relational Mapping implementation would make the software developer completely oblivious to the existence of the database.  A true ORM allows a developer to create and use objects in code that automatically end up in the database without extra effort or knowledge of the database layer.  Since DLINQ does not do that (rather, it creates attribute-based mappings of SQL Server tables to objects in the special 'DataContext' conduit class), I would refer to DLINQ as the 'enabler of strongly-typed access to database data'.

To take full advantage of LINQ (and DLINQ), some prerequisite knowledge of the new language features that arrive with .Net 3.5 is required.  Those features are (see the attached presentation for my attempt to graphically explain them):

1.  Anonymous Types
2.  Object/Collection Initializers
3.  Extension Methods
4.  Query Syntax
5.  Lambda Expressions

Let's take a look at each one of these features in some detail.

1.  Anonymous types, in my opinion, are the most radical, powerful, and unusual feature in .Net 3.5.  In essence, an anonymous type allows us to define/shape strongly typed objects "on the fly."  They go hand in hand with the new 'var' keyword, which is easiest to understand with an ordinary type first.  For example, given the following declaration

 var strIAmAString = "I really am a string";

the variable strIAmAString becomes of type string, with all of the string methods invokable on this variable and available in VS 2008 via IntelliSense after its declaration.  At first glance this resembles the "duck typing" found in other languages, such as Ruby (to paraphrase the old saying, "if it looks like a string and acts like a string, then it must be a string"), but there is an important difference.  The 'var' declaration above indicates that the compiler will derive the type of the variable from its initializer, which must be present in the declaration, and that type is fixed at compile time.  The variable is every bit as strongly typed as if we had written 'string' ourselves.  An anonymous type takes this one step further: the compiler generates a class that has no name we can refer to in code, so 'var' is the only way to declare a local variable of that type.  For the same reason, an anonymous type cannot be declared as the type of a method parameter (and 'var' itself is allowed only on local variables), since the compiler needs a named type for each method parameter at compile time.

The real power of anonymous types lies in the fact that we can declare custom objects "as needed," use them, and throw them away.  This feature comes in particularly handy when coupled with strongly-typed access to the database (DLINQ), where, at any moment during the program flow, we can go directly to the data tables and construct custom objects containing the necessary data.  The example below (where dt is an instance of the DataContext class) creates a new set of objects, AmericanAuthors, each containing properties for the first and last name:

 var AmericanAuthors = from a in dt.Authors
        where a.CountryOfBirth == "USA"
        select new
        {
            AuthorFirstName = a.FirstName,
            AuthorLastName = a.LastName
        };
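To try the idea without a database, here is a minimal sketch that runs the same query shape over an in-memory list.  The Author class, the AnonymousTypeDemo class, and the sample names are assumptions made up for illustration (a real DataContext would generate its own Author class), so treat this as a sketch rather than the code from the presentation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Author class a DataContext would generate.
public class Author
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CountryOfBirth { get; set; }
}

public static class AnonymousTypeDemo
{
    // Builds "First Last" strings from an anonymous-type projection.
    public static List<string> AmericanAuthorNames(IEnumerable<Author> authors)
    {
        var americanAuthors = from a in authors
                              where a.CountryOfBirth == "USA"
                              select new
                              {
                                  AuthorFirstName = a.FirstName,
                                  AuthorLastName = a.LastName
                              };

        // The anonymous type has no name we can put in a method signature,
        // so we convert to strings before returning.
        return americanAuthors
            .Select(a => a.AuthorFirstName + " " + a.AuthorLastName)
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data.
        var authors = new List<Author>
        {
            new Author { FirstName = "Jane", LastName = "Doe", CountryOfBirth = "USA" },
            new Author { FirstName = "Ivan", LastName = "Ivanov", CountryOfBirth = "Russia" }
        };

        foreach (string name in AmericanAuthorNames(authors))
        {
            Console.WriteLine(name);
        }
    }
}
```

Note how the anonymous objects are used and discarded inside a single method, which is exactly the "as needed" usage described above.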

2.  Object/Collection Initializers is a fancy name for the short way to set values on an object.  For example, both of the code segments below initialize an Author object:

 C# 3.0:
 Author author = new Author { LastName = "Chuvyrov", FirstName = "Eugene" };

 C# prior to 3.0:
 Author author = new Author();
 author.LastName = "Chuvyrov";
 author.FirstName = "Eugene";

Nothing earth-shaking for sure, but a handy feature nevertheless.
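The "Collection" half of the feature name works the same way for lists.  Below is a small sketch, assuming a hypothetical Author class with just two properties and a made-up second author; it fills a List<Author> in a single expression by combining a collection initializer with object initializers:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Author class, defined here only to keep the sample self-contained.
public class Author
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class InitializerDemo
{
    public static List<Author> BuildAuthors()
    {
        // Collection initializer: the compiler calls List<Author>.Add for each element.
        return new List<Author>
        {
            new Author { LastName = "Chuvyrov", FirstName = "Eugene" },
            new Author { LastName = "Doe", FirstName = "Jane" } // made-up entry
        };
    }

    public static void Main()
    {
        foreach (Author author in BuildAuthors())
        {
            Console.WriteLine(author.FirstName + " " + author.LastName);
        }
    }
}
```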

3.  Extension methods refer to the ability to add new methods to existing CLR types in .Net 3.5 without modifying or subclassing them.  In the example below, the new method IsValidEmailAddress is added to the string type (borrowed from Scott Guthrie's blog, where you can also find more details about extension methods).  Note the usage of the keyword 'this' in the parameter declaration, which glues everything together:


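using System.Text.RegularExpressions;
// Extension methods must be declared in a static class; the class name is illustrative.
public static class StringExtensions
{
    public static bool IsValidEmailAddress(this string s)
    {
        Regex regex = new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");
        return regex.IsMatch(s);
    }
}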

After this declaration (and after making sure that the method lives in a namespace accessible to our program), we can use the new method (with IntelliSense support even!) in the following manner:

string emailAddress = "bill@microsoft.com";
if (emailAddress.IsValidEmailAddress()) …
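Putting the pieces together, here is a self-contained sketch showing the namespace requirement in action.  The namespace names SampleExtensions and SampleApp and the class name StringExtensions are my own placeholders, not anything from Scott Guthrie's post; the point is only that the 'using' directive is what makes the extension method visible to the calling code:

```csharp
using System;
using System.Text.RegularExpressions;
using SampleExtensions; // without this directive, IsValidEmailAddress is not found

namespace SampleExtensions
{
    // Extension methods must be declared in a static class.
    public static class StringExtensions
    {
        // 'this string s' makes the method callable as if it were defined on string.
        public static bool IsValidEmailAddress(this string s)
        {
            Regex regex = new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");
            return regex.IsMatch(s);
        }
    }
}

namespace SampleApp
{
    public static class Program
    {
        public static void Main()
        {
            string emailAddress = "bill@microsoft.com";
            if (emailAddress.IsValidEmailAddress())
            {
                Console.WriteLine(emailAddress + " looks like a valid e-mail address");
            }
        }
    }
}
```

Under the covers the call is just a static method call, so StringExtensions.IsValidEmailAddress(emailAddress) would compile to the same thing.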

4.  Query Syntax is a reference to a more readable way to query data inside .Net 3.5.  In other words, it's a feature that enables writing LINQ queries without requiring a PhD in Computer Science.  You don't have to use query syntax when writing LINQ queries; you can stick with Extension Methods (described above) and Lambda expressions (described below) instead.  Consider the following two examples, which produce equivalent results:

Without Query Syntax:
IEnumerable<AuthorArticles> results = dt.AuthorArticles.Where(aa => aa.LastName.StartsWith("Chuvyrov"));

With Query Syntax:
IEnumerable<AuthorArticles> results =
  from aa in dt.AuthorArticles
  where aa.LastName.StartsWith("Chuvyrov")
  select aa;

Query syntax makes querying for data a lot more SQL-like and somewhat more readable.  Without Query Syntax, an understanding of Lambda expressions is a must, and I will briefly cover Lambda expressions in the section below.

A few more words on the structure of Query Syntax: every query expression begins with a "from" clause and ends with either a "select" or "group" clause.  The "from" clause indicates what data we want to query.  The "select" clause indicates what data we want returned, and what shape it should be in.
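Since the examples so far all end in "select", here is a short sketch of both endings side by side, run against an in-memory list so that no database is needed.  The Author class, the QueryShapeDemo class, and the sample data are assumptions invented for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Author class, defined here only to keep the sample self-contained.
public class Author
{
    public string LastName { get; set; }
    public string CountryOfBirth { get; set; }
}

public static class QueryShapeDemo
{
    // Ends with "select": one last name per matching author.
    public static IEnumerable<string> LastNamesFrom(IEnumerable<Author> authors, string country)
    {
        return from a in authors
               where a.CountryOfBirth == country
               select a.LastName;
    }

    // Ends with "group": one group of last names per country of birth.
    public static IEnumerable<IGrouping<string, string>> LastNamesByCountry(IEnumerable<Author> authors)
    {
        return from a in authors
               group a.LastName by a.CountryOfBirth;
    }

    public static void Main()
    {
        // Made-up sample data.
        var authors = new List<Author>
        {
            new Author { LastName = "Doe", CountryOfBirth = "USA" },
            new Author { LastName = "Ivanov", CountryOfBirth = "Russia" },
            new Author { LastName = "Smith", CountryOfBirth = "USA" }
        };

        foreach (var countryGroup in LastNamesByCountry(authors))
        {
            Console.WriteLine(countryGroup.Key + ": " + string.Join(", ", countryGroup.ToArray()));
        }
    }
}
```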

5.  The easiest way to conceptualize Lambda expressions is to think of them as a way to write concise inline methods.  Or, you can think of them as shorthand for the anonymous methods introduced in C# 2.0, but I prefer "inline methods" myself.  The attached presentation contains an example of writing out Lambda expressions as anonymous methods.  Here, I would only like to emphasize that Lambda expressions always have three parts:

 -the left part indicating the object accepted as parameter (note that its type can be derived, or "guessed", by the compiler, so you don't have to supply the type of this object)
 -the '=>' separator just to confuse us normal folk; its sole purpose in life is to separate the left part from the right part of the Lambda expression
 -the right part that actually evaluates some programmer-defined expression and returns a value
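For readers without the attachment at hand, the three parts can be seen in the following sketch, which writes the same filter twice: once as a Lambda expression and once as a C# 2.0 anonymous method.  This is my own minimal illustration rather than the example from the presentation, and the LambdaDemo class and its method names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LambdaDemo
{
    // Lambda form: 'name' is the left part, '=>' the separator,
    // and the StartsWith call the right part that produces the value.
    public static List<string> FilterWithLambda(IEnumerable<string> lastNames)
    {
        return lastNames.Where(name => name.StartsWith("Chuvyrov")).ToList();
    }

    // The same predicate spelled out as a C# 2.0 anonymous method.
    public static List<string> FilterWithAnonymousMethod(IEnumerable<string> lastNames)
    {
        Func<string, bool> startsWithChuvyrov = delegate(string name)
        {
            return name.StartsWith("Chuvyrov");
        };
        return lastNames.Where(startsWithChuvyrov).ToList();
    }

    public static void Main()
    {
        // Made-up sample data.
        var lastNames = new List<string> { "Chuvyrov", "Doe", "Chuvyrova" };

        Console.WriteLine(FilterWithLambda(lastNames).Count);
        Console.WriteLine(FilterWithAnonymousMethod(lastNames).Count);
    }
}
```

Both methods return the same two names; the Lambda version simply lets the compiler infer that 'name' is a string.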

If you need step-by-step instructions on modeling databases using LINQ to SQL, please refer to Part 1 of ScottGu's LINQ to SQL series (the series is listed under Resources at the end of this post).

For fear of this first blog post running way too long (too late, I realize!), let me get to the point already:

With the availability of anonymous types and other tools in C# 3.0, should the database always matter to .Net developers?  Should developers always blindly follow Codd's normalization principles ("The key, the only key, and nothing but the key, so help us Codd")?  Should the architecture of any data-dependent process always have a separate "database design" phase?

Let's say we have a standard e-commerce application, where customers shop for different kinds of products.  We may store different types of products in different tables in a relational DBMS, and each product may have specific shipping instructions, customization properties, etc.  Imagine all of this data being stored in a relational database while thousands of customers are hitting our application.  Wouldn't it make a lot of sense not to have to perform a dozen joins simply to get to the data for a single customer?  And while we are at it, wouldn't it be cool not to ask five different programmers to go to their cubes and build a zillion stored procedures gathering all these data pieces while we sit and think about the best possible abstraction layer for all of the SQL code they will write?

What I am getting at is that, perhaps, in certain situations where performance is paramount and the number of joins required by the RDBMS is excessive, the use of a relational database may no longer be appropriate.  DLINQ could become the enabler of easy querying of in-memory persistent objects, or of some sort of denormalized database structure.  Some authors on the Internet already argue that the RDBMS is dead, or at least dying.  For example, Todd Hoff of http://highscalability.com/ states:

“The problem is joins are relatively slow, especially over very large data sets, and if they are slow your website is slow. It takes a long time to get all those separate bits of information off disk and put them all together again. Flickr decided to denormalize because it took 13 Selects to each Insert, Delete or Update.”

There are several ways we could go about denormalizing the data storage:  some advocate putting all of the data into a single BLOB field in the database, with separate database columns for the fields that are searchable.  Others (like myself) think that a more appropriate solution would be a single database table resembling a giant spreadsheet with customer names, product names, etc. duplicated many times over.
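To make the "giant spreadsheet" option a bit more concrete, here is a rough sketch of what querying such a table might look like.  Everything in it is hypothetical: the CustomerOrderRow class, its columns, and the sample rows are invented for illustration, and an in-memory list stands in for the denormalized table that a DataContext would expose.  The point is that a single "where" plus an anonymous-type projection replaces the chain of joins:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row of a denormalized table: customer and product
// details are repeated on every row instead of living in separate tables.
public class CustomerOrderRow
{
    public string CustomerName { get; set; }
    public string ProductName { get; set; }
    public string ShippingInstructions { get; set; }
}

public static class DenormalizedDemo
{
    // Gathers everything needed for one customer without a single join.
    public static List<string> OrderSummariesFor(IEnumerable<CustomerOrderRow> rows, string customerName)
    {
        var orders = from r in rows
                     where r.CustomerName == customerName
                     select new { r.ProductName, r.ShippingInstructions };

        return orders
            .Select(o => o.ProductName + ": " + o.ShippingInstructions)
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample rows; note the duplicated customer name.
        var rows = new List<CustomerOrderRow>
        {
            new CustomerOrderRow { CustomerName = "Jane Doe", ProductName = "Book", ShippingInstructions = "Ground" },
            new CustomerOrderRow { CustomerName = "Jane Doe", ProductName = "Lamp", ShippingInstructions = "Fragile" },
            new CustomerOrderRow { CustomerName = "John Roe", ProductName = "Desk", ShippingInstructions = "Freight" }
        };

        foreach (string summary in OrderSummariesFor(rows, "Jane Doe"))
        {
            Console.WriteLine(summary);
        }
    }
}
```

The price of this convenience is the duplication visible in the sample rows, which is where the consistency question comes in.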

Of course, there's the issue of maintaining consistency of denormalized data, which may seem like somewhat uncharted and perhaps dangerous waters at the moment.  Fear not, however, since giants came before you, as evidenced by eBay following the BASE (Basically Available, Soft state, Eventually consistent) approach to data storage.  Still, for small shops with more limited resources and expertise, performance vs. consistency is an important architectural decision that DLINQ perhaps just made a bit simpler.

Do you denormalize data in .Net applications on a regular basis?  Do you think DLINQ will make dealing with denormalized data easier, more abstract?  Will DLINQ be a factor when you consider the potential denormalization of a database?

Let me know at echuvyrov at msn dot com.


Resources:

ScottGu’s Blog: LINQ to SQL, 9-part series
http://weblogs.asp.net/scottgu/archive/2007/09/07/linq-to-sql-part-9-using-a-custom-linq-expression-with-the-lt-asp-linqdatasource-gt-control.aspx

Bill Wagner’s Blog on LINQ:
http://srtsolutions.com/blogs/billwagner/archive/tags/LINQ/default.aspx

Published Thursday, November 01, 2007 4:22 PM by chuvyrov
Attachment(s): DLINQ.zip
