Blog

CouchRS – Creating SSRS Reports against CouchDB Views

In my previous post we looked at how to build a simple report in SQL Reporting Services using CouchDB as the backend.  While reports against a single CouchDB document might useful in some instances, the real power comes into play when you are querying CouchDB views.  (Refer to my introductory post and to “CouchRS – Building Your First Report…” for examples on how to install CouchRS, and how to configure a Reporting Services project Data Source and Dataset to use CouchRS.)

In order for us to query a CouchDB view, we need to create one.  You can edit CouchDB views directly in the Futon web interface, but I’m using “LoveSeat” – an open source editor for CouchDB.  The data I’m using is a subset of the AdventureWorks2008R2 database, loaded into Couch.  To simplify things, my object model consists of Customers, who have Orders, and Orders have OrderDetails.  Each document in my CouchDB is a Customer object.  The point here is to illustrate that you can use CouchDB to store your domain model aggregate root(s), and still perform a wide range of reporting against this data without the need for it to be stored relationally.

So, for our first view, let’s create something that returns territory and total sales information.  Each order in my object model has a “Territory” object, and I’d like to include the Territory’s “GroupName”, “CountryRegionCode” and “Name” in my view, along with the year and month of each Order, and the “TotalDue” amount of each order.  Here’s a screen shot of my “map” function for this view:

View JS 1

This function takes a document as its argument (map functions are called for each document in the database).  Then we iterate over the Orders collection for the given document and emit a record that has an array key of [GroupName, CountryRegionCode, Name, Year, Month] and a value equal to the “TotalDue” for that Order.  Note that this mapping function can emit multiple records per document passed to it.  If we query this view via a browser, we’ll see results similar to this:

Sample View Results 1

Once this view has been built and queried for the first time, subsequent requests are instantaneously answered.  We could go ahead and use this data to drive our report, but I want to take advantage of another feature of CouchDB views, the ”reduce” function.  The reduce function on a Couch view is called on the set of data produced by the view.  In our case, it can be used to aggregate the data by the key or even just a portion of the key.  This reduce function in this example reads:

function(keys,values) { return sum(values); }

This function aggregates (sums) the value (“TotalDue”, in our case) by the key.  If you’re wondering why I made the key complex, and the value scalar, it’s because I wanted to take advantage of the “reduce” (reduce functions should, as a general rule, return a scalar value).  Fairly simple stuff, right?  Later on, we’ll touch on how “SeeNoSQL” supports CouchDB’s query options, enabling us to specify a “group_level” on a reduced view, effectively allowing us to aggregate the values by partial keys (i.e. – what if you wanted see the totals for all regions by year, and leave “month” out of it?).

Now that we have a reduced view, let’s open a new report up and create a data set that points to our view:

FROM VIEW - DataSet

The command text presented here has quite a bit more going on that our easy example from my last post:

  • FROM VIEW(aw,sales_by_region_year_month) – indicates we’re querying a view.  The first argument of the VIEW() function is the name of the design document that the view resides in, and the second argument is the name of the view.
  • USING QUERYOPTIONS(Group=true) – this tells Couch to Group the results by the full key, otherwise we’d get a result set with only one record that summed all the values in the view (we have to include this because this view has a “reduce” function – this isn’t necessary for non-reduced views).  Other query options that can be used here are:
    • ALLOW_STALE (true/false) – if true, CouchDB will not refresh the view even if it is stalled.
    • DESCENDING (true/false) – if true, the results are sorted in descending order by the key.
    • GROUP_LEVEL (int) – the level at which to group the results.  We have a 5 part key, so, a group level of 3 would only group it to the third key (GroupName, CountryRegionCode, Name), a group value of 4 would group it up to GroupName, CountryRegionCode, Name & Year, etc.
    • INCLUDE_DOCS (true/false) – if true, it includes the original document that was used to emit the record in the view.
    • INCLUSIVE_END (true/false) – Controls whether the endkey is included in the result.
    • REDUCE (true/false) – if false, the reduce function in a view is not executed.
    • SKIP (int) – skips the number of rows specified.
    • LIMIT (int) – limits the number of results returned to the number specified.
  • WITH KEY[GroupName,CountryCode,Country,Year,Month] – this tells the SeeNoSQL parser that you are using a view with an array key with 5 elements.  You can call each element whatever you want for the sake of the query, as long as you understand that they are evaluated in the same order that the actual key is in.  If you don’t have a WHERE clause, the “WITH KEY” statement is not required.
  • WHERE Year = @Year
    AND Month BETWEEN(@StartMonth,@EndMonth)
    AND GroupName BETWEEN(MIN_STRING,MAX_STRING)
    AND CountryCode BETWEEN(MIN_STRING,MAX_STRING)
    AND Country BETWEEN(MIN_STRING,MAX_STRING)
    – This specifies that user-driven parameters will provide the value for Year, and a range for month.  For the sake of this report, though, we don’t want to constrain the region information.  SeeNoSQL supports a “MIN_STRING” and “MAX_STRING” constant that, when evaluated and sent to Couch, corresponds to null and {} (empty object).  Submitting a range like this will return any string value in the key.

(For more information on the CouchDB View API, check out the CouchDB wiki and the “View Cookbook for SQL Jockeys”.)

If we click the “Query Designer” button and execute the query, we’ll get results similar to:

Query Designer

Notice that CouchRS has broken the array key into separate fields.  (In future versions of CouchRS, I plan to have it name the fields for you based on how you aliased them in the command text.  If you use an object key, it already names the fields using the field names on the object).  Although I could work with fields named “key0”, “key1”, etc., I’ve chosen to rename mine to make things more intuitive while building the report:

Field Properties

In the sample reports project (included in the GitHub repository), the above query drives the data for the “Territory Group Sales by Year” report:

Territory Group Sales by Year

And page 2 of the sample report:

Territory Group Sales by Year 2

Ok – so it’s very easy to get charted reports up and running against CouchDB.  One key thing to bear in mind is that since the results from Couch are returned to CouchRS as JSON, all the values are treated as text.  This means you will most likely have to convert numeric & date values to their respective types as you use them (I had to use CDec() and CInt() in these reports in order to properly format currency and to properly sort by month).

However – what if you wanted to compare a given region against the average of the other regions? Or in the case of a customer, see where they fall compared to the average customer?  Stay tuned….we’ll cover that in the next post….

 

CouchRS – Building Your First Report against CouchDB using SSRS

In my introductory post for CouchRS, I explained where to get the code and how to install the custom processing extension.  Now that you have CouchRS installed, let’s look at a very simple example on how to use it.

In Visual Studio 2008, our first step is to create a new Report Server project and add a Shared Data Source to the project, pointing it to our CouchDB server.  The changes we made to our RSReportDesigner.config and RSPreviewPolicy.config (explained in my earlier post) enable the CouchRS processing extension to be displayed as one of the possible data sources when creating a new data source.  Below is a screen shot of creating this new shared data source:

Shared Data Source

Note that all we have to do is name it, select “CouchDB Connection” from the drop down, and for the connection string we simply provide the server, port and database name.  CouchRS *does* support credentials, but it does not support Integrated Security or Impersonation.  To include credentials with your CouchDB connection, click on the “Credentials” tab, enter a username and password and click OK.

With the shared data source now in place, we can create a new (blank) report.  In order for this report to utilize the project-level shared data source, we need to add a report data source as follows (note that you simply select the available shared data source next to the “Use shared data source reference” option):

Add Report Data Source

Now we’re ready to create our first data set, but first we need to discuss “SeeNoSQL”.  “SeeNoSQL” is a very lightweight custom query language that Josh Bush and I created in order to accomplish the following:

  • Provide SSRS developers with a readable, “almost-SQL-like-but-not-too-much” way to express queries.
  • Enable complex parameter-driven constraints, including precedence and “AND/OR” chaining within those constraints in order to abstract the need for multiple HTTP requests away from the developer.
  • Provide a concise way to enable users to set more advanced query options, while abstracting the unnecessary elements of it away from the developer.
  • If possible, use the same query syntax against multiple NoSQL storage engines to provide consistency (time will tell on this one as I plan to release custom processing extensions for other NoSQL DBs).

If you’re even moderately familiar with CouchDB, you know that all interaction with CouchDB is done via HTTP.  Josh and I felt that it would be more intuitive for a developer to write in a declarative style rather than wiring up the raw HTTP requests in the data set’s command text.  So, let’s start with the simplest possible query in “SeeNoSQL”.

DataSet 1

In the above screen shot, we’ve named our data set, selected the report data source we created earlier and we’ve entered this command:

FROM DOCUMENT(fff2aaa6-236c-44f2-b2bf-e8a37360768f‘)

Hopefully, you can infer what’s going on here.  The results being returned will be “FROM” a single “DOCUMENT” with the ID of “fff2aaa6-236c-44f2-b2bf-e8a37360768f”.  It really doesn’t get much simpler than this!  However, this report is not going to be terribly useful if we have the Couch document ID hard coded in our command text.  SeeNoSQL supports parameters, so we can parameterize like this:

Parameterized Data Set

(In future posts, we’ll discuss how to give the user a more meaningful parameter than one that takes an ugly GUID as input!)

So – what happens if we click on “Query Designer” and actually run this query?  Here’s the result:

QueryDesigner

“WAIT A SECOND!”, you say, “I thought we queried for one document!”

Yes, we did.  In my earlier post I explained that since we’re crawling an object graph, it must be ‘flattened’ into a relational structure (since Reporting Services is using a DataReader under the hood).  And by ‘flattened’ I mean that we have to find the Cartesian product of each document’s members.  This means if the document has no child collection(s), it’s effectively already flattened.  However, this document has a collection of “SalesOrderHeader” hanging off the root, and each of those items has a collection of “SalesOrderDetails”.  Our example customer has two orders, the first with one line item, and the other with three line items.  The first order and it’s one item can be flattened into one row alongside the Customer information.  The second order has to be flattened into three rows – one for each order detail, alongside the order and customer information – hence the four rows we get back.  The closest analogy to this would be a SQL JOIN causing the results to be “blown out” based on the “many” side of a “one-to-many” relationship.  The implications of this are obvious: To neatly display this data in a report, you need to be mindful of how to properly group it in a Table or List region to avoid unnecessary duplication.  “So”, you ask, “Is there a way to query for only the data I want, rather than getting whole documents back and having to group the records in order to eliminate duplication?”  YES!  You can use a view in CouchDB to customize what is returned so that you only have to deal with the data you want.  I’ll be covering CouchDB views in my next post (this is where “SeeNoSQL” shines, as well).

Based on the data returned for this document, I created a simple “Customer Summary” report which displays all of the orders (with detail lines) for a given customer:

Customer Summary Sample Report

 

It’s worth noting that the images are being dynamically rendered.  In the SQL version of AdventureWorks, these images are stored in varbinary fields, so I included them in my upload to CouchDB.  With the image binary data serialized to JSON, it’s as easy as telling the image control which field to point at for the data, and selecting the MIME type for the image.

This is a very simple example, but I hope it helps you appreciate the power of leveraging SSRS against CouchDB.  You can develop the same type of reports using CouchDB as a data source as you can against SQL Server while using a storage medium that’s more friendly to the actual structure of your domain model.

We’ve barely scratched the surface….coming up we’ll talk about using map/reduce in CouchDB views, and SeeNoSQL to produce complex requests against CouchDB.

[I plan to include the sample reports under the directory structure of my Git repository, but you will not see them in the VS2010 solution, since VS2008 must be used to create these reports as of the time of this post.  Look under the “src'” folder to find this sample project.]

 

Introducing CouchRS – an SSRS Processing Extension for CouchDB

Unless you’ve been holed up on a desert island (or perhaps trapped in a legacy .NET 1.0 shop), odds are you have heard of the “NoSQL” movement by now.  Having a background that’s been nearly equally spread across application and database development, I was intrigued when my good friend (and talented developer) Alex Robson first explained CouchDB to me.  I had what are probably the “initial typical” reactions: “Wow, no ORM?  Is it fast?  What about transactions?  Wait, there’s NO schema?  Wow, NO ORM?!”  It was about that time that my data warehousing background piped up and said “Hey, I wonder what kind of analytics we could get out of it?”  As we began to use CouchDB more and more in our applications, we considered approaches that would allow us to leverage our existing BI infrastructure along side of it.  One option – viable, though loaded with more work – was to set up a message bus with listeners that subscribed to commands coming from the UI (or perhaps a ‘trigger’ service monitoring CouchDB for changes).  Those listeners would be responsible for populating a relational (or dimensional) database.  While this is technically viable, it does introduce more potential points for failure, and an overall increase in complexity.  The real world result, then, was we developed our applications against CouchDB, and delayed integrating our BI tools (SSRS, in this case) as long as possible, since we – like most other shops – were pointing SSRS at SQL data stores.

Enter CouchRS.  Several months ago I mentioned to Alex that I wanted to create a Custom Processing Extension that could allow our reports to simply talk to CouchDB directly.  Easy enough, right?  Well, not so fast.  Remember, CouchDB is a document-oriented database.  It will store anything you throw at it – in the same database.  Want to persist an object graph serialized to JSON?  Done.  Want to save the binary stream of a pdf?  Done.  Feel like storing a recipe list, complete with picture attachments?  Done and done.  With such a wide variance in the type & content of the documents in a Couch database, some general (and powerful) assumptions had to be made.

First – the documents are expected to be JSON.  No big leap there, since that’s standard with Couch.

Second – all the documents in a result set would have to be traversed, and a normalized set of “fields” generated in order to ‘present’ those documents as a relational data set.  Bear in mind that this includes flattening any document that represents an object hierarchy (i.e. – a customer, that has orders, that have order lines, etc.) by producing a Cartesian product of each object graph.  (Alex and Josh Bush worked serious magic in this area.  This project would not have been possible without their collaboration.)

Once we were returning results from Couch, Josh and I collaborated on creating a lightweight query syntax (“CouchQuery”) using “Irony” – a fantastic library created by Roman Ivantsov.  It’s SQL-esque enough to be intuitive and readable, without forcing “Couch” into a “SQL” bottle.  I’ll explain more about how to use this syntax in future posts, but suffice it to say that there are exciting implications in that it supports a “UNION” operator, and parenthetical logic (AND, OR, etc.) in the “WHERE” clause, meaning multiple requests can be made against a CouchDB in ONE query, and the results will be returned to the report as if they were one request.  This gets incredibly cool when you start “UNION”-ing the results of different views.

In upcoming posts, I will elaborate more on how to use the query syntax, discuss the necessity of being familiar with Couch views (and map/reduce in general) and present some sample reports.

How do I get it?

The source is located on GitHub and is available under the Apache 2.0 license (i.e. – it’s yours to download, modify, use in open & closed source applications…so have at it!)

How do I install it?

Alas, until I have time to write an installer you must endure this lovely process (it looks worse than what it is, don’t worry):

  • Download the source, build and grab the CouchRS.dll from the bin/Debug or bin/Release
  • For use with Visual Studio 2008 SSRS Designer:
    • Find your Visual Studio installation folder (usually C:\Program Files\Microsoft Visual Studio 9.0\ or C:\Program Files (x86)\Microsoft Visual Studio 9.0\ on x64 machines).  From there, browse to \Common7\IDE\PrivateAssemblies and paste the CouchRS.dll (along with its dependencies: Irony.dll, Newtonsoft.Json.dll, Symbiote.Core.dll & StructureMap.dll) file into that folder.
    • In the PrivateAssemblies folder, you need to edit two .config files: RSPreviewPolicy.config and RSReportDesigner.config.
      • RSPreviewPolicy.config:
        • Find the “CodeGroup” element that looks similar to:

          <
          CodeGroup class="FirstMatchCodeGroup" version="1" PermissionSetName="Execution" Description="This code group grants MyComputer code Execution permission. ">

        • Find the end of the children of this CodeGroup and paste the following element (be aware that the Url path needs to match where you pasted the CouchRS.dll:

          <CodeGroup class="UnionCodeGroup" version="1" PermissionSetName="FullTrust" Name="CouchDBCodeGroup" Description="Code group for CouchDB data processing extension"> <IMembershipCondition class="UrlMembershipCondition" version="1" Url="C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\
          CouchRS.dll"/> </CodeGroup>

      • RSReportDesigner.config:
        • Find the “Data” element and add a new child element that looks like this:

          <
          Extension Name="CouchDB" Type="CouchRS.DataProcessingExtension.CouchDbConnection
          ,CouchRS"
          />

        • Next, find the “Designer” element and add a new child element that looks like this:

          <
          Extension Name
          ="CouchDB"
          Type="Microsoft.ReportingServices.QueryDesigners.GenericQueryDesigner,Microsoft.
          ReportingServices.QueryDesigners"
          />
  • For use with a SQL Server 2008 Reporting Services Installation:
    • Find your Report Server’s bin directory (located {SQL Install Directory}\{ReportServer Instance}\Reporting Services\ReportServer\bin – for example: C:\Program Files\Microsoft SQL Server\MSRS10.MSSQLSERVER\Reporting Services\ReportServer\bin) and paste the CouchRS.dll file there (along with its dependencies: Irony.dll, Newtonsoft.Json.dll, Symbiote.Core.dll & StructureMap.dll).
    • in the “ReportServer” directory (one level up from the ‘”bin”), edit the following files:
      • rsreportserver.config
        • Find the Data element and add the following child element:

          <
          Extension Name="CouchDB" Type="CouchRS.DataProcessingExtension.CouchDbConnection
          ,CouchRS"
          />
      • rssrvpolicy.config
        • Find the CodeGroup element that looks similar to this:

          <
          CodeGroup class="FirstMatchCodeGroup" version="1" PermissionSetName="Execution" Description="This code group grants MyComputer code Execution permission. ">
        • Add a new child element that looks like this:

          <
          CodeGroup class="UnionCodeGroup" version="1" PermissionSetName="FullTrust" Name="CouchDBCodeGroup" Description="Code group for CouchDB data processing extension"> <IMembershipCondition class="UrlMembershipCondition" version="1" Url="C:\Program Files\Microsoft SQL Server\MSRS10.MSSQLSERVER\
          Reporting Services\ReportServer\bin\CouchRS.dll"
          /> <IPermission class="SecurityPermission" version="1" Flags="Execution"/> </CodeGroup>
    • You should recycle the SSRS Service at this point.

There is more to come!  If your interest is piqued, please leave a comment, Retweet this post and watch our repository on GitHub…