One Month with Sitecore 7.5, Part 6: Extending Report Data via Aggregation

Thu Aug 28 2014

In the final part of this series investigating Sitecore 7.5, we’ll look at how the new analytics and reporting structure allows us to extend the processing framework, and create new data in the reporting SQL database.

By moving collected analytics data to MongoDB, Sitecore solved issues of scalability and extensibility. However it did not help them with the problem of doing reporting on these massive data sets. While MongoDB is a great platform for storing and retrieving documents, relational databases still rule the world of complex queries and data analysis. So rather than eliminate the SQL database from analytics, Sitecore introduced a processing framework that can aggregate data into a new relational data structure which has been optimized for reporting.

The new reporting database contains a series of fact and dimension tables, which is a common structure utilized by business intelligence and data warehousing tools. In short, a fact is an event, potentially with some sort of measurable data about the event. A fact for example might be a page view (including duration) or a website visit (including the number of pages visited). Facts are structured in a way that should allow easy summation, grouping, etc for reporting purposes. A fact record would contain foreign keys to dimension tables, which would contain data about the people or objects which were involved with the event. This could be the Sitecore item which was visited in a page view, or the contact which visited your site. In essence, dimensions are lookup tables.

This new Sitecore 7.5 analytics framework also allows you to extend the reporting database with your own fact and dimension tables, and to extend data processing to populate them. You may perhaps want to do some reporting on data you have added to the contact, or on data you are collecting about user interactions via page events.

In the preview release of Sitecore 7.5 provided to MVPs, the process for creating a custom aggregation is described in the Customization chapter of the xDB Configuration Guide.

Utilize events or other analytics to log the data you wish to aggregate.
Create a new Fact table.
Create model classes for the key and value of your Fact.
Create a new AggregationProcessor and register in the aggregation pipeline.

In this example, we are going to create a new fact table with data about what products are added to our users’ shopping carts. Note that the Sitecore documentation is much more thorough in describing this process — be sure to reference it. Consider this your introduction/overview.

Use Page Events to Collect Cart Data

For this POC, I just added the event to the existing Active Commerce shopping cart logic. It’s obviously important here to include any data which you wish to include in your aggregation. You’ll also need to create the event in Sitecore.

Create a new Fact Table

The irony of Sitecore introducing a NoSQL database to its architecture in 7.5 is that for the first time, Sitecore is also giving you a reason to create new relational tables in SQL Server. Well, at least it’s ironic to me.

My new fact table contains information on all the products which users have added to their carts. You are typically going to have an aggregate primary key which contains the columns that define the uniqueness of the event. For our “product added” fact, that will be the product code (unique product identifier), the date of the event, the site the user was browsing, and the contact who added the product to his/her cart. The only aggregated value we are tracking on this fact is the quantity added.

We’ll also add foreign key constraints to the appropriate dimension tables. Note that Sitecore recommends that you create these constraints to document dependencies with dimension tables, but that you disable them to improve performance.

Create Model Classes for your Fact

Our next step is to create model classes for our new fact table, which Sitecore will map for us during the aggregation process. We’ll need a DictionaryKey subclass for our “key,” which contains our primary key fields, and a DictionaryValue subclass for our “value,” which contains the aggregated value(s) for the event. We’ll also create a Fact subclass which combines the two.
Sitecore seems to do the table and field mapping based on naming, and also seems to handle the obvious type mappings between Guid/uniqueidentifier, DateTime/shortdatetime, string/varchar, long/bigint, etc. The current early-release documentation is incomplete on this subject. The use of the Hash32 type for our site dimension ID, for example, was based on reviewing existing facts and aggregation processors which Sitecore includes in 7.5.

The constructor for the Fact base class accepts a reduction function which we must provide. This function combines, or aggregates, two values for a given fact key. If we were to process two events which have the same key, this function would be called to aggregate their values before the fact is written to the reporting database. In this example, we simply add the values together, as I suspect would often be the case. Your DictionaryValue subclass is a logical place to create the static function that’s needed here.

Create a New AggregationProcessor

Here’s where the real work happens. As you might have expected, aggregation processing happens in a pipeline. When a visit is being processed, it is passed through the interactions pipeline and each processor has the opportunity to perform aggregation for the facts for which it is responsible. The processor itself can examine data in the visit, and “emit” facts.

What’s interesting here is that you could theoretically call out to other data sources here in constructing your facts — you aren’t limited to data being processed from the xDB. I’m also curious as to whether the processing API would allow distribution of processing work for other data sources beyond visits, perhaps calling a custom pipeline. But that’s an investigation for another time.

For our processor here, we need to iterate over the pages in the visit, and look for any shopping cart events. If any are found, we’ll use the Fact API to construct a new fact, and “emit” it with its key and value. Behind the scenes, this will call our aggregation function as needed. The processing API also provides some other utility calls we need, to find or create the site dimension as needed, and to translate the date/time precision of our event as needed. The default precision strategy will “round” the date/time to the minute. This would, in theory, allow you to run and filter reports with minute-by-minute precision.

Finally, we’ll need to patch in this new processor to our Sitecore config. Note that there appears to be some new grouping available in the pipeline configuration now. As the number of pipelines in Sitecore continues to balloon, this totally makes sense. Perhaps Sitecore will shed more light on this new structure as 7.5 comes closer to release.

Rebuild and Test

To test our new processor, we need to rebuild our analytics data. To facilitate rebuilding of analytics data, Sitecore actually requires that you have two reporting databases, so that one can still be available for reporting, while the other is rebuilding. These are simply configured as the reporting and reporting.secondary connection strings. Testing of the rebuild can then be done through a new administrative screen, /sitecore/admin/RebuildReportingDB.aspx.

Click “Start” and Sitecore will begin to process, and update you on progress as it goes.

If you have a lot of data, rebuilding could obviously take some time. On large sites which have collected a lot of data, it may be necessary to keep a reduced data set around for testing purposes. Otherwise the debugging cycles for new aggregations could become very long and arduous.

Once processing is completed, aggregated data should appear in your fact table.

Reporting

Now that we have this additional data available, how do we best report on it? One option I imagine would be creating some cool new SPEAK-based reporting UIs. I am not experienced enough with the framework yet myself to say, but it seems like it would be easy enough to wire up some SPEAK charting components along with a SQL-based data source to create your own reports. But that will be a post for another day, perhaps by someone else!

I did want to attempt to push my data into a Stimulsoft report (Engagement Analytics) as well, which seems like it would be easier. But at the moment I’m getting an error when attempting to access report items in the Content Editor. And thus I am bailed out by beta software. But the point is — you have some options for creating reports based on your new data.

That’s it!

And that brings us to the end of our series on Sitecore 7.5. This release of Sitecore truly brings the infrastructure and architecture of DMS to the next level. As always, it will be exciting to see what partners and customers do with the framework. We at Active Commerce are very much looking forward to using the framework to bring new functionality and great new data to our customers.