• Remove a Page from the Sitecore xDB (Sitecore 8 Technical Preview)

    Posted 10/01/2014 by techphoria414

    Note: Information in this post is based on the Sitecore 7.5 and 8 Technical Previews and is subject to change.

    Here's a quick one. I've been doing some JMeter traffic generation on my xDB for a forthcoming post/video on the Path Analyzer, and to push data from session to the xDB quickly, I implemented an "end session" page to hit at the end of each test thread.

    (I will hopefully also get a chance to share my work in JMeter, but in the meantime you should start where I started, with Martina Welander's awesome post.)

    Unfortunately, the first time around I forgot to exclude my /EndSession.aspx page from analytics, so it was muddying my data. However, it was pretty easy to remove it directly from the xDB using a MongoDB query.

    Basically, I'm telling MongoDB to update documents in the Interactions collection and remove elements from the Pages array where the URL path is /EndSession.aspx. The final "true" argument tells MongoDB to update all documents which match the query (the empty first argument in this case), not just the first one it finds. For more info on what's going on here, check the MongoDB documentation on the update() method.
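    For illustration only, here is a small Python model of what that update does, applied to in-memory dictionaries instead of a live xDB. The document shape is an assumption. It supposes that each Interactions document keeps its pages in a `Pages` array and that each entry holds the path under `Url.Path`. Check your own documents before running the real query.

```python
from typing import Any


def pull_pages_by_path(interactions: list[dict[str, Any]], path: str) -> int:
    """Model of a multi-document update that pulls matching entries from Pages.

    An empty query ({}) matches every document, and the multi flag means every
    matching document is updated, not just the first. Each page whose Url.Path
    equals `path` is removed. Returns the number of documents modified.
    """
    modified = 0
    for doc in interactions:  # empty query: every document matches
        pages = doc.get("Pages", [])
        kept = [p for p in pages if p.get("Url", {}).get("Path") != path]
        if len(kept) != len(pages):
            doc["Pages"] = kept
            modified += 1
    return modified


# Hypothetical sample documents (shape assumed, not taken from a real xDB).
interactions = [
    {"_id": 1, "Pages": [{"Url": {"Path": "/"}}, {"Url": {"Path": "/EndSession.aspx"}}]},
    {"_id": 2, "Pages": [{"Url": {"Path": "/products"}}]},
]
print(pull_pages_by_path(interactions, "/EndSession.aspx"))  # 1
print(interactions[0]["Pages"])  # [{'Url': {'Path': '/'}}]
```

    Running it a second time modifies nothing, which is also what you would expect from re-running the real update.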

    After running this, I had to rebuild the sitecore_analytics_index using the Indexing Manager and rebuild the reporting database using /sitecore/admin/RebuildReportingDB.aspx.

    I ran this query against a Sitecore 8 xDB, but based on the Sitecore 7.5 xDB Technical Preview, it should work with 7.5 as well.

    - Nick / techphoria414 

  • One Month with Sitecore 7.5, Part 6: Extending Report Data via Aggregation

    Posted 08/28/2014 by techphoria414

    In the final part of this series investigating Sitecore 7.5, we’ll look at how the new analytics and reporting structure allows us to extend the processing framework, and create new data in the reporting SQL database.

    By moving collected analytics data to MongoDB, Sitecore solved issues of scalability and extensibility. However, it did not help them with the problem of reporting on these massive data sets. While MongoDB is a great platform for storing and retrieving documents, relational databases still rule the world of complex queries and data analysis. So rather than eliminate the SQL database from analytics, Sitecore introduced a processing framework that can aggregate data into a new relational data structure which has been optimized for reporting.

    The new reporting database contains a series of fact and dimension tables, which is a common structure used by business intelligence and data warehousing tools. In short, a fact is an event, potentially with some sort of measurable data about the event. A fact might be a page view (including duration) or a website visit (including the number of pages visited). Facts are structured in a way that should allow easy summation, grouping, etc. for reporting purposes. A fact record contains foreign keys to dimension tables, which hold data about the people or objects involved in the event. This could be the Sitecore item which was visited in a page view, or the contact who visited your site. In essence, dimensions are lookup tables.
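    Here is a minimal sketch of the fact/dimension idea. It uses Python's built-in `sqlite3` rather than SQL Server. The `Items` and `PageViews` tables are hypothetical and much simpler than Sitecore's reporting schema. The point is only that facts carry measurable values plus foreign keys, and that reports sum and group those facts while joining to dimensions for readable labels.

```python
import sqlite3

# Hypothetical, simplified tables: these are NOT Sitecore's reporting schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Items (                   -- dimension: what was viewed
        ItemId INTEGER PRIMARY KEY,
        Name   TEXT NOT NULL
    );
    CREATE TABLE PageViews (               -- fact: measurable data about an event
        ItemId    INTEGER NOT NULL REFERENCES Items(ItemId),
        VisitDate TEXT    NOT NULL,
        Views     INTEGER NOT NULL,
        Duration  INTEGER NOT NULL         -- seconds
    );
    INSERT INTO Items VALUES (1, 'Home'), (2, 'Products');
    INSERT INTO PageViews VALUES
        (1, '2014-08-01', 3, 45),
        (1, '2014-08-02', 2, 30),
        (2, '2014-08-01', 5, 120);
    """
)

# Facts are summed and grouped; the dimension supplies the readable label.
rows = conn.execute(
    """
    SELECT i.Name, SUM(f.Views), SUM(f.Duration)
    FROM PageViews f JOIN Items i ON i.ItemId = f.ItemId
    GROUP BY i.Name
    ORDER BY i.Name
    """
).fetchall()
print(rows)  # [('Home', 5, 75), ('Products', 5, 120)]
```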

    This new Sitecore 7.5 analytics framework also allows you to extend the reporting database with your own fact and dimension tables, and to extend data processing to populate them. You may perhaps want to do some reporting on data you have added to the contact, or on data you are collecting about user interactions via page events.

    In the preview release of Sitecore 7.5 provided to MVPs, the process for creating a custom aggregation is described in the Customization chapter of the xDB Configuration Guide.

    1. Utilize events or other analytics to log the data you wish to aggregate.
    2. Create a new Fact table.
    3. Create model classes for the key and value of your Fact.
    4. Create a new AggregationProcessor and register it in the aggregation pipeline.

    In this example, we are going to create a new fact table with data about what products are added to our users’ shopping carts. Note that the Sitecore documentation is much more thorough in describing this process -- be sure to reference it. Consider this your introduction/overview.

    Use Page Events to Collect Cart Data

    For this POC, I just added the event to the existing Active Commerce shopping cart logic. It’s important to attach to the event any data you want to use in your aggregation. You’ll also need to create the event in Sitecore.

    Create a new Fact Table

    The irony of Sitecore introducing a NoSQL database to its architecture in 7.5 is that for the first time, Sitecore is also giving you a reason to create new relational tables in SQL Server. Well, at least it’s ironic to me.

    My new fact table contains information on all the products which users have added to their carts. You are typically going to have a composite primary key made up of the columns that define the uniqueness of the event. For our “product added” fact, that will be the product code (unique product identifier), the date of the event, the site the user was browsing, and the contact who added the product to his/her cart. The only aggregated value we are tracking on this fact is the quantity added.

    We’ll also add foreign key constraints to the appropriate dimension tables. Note that Sitecore recommends that you create these constraints to document dependencies with dimension tables, but that you disable them to improve performance.
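    Here is a sketch of a fact table like the one described above, again in SQLite via Python, with hypothetical table and column names. The composite primary key makes each combination of product, date, site, and contact unique. SQLite also offers a convenient parallel to the "create but disable" advice. It parses `REFERENCES` clauses but only enforces them when `PRAGMA foreign_keys` is on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign keys declared but not enforced: a loose analogy to creating the
# constraints for documentation and then disabling them (hypothetical schema).
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE Sites (SiteNameId INTEGER PRIMARY KEY, SiteName TEXT);
    CREATE TABLE Contacts (ContactId TEXT PRIMARY KEY);
    CREATE TABLE Fact_ProductAdded (
        ProductCode TEXT    NOT NULL,
        EventDate   TEXT    NOT NULL,
        SiteNameId  INTEGER NOT NULL REFERENCES Sites(SiteNameId),
        ContactId   TEXT    NOT NULL REFERENCES Contacts(ContactId),
        Quantity    INTEGER NOT NULL,
        PRIMARY KEY (ProductCode, EventDate, SiteNameId, ContactId)
    );
    """
)
row = ("SKU-1", "2014-08-28 10:15", 1, "c-1", 2)
# Succeeds even though Sites and Contacts are empty, because FKs are not enforced.
conn.execute("INSERT INTO Fact_ProductAdded VALUES (?, ?, ?, ?, ?)", row)

try:
    # Same composite key again: the primary key rejects it, which is why values
    # sharing a key must be combined before the fact row is written.
    conn.execute("INSERT INTO Fact_ProductAdded VALUES (?, ?, ?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```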

    Create Model Classes for your Fact

    Our next step is to create model classes for our new fact table, which Sitecore will map for us during the aggregation process. We’ll need a DictionaryKey subclass for our “key,” which contains our primary key fields, and a DictionaryValue subclass for our “value,” which contains the aggregated value(s) for the event. We’ll also create a Fact subclass which combines the two.

    Sitecore seems to do the table and field mapping based on naming, and also seems to handle the obvious type mappings between Guid/uniqueidentifier, DateTime/smalldatetime, string/varchar, long/bigint, etc. The current early-release documentation is incomplete on this subject. The use of the Hash32 type for our site dimension ID, for example, was based on reviewing the existing facts and aggregation processors that Sitecore includes in 7.5.

    The constructor for the Fact base class accepts a reduction function which we must provide. This function combines, or aggregates, two values for a given fact key. If we were to process two events which have the same key, this function would be called to aggregate their values before the fact is written to the reporting database. In this example, we simply add the values together, as I suspect would often be the case. Your DictionaryValue subclass is a logical place to create the static function that’s needed here.
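    To show the role of the reduction function, here is a Python sketch in which frozen dataclasses stand in for the `DictionaryKey` and `DictionaryValue` subclasses. The class names, the fields, and `FactCollector` are hypothetical and are not Sitecore's API. The collector simply calls the reducer whenever a key is emitted more than once, as the paragraph above describes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class ProductAddedKey:
    """Plays the role of the DictionaryKey subclass (the primary key fields)."""
    product_code: str
    date: datetime
    site_name_id: int
    contact_id: str


@dataclass(frozen=True)
class ProductAddedValue:
    """Plays the role of the DictionaryValue subclass (the aggregated values)."""
    quantity: int

    @staticmethod
    def reduce(a: "ProductAddedValue", b: "ProductAddedValue") -> "ProductAddedValue":
        # Combine two values that share the same key by adding them together.
        return ProductAddedValue(a.quantity + b.quantity)


Reducer = Callable[[ProductAddedValue, ProductAddedValue], ProductAddedValue]


class FactCollector:
    """Hypothetical stand-in for the Fact base class: merges duplicate keys."""

    def __init__(self, reducer: Reducer) -> None:
        self.reducer = reducer
        self.rows: dict[ProductAddedKey, ProductAddedValue] = {}

    def emit(self, key: ProductAddedKey, value: ProductAddedValue) -> None:
        existing = self.rows.get(key)
        self.rows[key] = value if existing is None else self.reducer(existing, value)


fact = FactCollector(ProductAddedValue.reduce)
key = ProductAddedKey("SKU-1", datetime(2014, 8, 28, 10, 15), 1, "c-1")
fact.emit(key, ProductAddedValue(2))
fact.emit(key, ProductAddedValue(3))
print(fact.rows[key].quantity)  # 5
```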

    Create a New AggregationProcessor

    Here’s where the real work happens. As you might have expected, aggregation processing happens in a pipeline. When a visit is being processed, it is passed through the interactions pipeline and each processor has the opportunity to perform aggregation for the facts for which it is responsible. The processor itself can examine data in the visit, and “emit” facts.

    What’s interesting here is that you could theoretically call out to other data sources here in constructing your facts -- you aren’t limited to data being processed from the xDB. I’m also curious as to whether the processing API would allow distribution of processing work for other data sources beyond visits, perhaps calling a custom pipeline. But that’s an investigation for another time.

    For our processor here, we need to iterate over the pages in the visit and look for any shopping cart events. If any are found, we’ll use the Fact API to construct a new fact, and “emit” it with its key and value. Behind the scenes, this will call our aggregation function as needed. The processing API also provides some other utility calls we need: one to find or create the site dimension, and one to translate the date/time precision of our event. The default precision strategy will “round” the date/time to the minute. This would, in theory, allow you to run and filter reports with minute-by-minute precision.
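    The processor logic can be sketched in Python as follows. Everything here is assumed for illustration:

    - The visit is a plain dictionary.
    - The event name "Product Added To Cart" and its `Data` fields are hypothetical.
    - The site name stands in for the site dimension ID that Sitecore's API would find or create.
    - Minute precision is modeled by truncating the seconds.

```python
from collections import defaultdict
from datetime import datetime


def truncate_to_minute(ts: datetime) -> datetime:
    # Drop seconds and microseconds: a minute-level "rounding" of the event time.
    return ts.replace(second=0, microsecond=0)


def process_visit(visit: dict, facts: dict) -> None:
    """Emit one 'product added' fact per cart event found in the visit's pages."""
    for page in visit["Pages"]:
        for event in page.get("PageEvents", []):
            if event["Name"] != "Product Added To Cart":  # hypothetical event name
                continue
            key = (
                event["Data"]["ProductCode"],
                truncate_to_minute(event["DateTime"]),
                visit["SiteName"],   # stands in for the site dimension ID
                visit["ContactId"],
            )
            facts[key] += event["Data"]["Quantity"]  # reduce: add quantities


facts: dict = defaultdict(int)
visit = {
    "SiteName": "website",
    "ContactId": "c-1",
    "Pages": [
        {"PageEvents": [
            {"Name": "Product Added To Cart", "DateTime": datetime(2014, 8, 28, 10, 15, 12),
             "Data": {"ProductCode": "SKU-1", "Quantity": 1}},
        ]},
        {"PageEvents": [
            {"Name": "Product Added To Cart", "DateTime": datetime(2014, 8, 28, 10, 15, 48),
             "Data": {"ProductCode": "SKU-1", "Quantity": 2}},
            {"Name": "Goal", "DateTime": datetime(2014, 8, 28, 10, 16), "Data": {}},
        ]},
    ],
}
process_visit(visit, facts)
print(list(facts.values()))  # [3]: both events fall in the same minute and merge
```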

    Finally, we’ll need to patch this new processor into our Sitecore config. Note that there appears to be some new grouping available in the pipeline configuration now. As the number of pipelines in Sitecore continues to balloon, this totally makes sense. Perhaps Sitecore will shed more light on this new structure as 7.5 comes closer to release.

    Rebuild and Test

    To test our new processor, we need to rebuild our analytics data. To facilitate rebuilding, Sitecore actually requires that you have two reporting databases, so that one can remain available for reporting while the other is rebuilding. These are simply configured as the reporting and reporting.secondary connection strings. The rebuild itself is started from a new administrative page, /sitecore/admin/RebuildReportingDB.aspx.

    Click “Start” and Sitecore will begin to process, and update you on progress as it goes.

    If you have a lot of data, rebuilding could obviously take some time. On large sites which have collected a lot of data, it may be necessary to keep a reduced data set around for testing purposes. Otherwise the debugging cycles for new aggregations could become very long and arduous.

    Once processing is completed, aggregated data should appear in your fact table.


    Now that we have this additional data available, how do we best report on it? One option I imagine would be creating some cool new SPEAK-based reporting UIs. I am not experienced enough with the framework yet myself to say, but it seems like it would be easy enough to wire up some SPEAK charting components along with a SQL-based data source to create your own reports. But that will be a post for another day, perhaps by someone else!

    I did want to attempt to push my data into a Stimulsoft report (Engagement Analytics) as well, which seems like it would be easier. But at the moment I’m getting an error when attempting to access report items in the Content Editor. And thus I am bailed out by beta software. But the point is -- you have some options for creating reports based on your new data.

    That’s it!

    And that brings us to the end of our series on Sitecore 7.5. This release of Sitecore truly brings the infrastructure and architecture of DMS to the next level. As always, it will be exciting to see what partners and customers do with the framework. We at Active Commerce are very much looking forward to using the framework to bring new functionality and great new data to our customers.

  • One Month with Sitecore 7.5, Part 5: Persisting Contact Data to xDB

    Posted 08/28/2014 by techphoria414

    With its flexible schema and scalable architecture, the xDB immediately becomes an attractive option in Sitecore 7.5 for storing all sorts of user-centric data, particularly anything you are interested in utilizing for reporting purposes. Developers who have worked with the .NET MongoDB Driver know how easy it is to persist any object data to the database. However, for good reason, your access to xDB is a bit more abstracted than this. You do, however, have three options for persisting contact data to the xDB.

    I myself only implemented one of the options below in my search for a means to persist shopping cart data in a POC for Active Commerce. But I’ve provided an overview of all three options.

    The submitContact Pipeline

    We saw in Part 3 of this series how data can be associated with the current contact via the Contact.Attachments dictionary.

    Though very useful, data in the Attachments collection is not persisted with the contact when the session is flushed. However, you could tap into the submitContact pipeline by creating your own SubmitContactProcessor, and persist the data to your own collection in MongoDB.

    As for how you persist, and how you load that data later, you’re a bit on your own. There is no corresponding loadContact pipeline at this time, and your best option for persistence appears to be accessing MongoDB directly via Sitecore.Analytics.Data.DataAccess.MongoDb.MongoDbDriver. You could then potentially access that data via an extension method on your Contact. Not ideal, and I’m not sure whether this would work with xDB Cloud.

    This did not seem to be the ideal option for me. I wanted something more straightforward which worked within the existing xDB data structures.


    This structure on the contact seems to allow storing simple name/value string pairs that are persisted and loaded with the contact data. This is a step forward, but for a shopping cart, I needed something that could handle a complex object.

    Contact Facets

    Not to be confused with search facets, contact facets allow you to define entirely new model classes that can be stored with the contact, and accessed via Contact.GetFacet<T>(string). Here we have an option which allows us to store complex data with the contact, without having to worry about persisting the data ourselves. Sitecore 7.5 includes a number of contact facets, which can be utilized to store additional information about the contact. This data appears to help fill out the Experience Profile report.

    The facets are configured in a new /sitecore/model configuration element, which defines various data model interfaces and their implementations, and associates them to entities (a contact in this case) with a given name.

    For example, to fill in a contact’s first/last name, we can use the Personal facet.

    Implementing your own facet requires a few steps, but is not difficult. The steps below include my POC for persisting shopping cart data.

    1. Create an interface for your facet which inherits IFacet. Add your desired fields.
    2. Create an implementation which inherits Facet. Use base methods to “ensure,” “get,” and “set” member values.
    3. For composite object structures, create an IElement and Element following the same pattern.
    4. Register your element in the /sitecore/model/elements configuration.
    5. Register the facet in the /sitecore/model/entities/contact/facets configuration.
    6. Access the facet via Contact.GetFacet<T>(string).
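    The facet pattern in the steps above can be modeled roughly like this in Python. It is a conceptual sketch, not Sitecore's `Facet` or `IFacet` API. Here `ensure`, `get`, and `set` operate on a dictionary that plays the part of the stored contact document, and `get_facet` looks up an implementation by its registered name. All class names are hypothetical.

```python
from typing import Any


class Facet:
    """Hypothetical stand-in for a facet backed by the contact's stored document."""

    def __init__(self, storage: dict[str, Any]) -> None:
        self._storage = storage

    def ensure(self, name: str, default: Any) -> None:
        self._storage.setdefault(name, default)

    def get(self, name: str) -> Any:
        return self._storage[name]

    def set(self, name: str, value: Any) -> None:
        self._storage[name] = value


class ShoppingCartFacet(Facet):
    def __init__(self, storage: dict[str, Any]) -> None:
        super().__init__(storage)
        self.ensure("Lines", [])  # composite member, like a list of Elements

    @property
    def lines(self) -> list[dict[str, Any]]:
        return self.get("Lines")


class Contact:
    # The "registration" step: facet name -> implementation class.
    FACETS = {"ShoppingCart": ShoppingCartFacet}

    def __init__(self) -> None:
        self.document: dict[str, dict[str, Any]] = {}  # what would be persisted

    def get_facet(self, name: str) -> Facet:
        storage = self.document.setdefault(name, {})
        return self.FACETS[name](storage)


contact = Contact()
cart = contact.get_facet("ShoppingCart")
assert isinstance(cart, ShoppingCartFacet)
cart.lines.append({"ProductCode": "SKU-1", "Quantity": 2})
print(contact.document)  # {'ShoppingCart': {'Lines': [{'ProductCode': 'SKU-1', 'Quantity': 2}]}}
```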

    After the contact’s session is flushed, you can very plainly see your new data persisted with the Contact. Nice!!

    MongoDB Facet Data

    As you can see, facets are an easy and powerful means of persisting contact data to xDB.

    That’s it for Part 5! In the last part of this series, we’ll look at another new extension point available in Sitecore 7.5, data aggregation.

  • One Month with Sitecore 7.5 Part 4: Scalability Options, New and Old

    Posted 07/09/2014 by techphoria414

    It’s actually going on two months with Sitecore 7.5 at this point, and I’ve obviously gotten myself in over my head with this blog series, but I’m battling on. Here in Part 4, we’ll take a look at deployment options for Sitecore 7.5, both new and existing.

    A more in-depth review of existing Sitecore hosting architecture considerations can be found on Aware Web’s blog.

    Minimal Deployment

    Despite the additional complexity Sitecore has added to the DMS architecture in 7.5, it's still possible to run full Sitecore functionality in a single instance of the software. This obviously simplifies things for developers, and means that even small deployments can take advantage of the xDB. However, you will need to run MongoDB unless you use the xDB Cloud Service together with Sitecore's "enhanced" SQL Server session provider (which supports Session_End).

    Scaling via Server Role

    In 7.5, Sitecore continues to provide new options for offloading server roles onto dedicated hardware (or virtual hardware). This allows you to vertically scale (in the traditional use of the term) individual servers according to the needs of the service they are running. Let's review all of our available options...

    Content Management and Content Delivery

    The longstanding scaling option in Sitecore, splitting your Content Management (CM) and Content Delivery (CD) servers is typically your first step in growing a Sitecore environment. This is sometimes called "content staging" and can be done for performance, availability, and security reasons.


    Publishing

    A dedicated publishing server has essentially been an option since Sitecore 6.3 (someone please correct me if I'm wrong on that), but since the publishing process was limited to a single thread, there was not much benefit to splitting publishing responsibility from your CM server. Sitecore 7.2 changed this, however: multi-threaded publishing can now easily consume CPU resources on a multi-core server. Large instances with many items and frequent content changes will benefit greatly from a dedicated publishing server.

    Content Search

    Using either the Solr or Coveo providers for Sitecore.ContentSearch, it’s possible to offload content indexing and searching onto a dedicated search server. This is useful for deployments which are highly dependent on content search, or which need to handle searching of massive amounts of content.

    SQL Server

    Even the most minimal of Sitecore installations should likely have an independent SQL Server. In addition to the standard Sitecore databases (core, master, web), in 7.5 this would house the reporting database and potentially a session database. Depending on the amount of analytics data, you may want to take this a step further by splitting your reporting into a separate SQL Server. This is a must if your hosting architecture is geo-distributed, since you will need a central reporting database in which to aggregate data from your data centers.



    MongoDB

    Unless you are using xDB Cloud (which we’ll discuss shortly) and using SQL Server for your sessions, to use analytics in 7.5 you will need to run MongoDB. For low traffic sites, it may be possible to run it on the same hardware as SQL Server, but given the low cost of virtualized Linux servers, adding a dedicated MongoDB install is an easy scaling option. You might also consider separating the MongoDB session database onto its own server, again a must if you have a geo-distributed architecture.

    Processing / Data Aggregation

    For high traffic sites, splitting off processing and data aggregation responsibilities to separate hardware will decrease impact on content authoring, and speed up data processing. Another option might be to dual-purpose your CD servers, and take advantage of distributed processing using your existing cluster.

    Reporting Services

    High traffic sites, or those making heavy use of reporting, may also benefit from splitting responsibility for the Reporting Services, which run queries and combine data from the collection and reporting databases in order to support reporting UIs such as the Executive Insight Dashboard and Engagement Analytics.

    Scaling by Server Role

    Scaling via Clustering

    Many aspects of the Sitecore deployment architecture can also “scale out,” so that as your needs increase, you can add additional servers for load balancing and failover.

    Content Delivery

    Almost always your first need for scaling out. Adding additional content delivery servers is the primary mechanism by which you can improve your site’s performance and availability. With the publishing improvements in 7.2, and the new analytics architecture in 7.5, it’s now also much more practical to deploy a geo-distributed architecture, with multiple clusters of content delivery servers. By utilizing the SQL Server or MongoDB session state providers in Sitecore 7.5, it’s also possible to implement non-sticky load balancing within each webfarm, which gives a better load distribution between the servers, and increases the reliability / failover capability of your webfarm.
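    Why shared session state enables non-sticky load balancing can be sketched in a few lines of Python. The round-robin balancer and the server names are assumptions for illustration. The in-memory `shared_sessions` dictionary stands in for a SQL Server or MongoDB session store.

```python
from itertools import cycle

# Stands in for an out-of-process (SQL Server or MongoDB) session store.
shared_sessions: dict[str, dict] = {}


class ContentDeliveryServer:
    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, session_id: str, page: str) -> str:
        # Every server reads and writes the same store, so no server affinity is needed.
        session = shared_sessions.setdefault(session_id, {"pages": []})
        session["pages"].append(page)
        return self.name


# Non-sticky round-robin: consecutive requests from one user hit different servers.
servers = cycle([ContentDeliveryServer("cd1"), ContentDeliveryServer("cd2")])
handled_by = [next(servers).handle("s-1", page) for page in ["/", "/products", "/cart"]]
print(handled_by)                       # ['cd1', 'cd2', 'cd1']
print(shared_sessions["s-1"]["pages"])  # ['/', '/products', '/cart']
```

    With in-process sessions, the second request would land on a server that had never seen the session. That is why sticky sessions were needed before.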

    Content Management

    For organizations with a large number of content authors, adding additional content management servers allows scaling of the authoring environment. Since you can only have a single master database, all content management servers must be local to the master database.


    MongoDB

    One of the primary benefits of MongoDB over SQL Server and other relational databases is its ability to scale horizontally. Sharding allows MongoDB to distribute data across multiple servers using a shard key. Replica sets mirror data between servers for failover. Combining the two gives you a sharded cluster, which lets MongoDB handle huge data sets on low-cost Linux servers. This is not without complexity, however. See the MongoDB guide to Sharded Cluster Architectures.
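    With range-based sharding, MongoDB splits a sharded collection into chunks covering ranges of shard-key values. The `mongos` router then sends each operation to the shard that owns the relevant chunk. The Python sketch below models only that range lookup. The boundaries and shard names are hypothetical. Real chunk metadata, balancing, hashed shard keys, and replica sets are all left out.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries on a string shard key.
# Chunk i covers [BOUNDS[i-1], BOUNDS[i]); the first and last chunks are open-ended.
BOUNDS = ["g", "n", "t"]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]  # len(BOUNDS) + 1 chunks


def route(shard_key: str) -> str:
    """Return the shard that owns the chunk containing shard_key."""
    return CHUNK_OWNERS[bisect_right(BOUNDS, shard_key)]


print([route(k) for k in ["alice", "gary", "nick", "zoe"]])
# ['shard0', 'shard1', 'shard2', 'shard0']
```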

    Processing Servers

    The new Sitecore 7.5 data processing/aggregation services use worker processes which read visit data from a processing pool and feed it to “aggregators,” which process the data for reporting purposes. This makes it possible for deployments which are processing huge amounts of data to run multiple processing servers, all of which can process data from the Collection database and aggregate it to the Reporting database.
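    The idea of several processing servers draining one shared pool can be sketched in Python, with threads standing in for servers. This is an analogy rather than Sitecore's implementation, and the pool contents and server names are hypothetical. Each `get_nowait()` removes an item atomically, so every item is claimed by exactly one worker.

```python
import queue
import threading

# Hypothetical processing pool: IDs of visits waiting to be aggregated.
pool: "queue.Queue[int]" = queue.Queue()
for visit_id in range(100):
    pool.put(visit_id)

processed: dict[str, int] = {}
lock = threading.Lock()


def worker(name: str) -> None:
    count = 0
    while True:
        try:
            pool.get_nowait()  # claim the next visit from the shared pool
        except queue.Empty:
            break
        # ... aggregate the claimed visit into reporting facts here ...
        count += 1
    with lock:
        processed[name] = count


threads = [threading.Thread(target=worker, args=(f"server{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(processed.values()))  # 100: each visit is claimed by exactly one worker
```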

    Between versions 7 (content search), 7.2 (publishing), and 7.5 (analytics), Sitecore has made many improvements on the ability of the software to scale to massive amounts of content and analytics data. But with it has come additional deployment complexity. To mitigate this, and make it possible for any Sitecore deployment to easily take advantage of the xDB, Sitecore plans to offer xDB Cloud.

    Scaling by Cluster

    xDB Cloud Service

    Complementing the release of Sitecore 7.5, xDB Cloud will allow you to take full advantage of the Experience Database and all its reporting without having to collect, store, or process any analytics data locally. Because it is 100% managed by Sitecore, you can get away without running MongoDB in your environment -- provided you are using SQL Server for session data.

    Sitecore plans to offer xDB Cloud at a “low” cost based on the number of contacts stored in the xDB. They will also be offering non-production access at a lower cost (with a corresponding lower SLA) for use in development and testing.

    What’s most exciting about this new offering is that Sitecore is removing barriers for all their customers to better utilize the Digital Marketing System. Large customers with huge amounts of data can take advantage of the new, highly scalable architecture. Smaller and larger customers alike also have the option of outsourcing their analytics infrastructure to Sitecore with a service that they know will grow with their data needs.

    Configuration and Use

    Sitecore was kind enough to grant me access to a preview of xDB Cloud for evaluation with Active Commerce. After requesting an instance from Sitecore (a step which will be replaced with an easy App Center purchase in the future), I was given a Deployment ID. I enabled Sitecore.Cloud.Xdb.config, filled in my Deployment ID within this file, and… that was it. I admittedly did not spend a lot of time testing this service, but it certainly appears that they have achieved their goal of making xDB easy to use via this SaaS offering.


    One of the most exciting aspects of xDB, which I will be investigating in the last part of this series, is the extension of reporting data. The data aggregation and reporting architecture in 7.5 makes it possible to extend the data collection and analysis performed by the DMS. However, in the initial release, it will not be possible to extend data aggregation in the xDB Cloud offering. Sitecore does plan on addressing this in a future release, and all appearances are that it is a high priority item for them.

    MongoDB Cloud Hosting

    This article was going to end here, but I was inspired to try one last experiment with Sitecore 7.5 this morning -- utilizing ObjectRocket’s hosted MongoDB service for the collection database and the other Sitecore 7.5 MongoDB databases. ObjectRocket, owned by Rackspace, has a very nice, scalable cloud offering for MongoDB. They have several data centers which are local to both AWS and Rackspace data centers, making it a very attractive option particularly for those who are already hosting with Amazon or Rackspace.

    I created all the needed MongoDB databases in an ObjectRocket instance and configured my connection strings to use them. This included a MongoDB session database. In a production scenario, this may or may not be practical, depending on your latency to ObjectRocket, as the session database would presumably be more sensitive to latency. But with just myself as a single user browsing my development site, there did not appear to be any performance degradation. Even the Experience Profile report, which uses the collection database, seemed to perform reasonably.

    Using a service such as ObjectRocket with Sitecore 7.5 would potentially give you the ability to scale to massive amounts of data, without maintaining your own MongoDB infrastructure. However unlike Sitecore’s xDB Cloud offering, you would still have the complexity of maintaining your own processing and reporting services. And you would certainly want to do additional performance testing of both the site and data aggregation!

    That’s it for Sitecore 7.5 deployment architectures. In Part 5 of this series, we’ll get back into some code, and look at how you can extend the Contact data which is persisted to the xDB.


  • One Month with Sitecore 7.5 Part 3: xDB Setup, Contact Identification and Shared Sessions

    Posted 06/05/2014 by techphoria414

    Sitecore has provided an early release of Sitecore 7.5 to MVPs, and I’ve fortunately had a little time to play with it and do some preliminary Active Commerce integration. This blog series reports what I’ve learned about Sitecore 7.5 and how you might utilize or implement it. In this article, I’ll explore the setup process to start developing with xDB, and how you can utilize the new Contact and Shared Session functionality.

    Lots of Databases!

    Get ready for your ConnectionStrings.config to blow up. Fortunately this doesn’t add a lot to your setup/install process, because through the magic of MongoDB, the additional MongoDB databases get created automatically for you when they are first accessed.

    You do of course need to install MongoDB, but this is pretty straightforward. Don’t be scared! MongoDB sets up easily in Windows, and can even be set up as a Windows Service. David Morrison of Sitecore has created a PowerShell script to do this for you. Sitecore MVP Benjamin Vangansewinkel has also written a great article on setting up MongoDB.

    Don’t Forget the Basics

    My initial attempts at getting analytics data flowing from Active Commerce were stymied by a couple of rookie mistakes:

    1. Overwriting Sitecore’s Global.asax. Apparently using a Global.asax that inherits from Sitecore.Web.Application has been a requirement for some time now -- but not to the point that I’d noticed any issues with our Active Commerce demo site up to this point. Because analytics are written on Session_End now, this is a big requirement for Sitecore 7.5.
    2. Not having an sc:VisitorIdentification tag in my main layout. Still a must-have for Sitecore analytics!


    In addition, you’ll want to configure your session state provider. InProc sessions will work, but keep in mind that you can potentially lose data if there is an application pool recycle in the middle of a user session. Sitecore provides two new options: a MongoDB session provider, and an “enhanced” SQL Server session provider. As the built-in ASP.NET SQL session state provider does not support Session_End, it is no longer an option for use with Sitecore. Keep in mind that all your session objects must now be [Serializable] to work with these database session state providers.
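    Database-backed session providers have to serialize whatever you put in session, which is why session objects need `[Serializable]`. Here is a rough Python analogy. JSON stands in for .NET serialization, and `OutOfProcessSessionStore` is a hypothetical class. Plain data round-trips through the store, while a function does not.

```python
import json
from typing import Any


class OutOfProcessSessionStore:
    """Hypothetical stand-in for a database-backed session provider.

    Values are serialized on write and deserialized on read, so anything placed
    in session must survive that round trip.
    """

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, key: str, value: Any) -> None:
        self._data[key] = json.dumps(value)  # raises TypeError if not serializable

    def get(self, key: str) -> Any:
        return json.loads(self._data[key])


store = OutOfProcessSessionStore()
store.set("cart", {"lines": [{"ProductCode": "SKU-1", "Quantity": 2}]})
print(store.get("cart"))  # {'lines': [{'ProductCode': 'SKU-1', 'Quantity': 2}]}

try:
    store.set("callback", lambda: None)  # a function cannot be serialized
except TypeError as exc:
    print("not serializable:", exc)
```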

    Here, I’ve configured MongoDB session state. Note that I’ve turned my session expiration way down to help with testing and development. Alistair Deneys also pointed out on the MVP forum that you can use Session.Abandon() to end the session and flush data to the xDB. I foresee a /sitecore/admin utility page or maybe even a browser button/plugin to make this easily accessible. It is definitely one of the bigger headaches of working with the new analytics data flow.

    As always, documentation is your friend. For complete xDB setup instructions, you will want to review the new xDB Configuration Guide which will ship with 7.5.

    Getting it Working -- Analytics API Changes

    As you might expect, there have been a number of changes to Sitecore.Analytics APIs. As this is an early release, these are still subject to change, but here are a few before/after examples from changes I had to make to Active Commerce in order to get a working build against Sitecore 7.5.

    As you can see, the primary access point for analytics APIs has changed from Sitecore.Analytics.Tracker to Sitecore.Analytics.Tracker.Current. This appears to allow injection of ITracker implementations, and also dynamic switching of the current ITracker. Not sure yet why you would switch trackers….
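    One plausible reason for a replaceable `Tracker.Current` is substituting a test double. That is a guess, not Sitecore's stated intent. The Python sketch below shows the ambient-context pattern. `Tracker`, `set_current`, and the tracker classes are hypothetical and do not reproduce Sitecore's API.

```python
from typing import Optional, Protocol


class ITracker(Protocol):
    def track_page(self, path: str) -> None: ...


class DefaultTracker:
    def __init__(self) -> None:
        self.pages: list[str] = []

    def track_page(self, path: str) -> None:
        self.pages.append(path)


class Tracker:
    """Static entry point whose current implementation can be replaced."""

    _current: Optional[ITracker] = None

    @classmethod
    def current(cls) -> ITracker:
        if cls._current is None:
            cls._current = DefaultTracker()  # lazily create the default
        return cls._current

    @classmethod
    def set_current(cls, tracker: ITracker) -> None:
        cls._current = tracker  # hypothetical switching mechanism


class RecordingTracker:
    """A test double that just records calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def track_page(self, path: str) -> None:
        self.calls.append(path)


Tracker.current().track_page("/")    # handled by the default tracker
fake = RecordingTracker()
Tracker.set_current(fake)            # swap in the test double
Tracker.current().track_page("/cart")
print(fake.calls)  # ['/cart']
```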

    Identifying the Contact and Shared Session

    The problem with this new “contact” concept is that whole HTTP protocol thing. It’s anonymous and stateless, you know. But we have ways of getting users to self-identify, whether it be a login, placing an order, or filling out a form with their email address. Once a user has done this, Sitecore 7.5 provides an API call to identify the user. Once you make this API call, Sitecore will merge the visit data with any existing data about the contact. For this proof of concept, I’m using the customer’s email address to identify them when they log in, or when they place an order in Active Commerce.

    This is powerful not just for analytics data, but also for utilizing the new Shared Session concept. The Contact.Attachments dictionary allows us to store arbitrary data with the contact. Since the contact is in shared session, this means that if an identified contact is accessing the site from multiple devices simultaneously, we can access that “session” on both devices. This is pretty powerful stuff. You can see below that in Firefox and Chrome, I’m sharing the same Active Commerce shopping cart using the 7.5 Shared Session.

    Shared Session xDB

    Contact Merging

    Part of the magic that happens when you Identify a contact is that Sitecore will merge data collected for the current “anonymous” contact, and the existing “known” contact, if it already exists. For additive data such as visited pages and triggered events, this isn’t a problem. But what if we have personal information collected for both contacts, or a shopping cart for both contacts? Sitecore provides some built-in logic for this in the mergeContacts pipeline, but I foresee this being a major area of testing and customization, depending on the particular implementation's business rules about what data should win out, and when. Should I load a saved shopping cart just because a user filled out a “Contact Us” form with their e-mail address? What if my mom uses my laptop and logs in to her account? Should her traffic and activities be attributed to my contact? Lots of questions here and likely some best practices to be developed.
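    The kind of business rules involved can be sketched as a small Python merge function. The policy below is hypothetical and is not what Sitecore's mergeContacts pipeline does. It works like this:

    - Additive history is concatenated.
    - The known contact's personal data wins.
    - The current session's cart is preferred when it has items.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContactData:
    visits: list[str] = field(default_factory=list)  # additive history
    first_name: Optional[str] = None                 # conflicting personal info
    cart: list[str] = field(default_factory=list)    # business-rule dependent


def merge_contacts(anonymous: ContactData, known: ContactData) -> ContactData:
    """Hypothetical merge policy, not Sitecore's mergeContacts logic."""
    return ContactData(
        # Additive data: keep everything from both contacts.
        visits=known.visits + anonymous.visits,
        # Personal data: the known contact wins; fall back to the anonymous value.
        first_name=known.first_name or anonymous.first_name,
        # Cart: prefer the cart built in the current session if it has items.
        cart=anonymous.cart or known.cart,
    )


merged = merge_contacts(
    ContactData(visits=["v3"], first_name="Nick", cart=["SKU-9"]),
    ContactData(visits=["v1", "v2"], first_name="Nicholas", cart=["SKU-1"]),
)
print(merged)
# ContactData(visits=['v1', 'v2', 'v3'], first_name='Nicholas', cart=['SKU-9'])
```

    Each rule is a one-line decision here, but in a real implementation each one would need to reflect the business questions raised above.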

    Just Scratching the Surface (of the buzzwords)

    That’s it for Part 3 of this series on Sitecore 7.5. In part 4, we will explore xDB deployment options on premise, as well as using Sitecore’s cloud hosting option for all your big big data.

    Cloud?? Big Data??? WHAT??!!!

    Say What Again

