• Black Art Revisited: Sitecore DataProvider/Import Hybrid with MongoDB

    Posted 09/11/2015 by techphoria414

    tl;dr -- Combine data import and DataProvider approaches for great justice. Find the source on GitHub.

    My Black Art of Sitecore DataProviders article still gets a lot of hits, which to some degree makes sense -- creating a DataProvider hasn't really changed since Sitecore 6.0 (though I very much need to explore pipeline-based item providers in Sitecore 8). But if I've learned anything myself since then, it's that there are actually very limited circumstances in which you want to implement a DataProvider. Why's that?

    • They're difficult to implement
    • It's difficult to invalidate Sitecore caches when data is updated
    • It's difficult to trigger search index updates when data is updated
    • You can't enrich the content or metadata in Sitecore (e.g. Analytics attributes)
    • You are dependent at runtime on the source system
    • They're even more difficult to implement in a way that performs well

    So the fallback is usually an import via the Item API. But I've never really seen anyone happy with this either, especially for large imports. Why's that?

    • Writing data to Sitecore is SLOW. Even with a BulkUpdateContext.
    • Publishing can be very slow as well, though Sitecore 7.2 improved this significantly.
    • Usually you end up using a BulkUpdateContext, which means needing to trigger a search index rebuild, and potentially a links database rebuild, after every import.
    • Scheduled updates mean that it can be hours and hours before a data change in the source system is reflected in Sitecore.

    Is there a way to integrate data with Sitecore that balances the immediacy of a data provider, with the simplicity and enrichment ability of an import? Maybe! I'm presenting here a POC that combines the two approaches to try and achieve just that. We do this by introducing an intermediary data store that you may already have in your Sitecore 7.5 or 8 environment -- MongoDB.

    Data Provider / Import Hybrid

    The basic process:

    1. Data is pushed frequently from an external system into MongoDB. Writing data to MongoDB should be quick and easy, so it can be done very often, in theory. Maybe the data is already in MongoDB, in which case you are set.
    2. New items (products in this case) are imported frequently into Sitecore. This can be done quickly and more often, in theory, because we are importing minimal data -- just creating the item, and populating a field with an external ID.
    3. We use a DataProvider with a simple implementation of GetItemFields to provide field data for the item directly from the MongoDB.
    4. To ensure caches are cleared and indexes are updated when data changes, we monitor the MongoDB oplog, a collection that MongoDB maintains to replicate data between the members of a replica set.
    5. Content editors can enrich data on the item as needed. Externally managed fields can be denied Field Write to prevent futile edits.
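
    The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the POC's actual code: the in-memory PRODUCTS dict stands in for a MongoDB collection, get_item_fields and changed_ids are hypothetical names, and the oplog entry shape (op, ns, o, o2) is assumed from the classic oplog format and may differ between MongoDB versions.

```python
from typing import Dict, Iterable, Optional, Set

# Stand-in for a MongoDB "products" collection, keyed by external ID.
# In a real setup this would be a driver query, not a dict (assumption).
PRODUCTS: Dict[str, Dict[str, str]] = {
    "SKU-1": {"Name": "Widget", "Price": "9.99"},
}


def get_item_fields(external_id: Optional[str]) -> Dict[str, str]:
    """Return externally managed field values for an item, or {} if unknown."""
    if not external_id:
        return {}
    return dict(PRODUCTS.get(external_id, {}))


def changed_ids(oplog_entries: Iterable[dict], namespace: str) -> Set[str]:
    """Collect the _id of every document inserted, updated, or deleted
    in the given namespace ("database.collection")."""
    ids: Set[str] = set()
    for entry in oplog_entries:
        if entry.get("ns") != namespace:
            continue
        op = entry.get("op")
        if op in ("i", "d"):
            doc = entry.get("o", {})   # inserts/deletes carry _id in "o"
        elif op == "u":
            doc = entry.get("o2", {})  # updates carry _id in "o2"
        else:
            continue                   # ignore no-ops and commands
        if "_id" in doc:
            ids.add(doc["_id"])
    return ids
```

    Each ID returned by changed_ids would then be mapped back to the Sitecore item holding that external ID, so that its cache entries can be cleared and its index entry refreshed.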

    So, does this work? Glad you asked. I put together a POC and recorded a walkthrough, which you can find below. In the video, I go into more detail on import vs data provider, and some of the potential gotchas of the hybrid approach.

    Again, this is all theoretical. Has not been attempted in a production implementation. But I do think there is potential here, especially given that MongoDB is going to be found in more and more Sitecore environments going forward. Feedback is welcome, as are pull requests. :)

    Full source code can be found on GitHub.

  • Remove a Page from the Sitecore xDB (Sitecore 8 Technical Preview)

    Posted 10/01/2014 by techphoria414

    Note: Information in this post is based on the Sitecore 7.5 and 8 Technical Previews and is subject to change.

    Here's a quick one. I've been doing some JMeter traffic generation on my xDB for a forthcoming post/video on the Path Analyzer, and to push data from session to the xDB quickly, I implemented an "end session" page to hit at the end of each test thread.

    (I will hopefully also get a chance to share my work in JMeter, but in the meantime you should start where I started, with Martina Welander's awesome post.)

    Unfortunately, the first time around I forgot to exclude my /EndSession.aspx from analytics, so it was muddying my data. However, it was pretty easy to remove this directly from the xDB using a MongoDB query.



    Basically, I'm telling MongoDB to update documents in the Interactions collection and remove elements from the Pages array where the URL path is /EndSession.aspx. The final "true" argument tells MongoDB to update all documents which match the query (the empty first argument in this case), not just the first it finds. For more info on what's going on here, check the MongoDB documentation on the update() method.
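
    For readers without the original query to hand, here is a hedged sketch of the same operation using PyMongo-style calls rather than the shell's update(). It assumes the path is stored under Url.Path on each Pages element, which you should confirm against your own data; remove_page and build_remove_page_update are hypothetical helper names. update_many stands in for the shell's final "true" (multi) argument.

```python
from typing import Any, Dict, Tuple


def build_remove_page_update(path: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build the (query, update) pair that pulls matching pages out of Pages."""
    query: Dict[str, Any] = {}  # empty query: match every interaction
    # "Url.Path" is an assumed field name for the page's URL path.
    update = {"$pull": {"Pages": {"Url.Path": path}}}
    return query, update


def remove_page(interactions: Any, path: str) -> Any:
    """Apply the update to every matching document.

    `interactions` is expected to behave like a PyMongo collection,
    i.e. expose update_many(filter, update).
    """
    query, update = build_remove_page_update(path)
    return interactions.update_many(query, update)
```

    With a real connection this would be called as remove_page(db["Interactions"], "/EndSession.aspx"), where db is your analytics database handle. Try it against a copy of the data first.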

    After running this, I had to rebuild the sitecore_analytics_index using the Indexing Manager and rebuild the reporting database using /sitecore/admin/RebuildReportingDB.aspx.

    This query was used on a Sitecore 8 xDB, but judging by the Sitecore 7.5 xDB Technical Preview, it should work with 7.5 as well.

    - Nick / techphoria414 


  • Active Commerce

    Sitecore 8 Technical Preview: Active Commerce and the Experience Explorer

    Posted 09/26/2014 by techphoria414

    Today Sitecore released a Technical Preview of Sitecore 8 to the MVP community, and like good MVPs we are all scrambling to install the preview and write our first blog post on the beauty of Sitecore 8.

    Note: This article and video are based on a Technical Preview of Sitecore 8. Features and functionality are subject to change.

    Sitecore showed many amazing features of this forthcoming version during the Symposium events in Las Vegas and Barcelona. One "lesser" feature (really only in relation to the other amazing features) is the Experience Explorer. The basic idea is to simulate various visit and visitor segments in order to test personalization and other behaviors. Since Active Commerce utilizes the Sitecore Rule Engine for cart promotions, I was curious as to whether the Experience Explorer could be used to test cart discounts with Active Commerce. The answer was most definitely yes -- check it out below.

    It's worth noting that getting Active Commerce running on this Sitecore 8 Technical Preview took no code changes from the POC which I did on Sitecore 7.5. Neither did getting the Experience Explorer to work with our promotion engine. This is another great example of why we built Active Commerce natively within Sitecore, and why we say Active Commerce is "Sitecore e-commerce done right."

    There is also an Experience Explorer Module available for earlier versions of Sitecore.

    Great work Sitecore. Can't wait for Sitecore 8 to go gold.

    - Nick / techphoria414



  • One Month with Sitecore 7.5, Part 6: Extending Report Data via Aggregation

    Posted 08/28/2014 by techphoria414

    In the final part of this series investigating Sitecore 7.5, we’ll look at how the new analytics and reporting structure allows us to extend the processing framework, and create new data in the reporting SQL database.


    By moving collected analytics data to MongoDB, Sitecore solved issues of scalability and extensibility. However it did not help them with the problem of doing reporting on these massive data sets. While MongoDB is a great platform for storing and retrieving documents, relational databases still rule the world of complex queries and data analysis. So rather than eliminate the SQL database from analytics, Sitecore introduced a processing framework that can aggregate data into a new relational data structure which has been optimized for reporting.

    The new reporting database contains a series of fact and dimension tables, which is a common structure utilized by business intelligence and data warehousing tools. In short, a fact is an event, potentially with some sort of measurable data about the event. A fact for example might be a page view (including duration) or a website visit (including the number of pages visited). Facts are structured in a way that should allow easy summation, grouping, etc for reporting purposes. A fact record would contain foreign keys to dimension tables, which would contain data about the people or objects which were involved with the event. This could be the Sitecore item which was visited in a page view, or the contact which visited your site. In essence, dimensions are lookup tables.
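
    To make the fact/dimension split concrete, here is a tiny Python sketch. The table contents and names (ITEMS, PAGE_VIEWS, total_duration_by_item) are invented for illustration and do not reflect Sitecore's actual reporting schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Dimension: a lookup table from item ID to item name.
ITEMS: Dict[int, str] = {1: "Home", 2: "Products"}

# Facts: one row per page view, as (item_id, duration_seconds).
# item_id acts as the foreign key into the ITEMS dimension.
PAGE_VIEWS = [(1, 30), (2, 45), (1, 15)]


def total_duration_by_item(
    facts: Iterable[Tuple[int, int]], dimension: Dict[int, str]
) -> Dict[str, int]:
    """Sum the measurable value of each fact, grouped by its dimension record."""
    totals: Dict[str, int] = defaultdict(int)
    for item_id, duration in facts:
        totals[dimension[item_id]] += duration
    return dict(totals)
```

    Here total_duration_by_item(PAGE_VIEWS, ITEMS) groups the three page views under their two items, which is the kind of summation the fact tables are structured to make easy.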

    This new Sitecore 7.5 analytics framework also allows you to extend the reporting database with your own fact and dimension tables, and to extend data processing to populate them. You may perhaps want to do some reporting on data you have added to the contact, or on data you are collecting about user interactions via page events.

    In the preview release of Sitecore 7.5 provided to MVPs, the process for creating a custom aggregation is described in the Customization chapter of the xDB Configuration Guide.

    1. Utilize events or other analytics to log the data you wish to aggregate.
    2. Create a new Fact table.
    3. Create model classes for the key and value of your Fact.
    4. Create a new AggregationProcessor and register in the aggregation pipeline.

    In this example, we are going to create a new fact table with data about what products are added to our users’ shopping carts. Note that the Sitecore documentation is much more thorough in describing this process -- be sure to reference it. Consider this your introduction/overview.

    Use Page Events to Collect Cart Data

    For this POC, I just added the event to the existing Active Commerce shopping cart logic. It’s obviously important here to include any data which you wish to include in your aggregation. You’ll also need to create the event in Sitecore.




    Create a new Fact Table

    The irony of Sitecore introducing a NoSQL database to its architecture in 7.5 is that for the first time, Sitecore is also giving you a reason to create new relational tables in SQL Server. Well, at least it’s ironic to me.

    My new fact table contains information on all the products which users have added to their carts. You are typically going to have a composite primary key which contains the columns that define the uniqueness of the event. For our “product added” fact, that will be the product code (unique product identifier), the date of the event, the site the user was browsing, and the contact who added the product to his/her cart. The only aggregated value we are tracking on this fact is the quantity added.

    We’ll also add foreign key constraints to the appropriate dimension tables. Note that Sitecore recommends that you create these constraints to document dependencies with dimension tables, but that you disable them to improve performance.





    Create Model Classes for your Fact

    Our next step is to create model classes for our new fact table, which Sitecore will map for us during the aggregation process. We’ll need a DictionaryKey subclass for our “key,” which contains our primary key fields, and a DictionaryValue subclass for our “value,” which contains the aggregated value(s) for the event. We’ll also create a Fact subclass which combines the two.

    Sitecore seems to do the table and field mapping based on naming, and also seems to handle the obvious type mappings between Guid/uniqueidentifier, DateTime/smalldatetime, string/varchar, long/bigint, etc. The current early-release documentation is incomplete on this subject. The use of the Hash32 type for our site dimension ID, for example, was based on reviewing existing facts and aggregation processors which Sitecore includes in 7.5.

    The constructor for the Fact base class accepts a reduction function which we must provide. This function combines, or aggregates, two values for a given fact key. If we were to process two events which have the same key, this function would be called to aggregate their values before the fact is written to the reporting database. In this example, we simply add the values together, as I suspect would often be the case. Your DictionaryValue subclass is a logical place to create the static function that’s needed here.
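
    The key/value/reduce relationship can be sketched in Python as follows. This models the idea only and is not the Sitecore API: ProductAddedKey, ProductAddedValue, and this Fact class are illustrative stand-ins for the DictionaryKey, DictionaryValue, and Fact subclasses described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict


@dataclass(frozen=True)
class ProductAddedKey:
    """Columns that define the uniqueness of the event."""
    product_code: str
    date: datetime
    site_id: int
    contact_id: str


@dataclass(frozen=True)
class ProductAddedValue:
    """Aggregated value(s) tracked for the event."""
    quantity: int

    @staticmethod
    def reduce(left: "ProductAddedValue", right: "ProductAddedValue") -> "ProductAddedValue":
        # Combine two values recorded for the same key by adding them.
        return ProductAddedValue(left.quantity + right.quantity)


class Fact:
    """Holds one value per key, merging duplicates with the reduction function."""

    def __init__(
        self,
        reducer: Callable[[ProductAddedValue, ProductAddedValue], ProductAddedValue],
    ) -> None:
        self._reducer = reducer
        self.rows: Dict[ProductAddedKey, ProductAddedValue] = {}

    def emit(self, key: ProductAddedKey, value: ProductAddedValue) -> None:
        if key in self.rows:
            self.rows[key] = self._reducer(self.rows[key], value)
        else:
            self.rows[key] = value
```

    Emitting two values under the same key leaves a single row whose quantity is their sum, which is what would eventually be written to the fact table.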





    Create a New AggregationProcessor

    Here’s where the real work happens. As you might have expected, aggregation processing happens in a pipeline. When a visit is being processed, it is passed through the interactions pipeline and each processor has the opportunity to perform aggregation for the facts for which it is responsible. The processor itself can examine data in the visit, and “emit” facts.

    What’s interesting here is that you could theoretically call out to other data sources here in constructing your facts -- you aren’t limited to data being processed from the xDB. I’m also curious as to whether the processing API would allow distribution of processing work for other data sources beyond visits, perhaps calling a custom pipeline. But that’s an investigation for another time.

    For our processor here, we need to iterate over the pages in the visit and look for any shopping cart events. If any are found, we’ll use the Fact API to construct a new fact, and “emit” it with its key and value. Behind the scenes, this will call our aggregation function whenever two facts share a key. The processing API also provides the other utility calls we need: one to find or create the site dimension, and one to translate the date/time precision of our event. The default precision strategy will “round” the date/time to the minute. This would, in theory, allow you to run and filter reports with minute-by-minute precision.
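
    As a rough Python sketch of that loop (not the Sitecore processing API): the visit is modeled as a plain dict, CART_EVENT is a hypothetical event name, and I am assuming that "rounding" to the minute means dropping seconds and smaller units.

```python
from datetime import datetime
from typing import Callable, Tuple

CART_EVENT = "Product Added To Cart"  # hypothetical page event name

FactKey = Tuple[str, datetime, str, str]


def to_minute(moment: datetime) -> datetime:
    """Reduce a timestamp to minute precision (assumed to mean truncation)."""
    return moment.replace(second=0, microsecond=0)


def process_visit(visit: dict, emit: Callable[[FactKey, int], None]) -> None:
    """Walk every page in the visit and emit one fact per cart event found."""
    for page in visit.get("pages", []):
        for event in page.get("events", []):
            if event.get("name") != CART_EVENT:
                continue
            key = (
                event["product_code"],
                to_minute(event["timestamp"]),
                visit["site"],
                visit["contact_id"],
            )
            emit(key, event["quantity"])
```

    Two cart events for the same product, site, and contact within the same minute produce identical keys, which is exactly the case the reduction function exists to handle.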


    Finally, we’ll need to patch in this new processor to our Sitecore config. Note that there appears to be some new grouping available in the pipeline configuration now. As the number of pipelines in Sitecore continues to balloon, this totally makes sense. Perhaps Sitecore will shed more light on this new structure as 7.5 comes closer to release.





    Rebuild and Test

    To test our new processor, we need to rebuild our analytics data. To facilitate rebuilding of analytics data, Sitecore actually requires that you have two reporting databases, so that one can still be available for reporting, while the other is rebuilding. These are simply configured as the reporting and reporting.secondary connection strings. Testing of the rebuild can then be done through a new administrative screen, /sitecore/admin/RebuildReportingDB.aspx.


    Click “Start” and Sitecore will begin to process, and update you on progress as it goes.


    If you have a lot of data, rebuilding could obviously take some time. On large sites which have collected a lot of data, it may be necessary to keep a reduced data set around for testing purposes. Otherwise the debugging cycles for new aggregations could become very long and arduous.

    Once processing is completed, aggregated data should appear in your fact table.


    Reporting

    Now that we have this additional data available, how do we best report on it? One option I imagine would be creating some cool new SPEAK-based reporting UIs. I am not experienced enough with the framework yet myself to say, but it seems like it would be easy enough to wire up some SPEAK charting components along with a SQL-based data source to create your own reports. But that will be a post for another day, perhaps by someone else!

    I did want to attempt to push my data into a Stimulsoft report (Engagement Analytics) as well, which seems like it would be easier. But at the moment I’m getting an error when attempting to access report items in the Content Editor. And thus I am bailed out by beta software. But the point is -- you have some options for creating reports based on your new data.

    That’s it!

    And that brings us to the end of our series on Sitecore 7.5. This release of Sitecore truly brings the infrastructure and architecture of DMS to the next level. As always, it will be exciting to see what partners and customers do with the framework. We at Active Commerce are very much looking forward to using the framework to bring new functionality and great new data to our customers.


  • One Month with Sitecore 7.5, Part 5: Persisting Contact Data to xDB

    Posted 08/28/2014 by techphoria414

    With its flexible schema and scalable architecture, the xDB immediately becomes an attractive option in Sitecore 7.5 for storing all sorts of user-centric data, particularly anything you are interested in utilizing for reporting purposes. Developers who have worked with the .NET MongoDB Driver know how easy it is to persist any object data to the database. For good reason, though, your access to xDB is a bit more abstracted than this. Even so, you have three options for persisting contact data to the xDB.

    I myself only implemented one of the options below in my search for a means to persist shopping cart data in a POC for Active Commerce. But I’ve provided an overview of all three options.

    The submitContact Pipeline

    We saw in Part 3 of this series how data can be associated with the current contact via the Contact.Attachments dictionary.



    Though very useful, data in the Attachments collection is not persisted with the contact when the session is flushed. However, you could tap into the submitContact pipeline by creating your own SubmitContactProcessor, and persist the data to your own collection in MongoDB.



    As for how you persist, and how you load that data later, you’re a bit on your own. There is no corresponding loadContact pipeline at this time, and your best option for persistence appears to be accessing MongoDB directly via Sitecore.Analytics.Data.DataAccess.MongoDb.MongoDbDriver. You could then potentially access that data via an extension method on your Contact. Not ideal, and I’m not sure whether this would work with xDB Cloud.

    This did not seem to be the ideal option for me. I wanted something more straightforward which worked within the existing xDB data structures.


    Contact.Extensions.SimpleValues

    This structure on the contact seems to allow storing of simple name/value string pairs that are persisted and loaded with the contact data. This is a step forward, but for a shopping cart, I needed something that could handle a complex object.



    Contact Facets

    Not to be confused with search facets, contact facets allow you to define entirely new model classes that can be stored with the contact, and accessed via Contact.GetFacet<T>(string). Here we have an option which allows us to store complex data with the contact, without having to worry about persisting the data ourselves. Sitecore 7.5 includes a number of contact facets, which can be utilized to store additional information about the contact. This data appears to help fill out the Experience Profile report.

    The facets are configured in a new /sitecore/model configuration element, which defines various data model interfaces and their implementations, and associates them to entities (a contact in this case) with a given name.





    For example, to fill in a contact’s first/last name, we can use the Personal facet.





    Implementing your own facet requires a few steps, but is not difficult. The steps below include my POC for persisting shopping cart data.

    1. Create an interface for your facet which inherits IFacet. Add your desired fields.
    2. Create an implementation which inherits Facet. Use base methods to “ensure,” “get,” and “set” member values.
    3. For composite object structures, create an IElement and Element following the same pattern.



    4. Register your element in the /sitecore/model/elements configuration.
    5. Register the facet in the /sitecore/model/entities/contact/facets configuration.



    6. Access the facet via Contact.GetFacet<T>(string).
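
    The shape of steps 1 through 6 can be modeled in Python like this. It is a loose analogy, not the Sitecore model API: Element, CartLine, CartFacet, and Contact here are hypothetical classes, and ensure/get/set only mimic the idea of declaring members up front and then reading and writing them through the base class.

```python
from typing import Any, Dict


class Element:
    """Base class: members must be 'ensured' before they can be read or set."""

    def __init__(self) -> None:
        self._members: Dict[str, Any] = {}

    def ensure(self, name: str, default: Any = None) -> None:
        self._members.setdefault(name, default)

    def get(self, name: str) -> Any:
        return self._members[name]

    def set(self, name: str, value: Any) -> None:
        if name not in self._members:
            raise KeyError(f"member {name!r} was never ensured")
        self._members[name] = value


class CartLine(Element):
    """Composite element: one product line within the cart."""

    def __init__(self, product_code: str = "", quantity: int = 0) -> None:
        super().__init__()
        self.ensure("product_code", product_code)
        self.ensure("quantity", quantity)


class CartFacet(Element):
    """Facet: the complex object stored alongside the contact."""

    def __init__(self) -> None:
        super().__init__()
        self.ensure("lines", [])

    def add_line(self, product_code: str, quantity: int) -> None:
        self.get("lines").append(CartLine(product_code, quantity))

    def total_quantity(self) -> int:
        return sum(line.get("quantity") for line in self.get("lines"))


class Contact:
    """Facets are registered under a name and retrieved by that name."""

    def __init__(self, facets: Dict[str, Element]) -> None:
        self._facets = facets

    def get_facet(self, name: str) -> Element:
        return self._facets[name]
```

    In this model the dict passed to Contact plays the role of the configuration registration: it is what associates a facet name with its implementation.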


    After the contact’s session is flushed, you can very plainly see your new data persisted with the Contact. Nice!!

    MongoDB Facet Data


    As you can see, facets are an easy and powerful means of persisting contact data to xDB.

    That’s it for Part 5! In the last part of this series, we’ll look at another new extension point available in Sitecore 7.5, data aggregation.

