When we want to access transactional data long after the transaction has been posted to a system of record, we need to make sure we have enough context to reconstruct the history.
- How do we go back and find who sold a particular item years after a sale was registered?
- How do we compare performance over time when the rules have changed?
- How do we reprice an item to produce a receipt (perhaps for provenance) when we don't know what the pricing rules were at the time of the relevant sale?
We almost have to go against the fundamental principles we learned when designing databases - especially transactional databases: normalize the data to reduce redundancy and remove the possibility of update/delete anomalies. But these historical (sometimes warehousing-oriented) systems don't work that way. We have to scoop up whatever else was relevant at the time of the transaction, so that we can run the analyses nobody anticipated back then.
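As a minimal sketch of what "scooping up the context" might look like, here is a hypothetical point-of-sale example: the archived record copies the salesperson, price, pricing-rule version, and tax rate that were true at posting time, instead of pointing at master tables that may later be purged or rewritten. The names (TransactionSnapshot, pricing_rule_version, and so on) are illustrative, not from any particular system.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

# Hypothetical denormalized snapshot: instead of foreign keys into master
# data that may later change or disappear, the record carries copies of
# the context that was true at posting time.
@dataclass
class TransactionSnapshot:
    transaction_id: str
    posted_at: str
    salesperson_name: str        # copied at posting time, not looked up later
    store_name: str
    item_description: str
    unit_price: float            # the price actually charged
    pricing_rule_version: str    # which rule set produced that price
    tax_rate: float              # the rate in force on that date
    extra_context: dict = field(default_factory=dict)

def archive(snapshot: TransactionSnapshot) -> str:
    """Serialize the full snapshot for long-term, append-only storage."""
    return json.dumps(asdict(snapshot), sort_keys=True)

# Years later we can still answer "who sold it?" and "how was it priced?"
# from the snapshot alone, without the original master tables.
record = TransactionSnapshot(
    transaction_id="T-1001",
    posted_at=datetime(2014, 3, 7, tzinfo=timezone.utc).isoformat(),
    salesperson_name="J. Smith",
    store_name="Downtown #12",
    item_description="Widget, blue, 3-pack",
    unit_price=9.99,
    pricing_rule_version="pricing-2014.02",
    tax_rate=0.08,
    extra_context={"promotion": "SPRING14"},
)
print(archive(record))
```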
Web properties have known this forever. When doing A/B testing, the whole interaction context is harvested so that minute behavioral changes can be analyzed - long after the original master data have been purged.
There is a second, uglier secret lurking here too. Not only do we have to capture more data than we ever thought, but even if we have the data we may not be able to use it, because the application required to rematerialize it no longer exists. We have upgraded our applications, remembering to migrate the DB schema, but without a good way to bring back the version of the application that was current when a particular piece of data was written, we still may not be able to make use of it.
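One way to hedge against this, sketched below under assumed conventions: stamp every archived record with the application version that wrote it, and retain a small interpreter per version rather than hoping the current application can still read old payloads. The version labels and interpreter functions here are hypothetical.

```python
import json

# Hypothetical: v1 of the application stored a single gross amount with an
# implicit flat 8% tax; v2 recorded the net amount and tax rate explicitly.
def interpret_v1(raw: dict) -> dict:
    return {"net": round(raw["amount"] / 1.08, 2), "tax_rate": 0.08}

def interpret_v2(raw: dict) -> dict:
    return {"net": raw["net_amount"], "tax_rate": raw["tax_rate"]}

INTERPRETERS = {"app-1.x": interpret_v1, "app-2.x": interpret_v2}

def rematerialize(archived_line: str) -> dict:
    """Reconstruct a usable record using the interpreter for the version
    that originally wrote it."""
    record = json.loads(archived_line)
    version = record["app_version"]          # stamped at write time
    interpreter = INTERPRETERS.get(version)
    if interpreter is None:
        raise ValueError(f"no interpreter retained for {version}")
    return interpreter(record["payload"])

# Two records written years apart by different application versions.
old = '{"app_version": "app-1.x", "payload": {"amount": 10.79}}'
new = '{"app_version": "app-2.x", "payload": {"net_amount": 9.99, "tax_rate": 0.08}}'
print(rematerialize(old), rematerialize(new))
```

The point of the sketch is that the interpreters are tiny and versioned alongside the data, so the full historical application does not have to be kept runnable just to read what it wrote.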
Continuous data archival, synchronized with the proper copies of the applications that wrote the data, presents challenges at a scale that the big data world is only just getting to grips with.