Monday, May 10, 2010

Log Retention Nightmare

In my previous post I talked about managing Apache web server logs centrally with MySQL. Throughout that post, though, one question kept rattling around in my mind: how difficult is it to manage logs and arrive at a perfect log retention policy? Is it really a nightmare for IT managers?

Log retention, whether in an application cluster or across a whole server layout, is the effort of keeping information in your possession for a predetermined length of time. Different types of logs require different lengths of retention. Retention policies usually describe the procedures for archiving the information, guidelines for destroying it once the time limit has been exceeded, and special mechanisms for handling it when it is under litigation.
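To make that concrete, here is a minimal sketch of how such a policy could be written down as data and applied to a single log file. The log types, the retention periods and the function name `retention_action` are all assumptions made up for illustration; your own numbers have to come from your legal and business requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    online_days: int  # keep immediately available for this many days
    total_days: int   # destroy once older than this; archive in between


# Hypothetical log types and periods, for illustration only.
POLICY = {
    "apache_access": RetentionRule(online_days=90, total_days=730),
    "app_debug": RetentionRule(online_days=14, total_days=30),
}


def retention_action(log_type: str, log_date: date, today: date,
                     legal_hold: bool = False) -> str:
    """Return 'hold', 'destroy', 'archive' or 'keep_online' for one log file."""
    if legal_hold:
        # Logs under litigation are never archived away or destroyed here.
        return "hold"
    rule = POLICY[log_type]
    age_days = (today - log_date).days
    if age_days > rule.total_days:
        return "destroy"
    if age_days > rule.online_days:
        return "archive"
    return "keep_online"
```

The litigation check comes first on purpose, so that a log under legal hold can never fall through to the destroy branch, however old it is.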

First, let me quickly sum up the bottom line on the 'what' and the 'why'. As for the 'why', it really comes down to two key points:

1) Legal requirements.
2) Business requirements.

Legal Requirements:

I am not a legal expert, nor am I in sync with recent changes in IT law. But I have come across situations where information about a particular user's last access, such as the IP address he used and the time of access, was sought by various individuals or parties who suspected some sort of e-crime through our site. To be more precise, this happened on one of my recent assignments, a social networking site: sensitive information was sent to top BU heads from various profiles, and we ran out of breath trying to figure out who this guy was! Luckily we were able to pin them down. I also read recently that governments in various countries are considering laws that would require communications service providers to capture and archive mobile/telephone and internet traffic data for periods of 6 months to 2 years. For the most part, these laws have not yet been enacted.

Business Requirements:

It's tempting to sweep the whole pile into the trash, isn't it? So, should administrators simply delete everything as quickly as possible? I wish it were that simple. Day-to-day business activities often dictate how long information needs to remain accessible. In addition to legal requirements, individual businesses have their own data retention requirements, which can range from contractual obligations with customers or suppliers to administrative or operational information such as the policies and procedures that define daily functions. Each business must set its own data retention requirements so that it can sufficiently maintain its operations.

The crucial first step is to identify all the log files that might contain information of interest. Second, for each set of logs I have identified as useful, I need to decide how long to keep them immediately available online. Keeping them online is what is killing me, because once I decide to move them offline, say to DVD backup, the issue becomes more of an administrative decision than a technical one. This can be resolved in a meeting with the various BU heads or stakeholders.

The other aspect of a log retention policy for business requirements is using this information to improve the quality of your services and for other business purposes. For me this is very, very important if your mission is continuous improvement of the end-user experience of your services. Data mining on logs holds the key to uncovering and cataloging the authoritative links, traversal patterns, and semantic structures that will bring intelligence and direction to our Web interactions. Huge volumes of access and usage information provide a rich and unprecedented data mining source, and a key source of information for interaction designers. Identifying the usage trends of my site can really help me work out a more business-focused strategy. Hence, with a considerable amount of logs available, data mining activities can expand rapidly, allowing firms to retrieve highly personalized data about customers and the exact kind of crowd they attract.
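As a very small taste of what mining usage trends can look like, the sketch below counts the most requested paths in Apache access log lines. It assumes the logs are in the Common Log Format; the sample lines and the function name `top_paths` are made up for illustration, and real mining would go far beyond a simple count.

```python
import re
from collections import Counter

# Pulls the requested path out of the quoted request field,
# e.g. "GET /profile HTTP/1.1" (assumes Common Log Format).
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')

# Made-up sample lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [10/May/2010:10:00:00 +0000] "GET /profile HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/May/2010:10:00:05 +0000] "GET /search HTTP/1.1" 200 1024',
    '10.0.0.1 - - [10/May/2010:10:00:09 +0000] "GET /profile HTTP/1.1" 200 512',
]


def top_paths(lines, n=3):
    """Return the n most requested paths as (path, hits) pairs."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:  # skip lines that do not look like a request
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Calling `top_paths(SAMPLE, 1)` on the sample lines reports `/profile` as the busiest path with two hits. A count like this only stays useful for as long as the underlying logs are still online, which is exactly why the retention period matters.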

Log retention is a complicated balancing act. One philosophy calls for the aggressive destruction of electronic data after a short time period. The other promotes saving everything indefinitely. There is no concrete answer when establishing a data retention policy. On one hand, you need to save the information required by law and by your business. On the other hand, you should delete irrelevant, outdated and nonproductive data as quickly as possible. You also need to plan ahead for potential discovery requests in connection with litigation cases.

To conclude, what I feel is this: considering all the legal aspects, and with the intention of improving the web experience for our end users, we need to do a detailed 'benefit analysis' and publish a suitable retention policy for our logs, immediately taking offline those that don't serve any of these goals.

Doesn't this act of bravado call for a really bold IT manager? :-)

