Data leakage

So how are you going to find out that your junior employee Phil or senior manager Chris shared confidential information or personal data of your customers or employees without permission?

It’s all about data, and the more the better. We’re all trying to collect as much data as possible, and we’re fighting for ownership even when there is no obvious value in it. As a result, we are storing petabytes of managed and unmanaged data. It can be any kind of data, and it obviously includes personal information. Most of the time we don’t even know what’s in there, because we are not using it. We just store it.

Your right to be forgotten

Imagine you had a mobile contract with an operator called ABC and decided to switch to a different provider called DEF because they offered a better deal. Even after your contract is terminated, ABC still stores your private details (bank account information, address, date of birth, and so on). This is exactly the type of data you’d want to keep away from public access.

Now you can ask ABC to delete everything they have on you. This is your “Right to be forgotten”. It seems like an easy task for a corporation, but in fact it’s a big challenge.

ABC has your personal data spread across their IT environment and duplicated over different departments. For example, they use it to bill you, to send mail, to identify you when you call in for support, to track your behavior and preferences, and so on. Your data could be stored anywhere inside the company. It might take days or even weeks to find out where it is, with no guarantee that all of it will be found.

Under the new regulations, ABC must honor your Right within a limited time frame; otherwise they can be fined up to 4% of their worldwide turnover. This applies not only to mobile operators but to any company or government body: a grocery store, a bank or a previous employer. It means that organizations need a tool and a procedure in place in order to comply and avoid penalties.

The INDICA GDPR Module is a tool that identifies your personal information across all IT systems within a company and reports where it is located and who can access it, including different people, groups and third parties. It creates a personal card, so that the person responsible for private data can execute a deletion procedure.
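To make the idea of a personal card concrete, here is a minimal Python sketch of that kind of discovery step. It is not INDICA's implementation; the names (`Document`, `find_personal_data`) and the simple substring matching are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Document:
    system: str    # e.g. "billing-db", "crm", "mail-archive" (hypothetical names)
    path: str      # location within that system
    text: str      # extracted, indexed content

def find_personal_data(documents, identifiers):
    """Build a 'personal card': every location where any of the data
    subject's identifiers (name, IBAN, date of birth, ...) appears."""
    card = []
    for doc in documents:
        hits = [ident for ident in identifiers
                if ident.lower() in doc.text.lower()]
        if hits:
            card.append({"system": doc.system, "path": doc.path, "matched": hits})
    return card

docs = [
    Document("billing-db", "/invoices/2023/0415.txt",
             "Invoice for J. Smith, IBAN NL91ABNA0417164300"),
    Document("mail-archive", "/support/ticket-88.eml",
             "Customer called about roaming charges."),
]
card = find_personal_data(docs, ["J. Smith", "NL91ABNA0417164300"])
```

The resulting card lists each system and path where the subject's data was found, which is exactly the information a privacy officer would need before executing a deletion procedure.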

The next logical question: does INDICA make information about you available to anyone within the company you contacted? The answer is simple: no, it does not. Normally the solution has a single administrator and a limited number of users within the organization. On top of that, INDICA inherits the access-rights policies set within the IT systems, so a user will only find information he or she is allowed to see.

Keep in mind that your personal data is priceless. Be careful when giving it away and demand it back after an engagement is over. 

How to find hidden relationships between emails and POs

Investigations span the domains of unstructured (e.g. email) and structured (e.g. ERP) information. These systems are separate, and it is almost impossible to find relationships between data stored in different systems and applications. However, INDICA has developed a unique solution capable of revealing these relationships, producing results that can revolutionize investigations.

Regular eDiscovery solutions focus on unstructured information such as email and files. Structured information, on the other hand, is the typical domain of systems like ERP and CRM, with records in a database. The challenge is that the two worlds are more or less independent of each other. If an investigation concerns fraud, it is probably of great importance to examine the communication in email boxes and combine that with the transactions in a financial system. If we could find a relationship between those data sets, a totally new world of possibilities would open up.

At INDICA we developed a unique algorithm to verify relationships between unstructured and structured information. In INDICA you can add structured sources: a database, a static copy of a database, or a dynamic interface to the database of a live system. In our demo setup we have a product database from the internet and a customer list from an ERP system. Extra sources assist in ranking information and also make it easy to click through all related information in the whole data set. To find information and discover relationships, you simply click through; there is no need to enter manual queries. It is possible to scan an unstructured document for words that exist in structured sources, and to move back and forth between all those sources.
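As an illustration of this kind of cross-referencing, here is a minimal Python sketch: scan an unstructured text for values that also occur in structured sources, producing links between the two worlds. This is not the patented algorithm itself; the data layout and the function names (`build_lookup`, `link_document`) are assumptions made for the example.

```python
import re

def build_lookup(structured_sources):
    """Map each known value (lowercased) to the (source, record) it came from."""
    lookup = {}
    for source_name, records in structured_sources.items():
        for record in records:
            for value in record.values():
                lookup.setdefault(str(value).lower(), []).append((source_name, record))
    return lookup

def link_document(text, lookup):
    """Return (token, source, record) triples for every word or code in the
    text that also exists as a value in a structured source."""
    links = []
    for token in re.findall(r"[A-Za-z0-9-]+", text):
        for source_name, record in lookup.get(token.lower(), []):
            links.append((token, source_name, record))
    return links

# Toy structured sources: a customer list and a product list.
sources = {
    "customers": [{"id": "C-1001", "name": "Acme"}],
    "products":  [{"sku": "P-77", "name": "Widget"}],
}
lookup = build_lookup(sources)
links = link_document("Please ship P-77 to Acme under order C-1001.", lookup)
```

Here the email-like sentence is automatically linked to the product record (via "P-77") and the customer record (via "Acme" and "C-1001"), which is the kind of click-through an investigator would follow.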

An investigator should simply ask which kinds of structured information should be in scope. The amount of structured information can be limited; it is not necessary to include the entire transaction history. Vendor or customer lists, product lists and purchase or sales orders are typical examples. The information from these lists is fully indexed together with all other information, so searching remains ultra fast.

The algorithm is patented and unique, so anybody who would like to give it a try can reach out to INDICA.

Continuous Delivery at INDICA

Recently I attended a workshop about Continuous Delivery, hosted by ICTOffice and presented by people from NISI and the University of Utrecht.

Mainly, I had questions regarding the level of integration of our development process and the tools involved. We took the first (baby) steps in developing INDICA’s software quite a long time ago. Modern tools for integrating your process didn’t exist yet, let alone the words describing them!

At first, we used SVN for version control. At the time, this felt revolutionary, especially compared to CVS. (Of course, that’s not entirely true; all these tools have their pros and cons, depending mainly on who is using them and how.) We used MantisBT for our bug tracking, and lots and lots of scripts to continuously deploy to a build server for testing and building.

And this worked properly for many years.

In the meantime, INDICA is growing. Not only do we have more people maintaining and developing parts of the software; the software itself is also growing and maturing. Tools have appeared that might make our work easier. But fundamentally, INDICA has not really changed: it is still an appliance with highly integrated tools, parts and systems.

We migrated from MantisBT to Atlassian Jira a while ago. We started using Confluence for the more important stories. And we created a scripted testing environment which took away most of the tedious work of testing the installation, the provisioning and the basics of the frontend.

More recently, we also migrated from SVN to Git on Bitbucket.

This turned out to be a not-so-trivial step. We had become really used to SVN in combination with the scripted environment, and we are actually still running into problems with the Git environment. It’s getting better, though; every week there are fewer creases to iron out.

Back to this workshop. During the workshop I tried to pry as much information loose from all attendees as I could. Luckily the presenters helped out, they were as interested as I was. As I already stated, I really wanted to know three main things:

  • How mature is our process?
  • What other tools are being used?
  • What are the main alternatives to the tools we already use?

It turns out that none of these companies really have good alternatives for their tools yet. The choices we made were quite mainstream on the one hand, but without a really good alternative on the other. So even though we have quite a few configuration and usability problems with the Jira/Confluence/Bitbucket suite, no alternative that is as well integrated came up.

Also, and this came as quite a surprise to me, our process is quite mature. Although we started out doing Continuous Integration avant la lettre, we are on a pretty good track.

The development cycle, with three-week sprints, regular builds and continuous feature and bug improvements, fits us well. The testing part might need some improvement on the unit-testing side, but here too we use a proper script and partly automate the process daily.

This can also be improved if we revisit the set-up. The current appliance might be better off if we create proper packages, so that prerequisites, dependencies and versions can be better controlled.

But the integration, deployment and delivery started out really advanced and is currently on par with what’s out there. With lots of room for improvement.