Saturday, November 24, 2007

Request and response for child benefit data were incompetent

It is clear from correspondence between the National Audit Office and Her Majesty's Revenue & Customs over the lost files fiasco that this data should never have been requested, nor supplied.

NAO wanted to choose a random sample of child benefit recipients to audit. Understandably, it did not want HMRC to select that sample "randomly". However, HMRC could have used an extremely simple bit-commitment protocol to give NAO a way to choose recipients themselves without revealing any of the data related to those not chosen:

  1. For each recipient, HMRC should have calculated a cryptographic hash of all of the recipient's data and then given NAO a list of index numbers, each paired with the corresponding hash.

  2. NAO could then select a sample of these records to audit. They would inform HMRC of the index values of the records in that sample.

  3. HMRC would finally supply only those records. NAO could verify the records had not been changed by comparing their hashes to those in the original data received from HMRC.

This is not cryptographic rocket science. Any competent computer science graduate could have designed this scheme and implemented it in about an hour using an open source cryptographic library like OpenSSL.
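To make the three steps concrete, here is a minimal sketch in Python using the standard library's hashlib rather than OpenSSL. The record layout, the JSON encoding and all function names are my own assumptions for illustration, not anything HMRC or NAO actually use.

```python
import hashlib
import json
import random


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of one record (assumed format)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def publish_hashes(records: list) -> list:
    """Step 1 (data holder): one hash per record; the list position is the index."""
    return [record_hash(r) for r in records]


def choose_sample(n_records: int, sample_size: int, rng=None) -> list:
    """Step 2 (auditor): pick the indices to audit using the auditor's own randomness."""
    rng = rng or random.SystemRandom()
    return sorted(rng.sample(range(n_records), sample_size))


def disclose(records: list, indices: list) -> dict:
    """Step 3 (data holder): hand over only the requested records."""
    return {i: records[i] for i in indices}


def verify(disclosed: dict, hashes: list) -> bool:
    """Step 3 (auditor): check each disclosed record against the hash received earlier."""
    return all(record_hash(rec) == hashes[i] for i, rec in disclosed.items())


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    records = [{"id": i, "name": f"Recipient {i}"} for i in range(10)]
    hashes = publish_hashes(records)
    sample = choose_sample(len(records), 3)
    print(sample, verify(disclose(records, sample), hashes))
```

The auditor only ever holds the hash list and the sampled records. This sketch hashes each record without any added randomness, so it is only the bare scheme as described in the three steps.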

Ben Laurie notes that the redacted correspondence itself demonstrates a lack of basic security awareness. I hope those carrying out the security review of the ContactPoint database are better informed.


Watching Them, Watching Us said...

Why did the National Audit Office return the CDs with "90 zipped files" of the Child Benefit Award database back to HMRC Child Benefit Office on April 16th 2007 ?

What possible use was this, by now slightly out of date, CD copy of the database i.e. not original paper documents, to HMRC ?

Why were the unencrypted CDs not physically destroyed, in front of witnesses, once NAO had finished with them, instead of risking yet another transfer between different office locations, even by hand, or perhaps again, via the internal post system ?

Similarly, how and why did the NAO send and return those March CDs and send the October CDs to the private sector firm of auditors KPMG ?

Is there now a KPMG laptop computer or USB key, loaded with a copy of the 25 million records, waiting to be stolen from some accountant's car or lost in a pub ?

Why did the NAO not do their audit sampling from the HMRC CDs in house, before sending just the 1500 or 20,000 or whatever sample size of records they were planning to outsource to KPMG for actual audit processing ?

There was no need to let KPMG see or get possession of copies of all 25 million people's details.

Have the March 2007 CDs returned to HMRC been physically destroyed ?

Surely both the National Audit Office and Her Majesty's Revenue and Customs are guilty of breaches of the (weak) Data Protection Act ?

sjmurdoch said...

This thought occurred to me too. You could probably do it even more efficiently with a Merkle tree.

Either way, you would also need to include a large random number in with each record. Otherwise someone receiving the table of hashes could use it to establish if someone (with known details) is on the list.
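As a rough sketch of both points, the Python below adds a random per-record salt to each hash and builds a Merkle tree over the salted hashes, so that only the root and the record count need to be handed over up front. The record encoding, the one-byte prefixes separating leaves from internal nodes, the rule for promoting an unpaired node, and every function name are assumptions made for illustration; this is not a vetted design.

```python
import hashlib
import json
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def commit(record: dict, salt: bytes) -> bytes:
    """Leaf hash over a fixed-length random salt plus the encoded record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return _h(b"\x00" + salt + canonical.encode("utf-8"))


def build_tree(leaves: list) -> list:
    """Return every level, leaves first and root last (needs at least one leaf).

    An unpaired node at the end of a level is promoted unchanged.
    """
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [_h(b"\x01" + cur[i] + cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2:
            nxt.append(cur[-1])
        levels.append(nxt)
    return levels


def prove(levels: list, index: int) -> list:
    """Sibling path for one leaf, as (sibling_is_on_the_right, sibling_hash) pairs."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # a promoted node has no sibling at this level
            path.append((sibling > index, level[sibling]))
        index //= 2
    return path


def verify(record: dict, salt: bytes, path: list, root: bytes) -> bool:
    """Recompute the root from a disclosed record, its salt and its sibling path."""
    node = commit(record, salt)
    for sibling_is_right, sibling in path:
        pair = node + sibling if sibling_is_right else sibling + node
        node = _h(b"\x01" + pair)
    return secrets.compare_digest(node, root)


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    records = [{"id": i, "name": f"Recipient {i}"} for i in range(5)]
    salts = [secrets.token_bytes(32) for _ in records]
    levels = build_tree([commit(r, s) for r, s in zip(records, salts)])
    root = levels[-1][0]
    print(verify(records[3], salts[3], prove(levels, 3), root))
```

For each sampled index the data holder would disclose the record, its salt and the sibling path. Without the salt, anyone who knows a person's details cannot recompute that person's leaf hash, which is the point of including the random number.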

Ian Brown said...

Ha! The Blogzilla-Murdoch audit protocol is born, with thanks to Ralph, Richard and George ;)

Martin Strandbygaard said...

While I agree with the principle of the suggested solution, I cannot help but wonder if it's really implementable.

The suggested approach assumes (1) availability of someone with the skill set and incentive to implement the exchange protocol and (2) an environment/culture where this sort of solution is acceptable.

My experience with systems/controls auditing is that neither is the case in this kind of scenario. The people doing this sort of review work don't need, and consequently often don't have, the necessary programming skills. Not to mention the bureaucracy of getting database interface access etc. (which they also don't need, and thus don't have).

Then there is the "cultural acceptability" of the solution. People with technical skills will immediately see the benefit of this sort of solution, but others will not understand, and thus not trust, such a solution.

Consequently, the "dump-your-dataset-and-we'll-work-on-it" audit approach is predominant.