Source Data Perturbation
in Statistical Disclosure Control

Menno Cuppen
Statistics Netherlands
(mcpn@cbs.nl)

Abstract
When tables of quantitative data are generated from a data file, the released tables should not reveal information about individual respondents. Such disclosure can be prevented by applying disclosure control methods at the table level, but this may create inconsistencies across tables. Alternatively, disclosure control methods can be applied at the microdata level, but these methods change the data permanently and do not account for specific table properties. Both problems can be circumvented by assigning a weight factor to each respondent in the microdata file. Upon tabulation, each contribution of a respondent is multiplied by that respondent’s weight factor. This approach is called Source Data Perturbation (SDP) because the data is perturbed at the microdata level, not at the table level. Note, however, that the original microdata is not changed. Moreover, the weight factors can be chosen such that the tables generated from the microdata are safe and the information loss is minimized.
Keywords
Confidentiality, Disclosure, Source Data Perturbation, Noise addition, Table protection
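
The weighting mechanism described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the function names make_weights and tabulate, the record layout, and the choice of drawing weight factors uniformly around 1. The abstract states that the weight factors can be chosen to make tables safe while minimizing information loss; this sketch does not attempt that optimization and only shows how one fixed weight per respondent is applied at tabulation time.

```python
import random

def make_weights(respondent_ids, spread=0.1, seed=0):
    # Hypothetical choice: one weight per respondent, drawn uniformly
    # from [1 - spread, 1 + spread]. The actual selection rule is not
    # given in the abstract.
    rng = random.Random(seed)
    return {rid: 1.0 + rng.uniform(-spread, spread) for rid in respondent_ids}

def tabulate(records, weights, key):
    # Sum the weighted contributions per cell of a one-way table.
    # The records themselves are read, never modified.
    totals = {}
    for rec in records:
        cell = rec[key]
        contribution = weights[rec["id"]] * rec["value"]
        totals[cell] = totals.get(cell, 0.0) + contribution
    return totals

# Toy microdata (made-up values, purely illustrative).
records = [
    {"id": 1, "region": "North", "sector": "Retail", "value": 120.0},
    {"id": 2, "region": "North", "sector": "Mining", "value": 80.0},
    {"id": 3, "region": "South", "sector": "Retail", "value": 200.0},
    {"id": 4, "region": "South", "sector": "Mining", "value": 50.0},
]

weights = make_weights([rec["id"] for rec in records])
by_region = tabulate(records, weights, "region")
by_sector = tabulate(records, weights, "sector")
```

Because every table applies the same weight to a given respondent, the perturbed grand totals of by_region and by_sector agree up to floating-point rounding, while the values stored in records stay untouched.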