Source Data Perturbation
in Statistical Disclosure Control
Menno Cuppen
Statistics Netherlands
(mcpn@cbs.nl)
Abstract
When tables of quantitative data are generated from a datafile, the
release of those tables should not reveal information concerning individual
respondents. Such disclosure of individual respondents in the
microdata file can be prevented by applying disclosure control methods
at the table level, but this may create inconsistencies across tables.
Alternatively, disclosure control methods can be executed at the microdata
level, but these methods change the data permanently and
do not account for specific table properties. These problems can be
circumvented by assigning a weight factor to each respondent in the
microdata file. Upon tabulation, each contribution of a respondent is
weighted multiplicatively by the respondent’s weight factor. This approach
is called Source Data Perturbation (SDP) because the data is
perturbed at the microdata level, not at the table level. It should be
noted, however, that the original microdata is not changed. Moreover,
the weight factors can be chosen such that the tables generated from
the microdata are safe, and the information loss is minimized.
Keywords
Confidentiality, Disclosure, Source Data Perturbation,
Noise addition, Table protection
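
The weighting scheme summarized in the abstract can be illustrated with a minimal sketch. It only shows the mechanics: one weight factor per respondent, applied multiplicatively at tabulation time, with the microdata itself left unchanged. The function names (assign_weights, tabulate), the example records, and the uniform weight distribution around 1 are assumptions made for illustration. The sketch does not choose weights so that the tables are safe or so that information loss is minimized.

```python
import random
from collections import defaultdict


def assign_weights(respondent_ids, spread=0.1, seed=0):
    """Draw one fixed weight factor per respondent.

    Assumption: weights are uniform in [1 - spread, 1 + spread]. The abstract
    does not specify a distribution; this is purely illustrative.
    """
    rng = random.Random(seed)
    return {rid: 1.0 + rng.uniform(-spread, spread) for rid in respondent_ids}


def tabulate(records, weights, by, value):
    """Sum `value` per category of `by`, weighting each contribution.

    The records themselves are never modified; the perturbation exists only
    in the tabulated output.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec[by]] += weights[rec["id"]] * rec[value]
    return dict(totals)


# Hypothetical microdata, invented for this example.
records = [
    {"id": 1, "region": "North", "sector": "Retail", "turnover": 120.0},
    {"id": 2, "region": "North", "sector": "Industry", "turnover": 300.0},
    {"id": 3, "region": "South", "sector": "Retail", "turnover": 80.0},
    {"id": 4, "region": "South", "sector": "Industry", "turnover": 950.0},
]

weights = assign_weights([rec["id"] for rec in records])

# Two different tables generated from the same microdata and the same weights.
by_region = tabulate(records, weights, by="region", value="turnover")
by_sector = tabulate(records, weights, by="sector", value="turnover")

print(by_region)
print(by_sector)
```

Because every respondent carries the same weight factor into every table, the perturbed grand totals of the two tables agree up to floating-point rounding. This is the cross-table consistency that perturbation at the table level does not guarantee.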