Priority Driven K-Anonymisation for Privacy Protection

Sun, X., Wang, H. and Li, J.

    Given the threat of re-identification in our growing digital society, guaranteeing privacy while providing worthwhile data for knowledge discovery has become a difficult problem. k-anonymity is a major technique used to ensure privacy by generalizing and suppressing attributes, and it has been the focus of intense research in the last few years. However, data modification techniques like generalization may produce anonymous data that is unusable for medical studies because some attributes become too coarse-grained. In this paper, we propose a priority driven k-anonymisation that allows the degree of acceptable distortion to be specified for each attribute separately. We also define metrics to measure distance and information loss that are suitable for both numerical and categorical attributes. Further, we formulate priority driven k-anonymisation as a k-nearest neighbor (KNN) clustering problem by adding the constraint that each cluster contains at least k tuples. We develop an efficient algorithm for priority driven k-anonymisation. Experimental results show that the proposed technique causes significantly less distortion.
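The idea in the abstract, a per-attribute priority combined with clusters of at least k tuples, can be illustrated with a minimal sketch. This is not the algorithm or the metrics from the paper: the function names `weighted_distance` and `greedy_k_clusters`, the range-normalised numeric distance, the 0/1 categorical mismatch, and the greedy grouping strategy are all assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def weighted_distance(a: Sequence[Value], b: Sequence[Value],
                      ranges: Sequence[Optional[float]],
                      weights: Sequence[float]) -> float:
    """Priority-weighted distance between two tuples (illustrative assumption).

    ranges[i] is the domain width of a numeric attribute, or None for a
    categorical one. weights[i] is the attribute's priority: a higher weight
    makes differences in that attribute more costly.
    """
    total = 0.0
    for x, y, width, w in zip(a, b, ranges, weights):
        if width is None:
            d = 0.0 if x == y else 1.0   # categorical: simple mismatch
        else:
            d = abs(x - y) / width       # numeric: normalised to [0, 1]
        total += w * d
    return total


def greedy_k_clusters(records: List[Sequence[Value]], k: int,
                      ranges: Sequence[Optional[float]],
                      weights: Sequence[float]) -> List[List[int]]:
    """Greedily group record indices into clusters of at least k tuples."""
    if k < 1 or len(records) < k:
        raise ValueError("need at least k records and k >= 1")
    remaining = list(range(len(records)))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Take the k-1 nearest remaining records to the seed.
        remaining.sort(key=lambda i: weighted_distance(
            records[seed], records[i], ranges, weights))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the nearest cluster seed.
    for i in remaining:
        nearest = min(clusters, key=lambda c: weighted_distance(
            records[c[0]], records[i], ranges, weights))
        nearest.append(i)
    return clusters


# Hypothetical data: (age, diagnosis); age is given twice the priority.
records = [(25, "flu"), (27, "flu"), (60, "cancer"), (62, "cancer"), (26, "flu")]
ranges = [50.0, None]
weights = [1.0, 0.5]
clusters = greedy_k_clusters(records, k=2, ranges=ranges, weights=weights)
```

Each resulting cluster would then be generalized to a common value per attribute, so that every tuple is indistinguishable from at least k-1 others. Raising an attribute's weight keeps tuples that differ in that attribute apart, which is one simple way to limit its distortion.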
Cite as: Sun, X., Wang, H. and Li, J. (2008). Priority Driven K-Anonymisation for Privacy Protection. In Proc. Seventh Australasian Data Mining Conference (AusDM 2008), Glenelg, South Australia. CRPIT, 87. Roddick, J. F., Li, J., Christen, P. and Kennedy, P. J., Eds. ACS. 73-78.