A Center for Interdisciplinary Research and Training in Population Aging and Health at University of Wisconsin - Madison

Upcoming Events

Statistical Disclosure Control for Data Confidentiality
November 10-12, 2004

Abstract
Argus Software
Program
Schedule
Trainers
Registration
Accommodations
Contact Conference Organizers
Visitor Information
Downloadable Software, Manuals and Handouts

Abstract

Three specialists from Statistics Netherlands, Anco Johannes Hundepool, Eric Schulte Nordholt, and Peter Paul de Wolf, conducted the training workshop, which provided participants with an understanding of the theoretical mathematical aspects of statistical disclosure control and the application of these methods using the ARGUS software. The theoretical discussion was most accessible to researchers with graduate training in statistical methods (e.g., econometrics, statistical methods for sociologists).

A portion of the last day of the conference was devoted to a discussion of participants’ work and to the ways in which SDC methods and the ARGUS software might be used to estimate disclosure risk and protect data confidentiality. Participants were encouraged to prepare a brief (1-2 page) description of a study on which they were working, a list of questions and problems related to estimating and controlling disclosure risk in their data, and a sample data set. They were asked to submit these to CDHA by the last week in October so that the trainers would have time to review them and to tailor their presentations and workshop activities to participants’ best advantage.

Argus Software

For statistical agencies and extramural researchers alike, balancing data sharing requirements and concerns about data confidentiality is an increasingly difficult task. This is in part because implementing the latest statistical techniques for protecting data confidentiality (called “disclosure limitation techniques” or “statistical disclosure control”) in many cases involves complex optimization problems that are difficult to program efficiently. The goal of the CASC project (Computational Aspects of Statistical Confidentiality), funded by the European Union, is to address these difficulties by developing user-friendly software that can be used to estimate disclosure risk and implement statistical disclosure control techniques for microdata and aggregate tabular data. A second goal of the project is to undertake the research necessary to support the development of this software.

The project has developed two software programs, μ-Argus and τ-Argus, which can be used to assess disclosure risk and to protect microdata and tabular data, respectively. The software is of use to researchers responsible for creating public-use datasets (microdata or aggregated tabular data), including the major aging-related data resources supported by NIA (e.g., HRS, WLS, NLTCS). It is also useful to those at secure computing facilities who are responsible for reviewing statistical results for confidentiality problems. The latter include facilities like CDHA’s OLDR (Offline Longitudinal Data Repository), MiCDA’s Data Enclave, and PARC’s LADR (Limited Access Data Room).

In order to ensure that respondent confidentiality is protected, staff from these computing facilities review any statistical output that researchers wish to remove from the center. While the rules of thumb used in these reviews may be sufficient to prevent respondent identification, it is likely that they result in unnecessary and sub-optimal suppression of statistical information that, with a more systematic means of estimating disclosure risk, could be made available to the researcher. In addition, once the risk of disclosure has been identified as too high (on the basis either of rules of thumb or of more systematic estimation procedures), computing staff need a way to tell researchers what the problem with their analysis is without being so specific as to indirectly disclose the information they seek to protect. The Argus software can be a valuable tool in working with researchers to revise their analyses so as to protect respondent confidentiality while maintaining the usefulness of the analysis in achieving research objectives.
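As a rough illustration of the kind of rule of thumb mentioned above, the following Python sketch flags cells of a frequency table whose count falls below a minimum threshold. The function name, the variable names, the threshold of 3, and the records are all hypothetical and chosen only for illustration; this is not the procedure used by τ-Argus or by any of the facilities named here.

```python
from collections import Counter


def unsafe_cells(records, row_key, col_key, min_count=3):
    """Return the cells of a two-way frequency table with fewer than min_count records.

    The threshold is an assumed rule-of-thumb value, not an official standard.
    Only cells that occur at least once in the data are counted.
    """
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return {cell: n for cell, n in counts.items() if n < min_count}


# Made-up records, for illustration only.
records = (
    [{"region": "North", "age_group": "65-74"}] * 3
    + [{"region": "North", "age_group": "75-84"}] * 4
    + [{"region": "South", "age_group": "65-74"}] * 3
    + [{"region": "South", "age_group": "85+"}] * 1
)

flagged = unsafe_cells(records, "region", "age_group")
print(flagged)
```

A check like this only identifies candidate cells. If row and column totals are also released, a suppressed cell can sometimes be recovered by subtraction, which is one reason a simple threshold rule is not a complete answer.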

The independent development of software tools like Argus is tremendously resource intensive, the need to maximize the returns to publicly funded research by sharing data and analytical results as widely as possible is urgent, and the protection of respondents’ privacy and data confidentiality is an ethical and practical necessity. We therefore believe it is critical that researchers and computing staff at the NIA- and NICHD-funded demography centers be exposed to the latest research on disclosure limitation and be given an opportunity to test the Argus software through a hands-on training course. To the extent possible, we would also like to provide researchers and managers from the federal agencies, major data archives, and data-related professional organizations with an opportunity to try out the Argus software. In cooperation with another EU-funded project called AMRADS (Accompanying Measure for Research and Development in Statistics), the CASC project has developed a 3-day training workshop on statistical disclosure control methods and their implementation using the Argus software. To date, however, this training has been available only in Europe. We have invited researchers from AMRADS and CASC to offer a similar training program, tailored to meet the practical needs of the NIA- and NICHD-funded national demographic surveys and demography centers.

Program

Statistical Disclosure Control is an important topic in the production process of statistical offices. Modern production processes in these offices are capable of producing large statistical databases and detailed tables. This is very useful for users of statistical information such as policy makers and researchers. Nowadays these users have the capacity to handle large amounts of data and can perform complex analyses on their own computers. The other side of the coin, however, is that there is a real risk of breaching the privacy of the respondents, and safeguarding that privacy is vital for statistical offices. Failure to do so would not only break the promises made when we collected the data but would also lead to a loss of cooperation in our surveys and censuses. This awareness has led to several national and international initiatives to research this topic and to develop software like ARGUS to implement the results of this research. The state of the art is now that the results of this research are available and that the ARGUS software has been released to implement the theory of SDC.

The course gave a state-of-the-art overview of disclosure control, covering both the protection of microdata and the protection of tabular data, the two major forms of output of a statistical office. For both tabular data and microdata we showed where the risks of disclosure are, which methods are available to measure those risks, and which methods can be applied to protect the data against disclosure. The course combined three kinds of sessions: teaching sessions, where the theory was explained; practical exercises, where the participants worked with (small) examples that showed them the disclosure risks and made them more aware of the problems; and sessions to train them in the use of the ARGUS software, both μ-ARGUS and τ-ARGUS.
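To give a sense of what measuring disclosure risk in microdata can look like, here is a minimal Python sketch that lists records whose combination of identifying variables is rare in the file. The function name, the variable names, the threshold of 2, and the records are hypothetical and serve only as an illustration; the sketch is not taken from the course materials and does not reproduce how μ-ARGUS works.

```python
from collections import Counter


def rare_combinations(records, key_vars, threshold=2):
    """Return the records whose combination of key variables occurs fewer than threshold times.

    key_vars are the variables assumed to be usable for re-identification.
    """
    def combo(record):
        return tuple(record[v] for v in key_vars)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] < threshold]


# Made-up microdata, for illustration only.
microdata = [
    {"sex": "F", "age_group": "65-74", "occupation": "teacher"},
    {"sex": "F", "age_group": "65-74", "occupation": "teacher"},
    {"sex": "M", "age_group": "75-84", "occupation": "farmer"},
    {"sex": "M", "age_group": "75-84", "occupation": "farmer"},
    {"sex": "M", "age_group": "85+", "occupation": "judge"},
]

rare = rare_combinations(microdata, ["sex", "age_group", "occupation"])
print(rare)
```

Records flagged this way are candidates for protection, for example by recoding a variable into broader categories or by suppressing a value. Which variables count as identifying, and which threshold is acceptable, are judgment calls that depend on the data set.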

The target audience was the people actually working on these surveys as well as the survey managers. In our view, it is essential that managers also have a profound knowledge of the issues of SDC.

The course leaders provided handouts of all the material used during the lectures as well as the manuals for the ARGUS software. They are available below.

The course aimed to give participants a good knowledge of the state of the art in SDC and of how to apply it in their practical work, among other ways by using the ARGUS software. It was also meant to provide a basis for continued study of SDC through further reading of the SDC literature.

Schedule

Wednesday 10 November

8:15-8:45 AM

Breakfast

9:00-9:30 AM

Introduction of course, lecturers, participants

9:30-10:15 AM

General introduction to SDC

10:15-11:15 AM

Theory/methods of SDC concerning microdata (General)

11:15-11:30 AM

Coffee break

11:30-12:00 PM

Exercises microdata I

12:00-12:30 PM

Theory/methods of SDC concerning microdata (Methods)

12:30-1:45 PM

Lunch break

1:45-2:45 PM

Theory/methods of SDC concerning microdata (Methods) (continued)

2:45-3:00 PM

Tea break

3:00-5:00 PM

Demonstration and exercises with μ-ARGUS


Thursday 11 November

8:00-8:45 AM

Breakfast

9:00-10:30 AM

Exercises microdata II

10:30-10:45 AM

Coffee break

10:45-11:45 AM

Theory/methods of SDC concerning tabular data (General)

11:45-12:15 PM

Exercises tabular data I

12:15-1:30 PM

Lunch break

1:30-2:30 PM

Theory/methods of SDC concerning tabular data (Methods)

2:30-2:45 PM

Tea break

2:45-3:15 PM

Theory/methods of SDC concerning tabular data (Methods) (continued)

3:15-5:00 PM

Exercises tabular data II


Friday 12 November

8:00-8:45 AM

Breakfast

9:00-11:00 AM

Demonstration and exercises with τ-ARGUS

11:00-11:15 AM

Coffee break

11:15-11:45 AM

Legal issues

11:45-12:30 PM

Onsite facilities and remote access/execution

12:30-1:45 PM

Lunch break

1:45-2:45 PM

User case studies

2:45-3:00 PM

Tea break

3:00-4:00 PM

Evaluation and conclusion

Trainers

The lecturers in this course were Anco Hundepool, Eric Schulte Nordholt and Peter-Paul de Wolf, all of Statistics Netherlands.

Anco Hundepool studied mathematics at Leyden University and subsequently joined Statistics Netherlands. He started his career in the department for Statistical Methods. His main interests were seasonal adjustment, compilation of price index series and a pilot study on purchasing power statistics. After that he was involved in the development of the Blaise system and became project leader for the Abacus tabulation package and the STATview dissemination package. Within the SDC project, partially funded by the 4th Framework, he was the project leader for the development of the ARGUS package for statistical disclosure control. In the TADEQ project for the documentation of electronic questionnaires (also partially funded by the 4th Framework) he was responsible for the development of the TADEQ software. In the 5th Framework CASC project he is the overall project leader and continues the development of the ARGUS software. Anco Hundepool has presented his work at various conferences and has published in different refereed journals.

Eric Schulte Nordholt studied mathematics at the University of Utrecht and econometrics at the Erasmus University Rotterdam. He started his career as a researcher at the Department of Statistical Methods of Statistics Netherlands. In 1995 he worked as a detached national expert at the European Community Household Panel (ECHP) team of Eurostat in Luxembourg. Subsequently, he became senior researcher at the Department of Employees of the Division of Socio-economic Statistics. Since 2000 he has been a senior researcher and project leader in the division of Social and Spatial Statistics. He is also Statistics Netherlands’ advisor on the statistical disclosure control of social data. Recently he was project leader of the Dutch Virtual Census of 2001, and together with his team he wrote a book about the analysis and methodology of this Virtual Census. His main interests are the analysis of panel data, census data, data editing and imputation, and statistical disclosure control. Eric Schulte Nordholt has presented his work at various conferences and has published in different refereed journals.

Peter-Paul de Wolf studied mathematics at the Technical University of Delft (1986 – 1991). He started his PhD in mathematical statistics in 1991, at the same university, on extreme value estimation. In 1999, already working for Statistics Netherlands, he successfully defended his PhD. In 1996 he joined the research department of Statistics Netherlands and worked on several subjects: sample survey design, editing and imputation, lifetime of capital stock, and statistical disclosure control. In recent years he has specialized in the field of statistical disclosure control, and at present he co-ordinates the research in that area at the Research Department of Statistics Netherlands. Peter-Paul de Wolf has presented his work at various conferences and has published in different refereed journals.

Registration

The workshop was held at the Pyle Center on the University of Wisconsin – Madison campus. Detailed information is available on the web (http://conferencing.uwex.edu/pyle.cfm).

Because space was limited, invitations were extended to the NIA- and NICHD-funded survey projects and demography centers, and to selected federal agencies, data archives, and professional organizations. We asked institutions that wished to attend to submit a list of candidates ordered by priority so that we could ensure that a space was reserved for at least one participant from each interested project or institution.

Accommodations

The conference hotel was the Howard Johnson Plaza Hotel in downtown Madison, within walking distance of the Pyle Center. The hotel runs a free shuttle service to and from the airport. The shuttle also transports guests to and from the Pyle Center. Guests at the conference hotel received a welcome packet and workshop materials when they checked in.

The hotel is located at 525 East Johnson Street, Madison, Wisconsin 53705. A block of rooms was reserved at the conference rate of $91.00 for a single or double. To make reservations, call the hotel at 608-251-5511. More information on the hotel is available on the web (http://www.hojo.com/HowardJohnson/control/home). If you make reservations on-line, please confirm that you are booked at the conference rate.

For participants staying elsewhere: Please be aware that parking on campus is extremely limited. Most spaces are reserved for permit holders, although there is some metered on-campus parking. Those not staying within walking distance of the Pyle Center should contact Mark Schmidt, Assistant to the Director at CDHA (email: mschmidt@ssc.wisc.edu; phone: 608-262-4715), to obtain a temporary parking permit. There is a municipal public parking ramp near the Pyle Center, but please keep in mind that it fills up early. Participants who do not stay at the conference hotel can pick up their packet at the Pyle Center at breakfast on Wednesday, 10 November.

Contact Conference Organizers

The conference was organized by the Center for Demography of Health and Aging. Funding for the Center and for the workshop was provided by the National Institute on Aging.

Questions about the program should be directed to Janet Eisenhauer.

Mark Schmidt, Assistant to the Director
Center for Demography of Health and Aging

University of Wisconsin – Madison
Social Sciences Building, Room 4418
1180 Observatory Drive
Madison, WI 53706
Phone: 608-262-4715
Email: mschmidt@ssc.wisc.edu

Janet Eisenhauer Smith, Data Analyst/Archivist
Center for Demography of Health and Aging

University of Wisconsin – Madison
Social Sciences Building, Room 4407
1180 Observatory Drive
Madison, WI 53706
Phone: 608-265-3937
Email: jeisenha@ssc.wisc.edu

Software and Other Files

Data CDs containing the ARGUS software and handouts were distributed at the workshop. All of these files are available below.

μ-ARGUS (Zipped)
τ-ARGUS (Zipped)
PDF Files (Zipped)

Please send questions, comments or suggestions to cdha@ssc.wisc.edu

If you have difficulty accessing this page or have other questions or comments about the webpage please contact cdhadata@ssc.wisc.edu