
Data management for students: Do's and Don'ts of research data

Research Data Management (RDM) for students of the Radboud University

Summary

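On this page you can find information on the following subjects regarding the DO'S and DON'TS of research data:

  • What counts as research data
  • Working with personal data
  • Safe storage
  • Archiving data for scientific integrity
  • Sharing data publicly for reuse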

What counts as research data

The definition of research data is very broad. All digital and nondigital information which is generated as part of the scientific process and on which scientific conclusions are based counts as research data. This includes measurements, speech and video recordings, questionnaires, Excel sheets with observations, SPSS files, but also graphs that you make as well as notes that you take.

Working with personal data

Personal data must be protected and safely stored, and must not be shared publicly.[1]

Personal data are any data relating to an identified or identifiable living person. Personal data are thus:

  • any information directly identifying a person (e.g. someone’s name)
  • any information that can be traced back to a person in combination with other data

Thus, if a person’s identity is known or can be inferred, then any information you have about this person is considered personal data. To put it more bluntly, if a person is identifiable, then this person’s gender is considered personal data as much as information you have about their favorite type of pizza.

As mentioned above, certain types of data can directly identify a person and are thus always considered personal data, for example: names, dates of birth, addresses, postcodes, phone numbers, email addresses and IP addresses. These types of data are often collected for administrative purposes. Photos, video recordings and audio recordings are also direct identifiers and are thus considered personal data as well.

Data that do not directly identify a person, but can be traced back to an individual in combination with other information, are also considered personal data. For example, knowing that someone is female is not enough to identify a person. However, knowing that someone is female and that this person was the chancellor of Germany will lead you to Angela Merkel. Thus, if the combination of information in your dataset can be used to indirectly identify a person, then your whole dataset is considered personal data and needs to be dealt with accordingly.
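
The idea of indirect identification can be illustrated with a small Python sketch. It counts how often each combination of attributes occurs in a dataset and reports the combinations that occur only once, since those may single out one person. The function name `unique_combinations`, the field names and the sample rows are made up for this illustration. Treat it as a rough screening aid under these assumptions, not as proof that a dataset is anonymous.

```python
from collections import Counter


def unique_combinations(rows, fields):
    """Return the attribute combinations that occur exactly once in rows."""
    counts = Counter(tuple(row[field] for field in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Hypothetical sample data: no names, only indirect information.
rows = [
    {"gender": "female", "occupation": "student"},
    {"gender": "female", "occupation": "student"},
    {"gender": "female", "occupation": "chancellor"},
    {"gender": "male", "occupation": "student"},
    {"gender": "male", "occupation": "student"},
]

# Gender alone does not single anyone out in this sample ...
by_gender = unique_combinations(rows, ["gender"])

# ... but gender combined with occupation does.
by_both = unique_combinations(rows, ["gender", "occupation"])
```

In this sample, `by_gender` is empty while `by_both` contains the combination of "female" and "chancellor", which mirrors the example in the text.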

You have to be particularly careful when collecting so-called special categories of personal data, such as health data, political opinions, religious beliefs, someone’s sexual orientation etc.[2] These data can be used to discriminate against individuals and must thus only be collected when absolutely necessary. Ask your local ethics committee for approval when you want to collect special categories of personal data.

To summarize, when a person can be identified, then any information that relates to this person is considered personal data. Identification can happen through direct identifiers, or indirectly through a combination of information. Personal data must be treated with special care by protecting it, storing it safely and not sharing it with others. When collecting special categories of personal data, even stricter rules apply.

DO’S

  • Delete personal data that is only collected for administrative purposes and not needed to answer your research question (e.g., email addresses) as soon as possible.
  • Anonymize data where possible by removing possibly identifying information. When data is anonymized, that means that it is not possible for anyone (not even for yourself) to trace the data back to an individual. Anonymous data are not considered personal data anymore and can usually be shared more freely.
  • Alternatively, pseudonymize your data when anonymization is not possible (e.g., when running a longitudinal study). Pseudonymizing data means that identifiers are replaced by pseudonyms. For example, participants are called pp01, pp02 etc. in your data file, but in addition, you keep a key file telling you that pp01 is Jaap Smit. Pseudonymized data are still considered personal data.
  • Safely store personal data. See Safe storage below for more information.
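
The pseudonymization step described in the list above can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name `pseudonymize`, the field names `name` and `participant`, and the second sample participant are hypothetical. The sketch replaces each name with a pseudonym (pp01, pp02 etc.) and returns a separate key that links pseudonyms to names.

```python
def pseudonymize(rows, id_field="name", prefix="pp"):
    """Replace the identifying field with a pseudonym (pp01, pp02, ...).

    Returns (pseudonymized_rows, key). The key maps each pseudonym to the
    original identifier and plays the role of the key file.
    """
    key = {}       # pseudonym -> original identifier
    assigned = {}  # original identifier -> pseudonym
    result = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in assigned:
            pseudonym = f"{prefix}{len(assigned) + 1:02d}"
            assigned[identifier] = pseudonym
            key[pseudonym] = identifier
        # Copy every field except the identifying one.
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["participant"] = assigned[identifier]
        result.append(new_row)
    return result, key


# Hypothetical longitudinal data: the same person appears in two sessions.
rows = [
    {"name": "Jaap Smit", "session": 1, "score": 12},
    {"name": "Example Person", "session": 1, "score": 15},
    {"name": "Jaap Smit", "session": 2, "score": 14},
]
data, key = pseudonymize(rows)
```

Here `data` no longer contains names, and both sessions of the same participant share the pseudonym pp01, which is what makes a longitudinal analysis possible. The `key` must be stored safely and apart from the data file, because anyone who holds both can identify the participants again.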

DON’TS

  • Do not collect more personal data than absolutely necessary. For example, if knowing the age of your participants is enough to answer your research question, do not ask for your participants’ date of birth.
  • Do not share personal data with anyone unless absolutely necessary (e.g., with your supervisor).

For more information about privacy and security click here

[1] It is only possible to share personal data if the participants explicitly provided informed consent for this. This should only be done when absolutely necessary and should be approved by your local ethics committee.

[2] See here for more information.

Safe storage

Safe storage is important, whether you’re working with personal data or not, because it prevents loss of data and data leaks.

DO'S

  • You can ask your supervisor to create a workgroup folder for you.[3] A workgroup folder is a folder that is stored on the university network. This means that the contents of the folder are protected. Furthermore, these folders are backed up daily. The daily back-ups are stored for 60 days. In addition, hourly back-ups are made and stored for seven days. You can retrieve these back-ups yourself in case you lose a file (more information here). You can access your workgroup folder when logging in on university computers. Your supervisor can access this folder as well.
  • Install eduVPN and RU connect on your own computer. This allows you to access and work in the workgroup folder when working off campus and using your own computer.
  • Use SURFfilesender to send big files in a safe manner. Sharing personal data should be avoided in general; when it is necessary (for example, with supervisors), workgroup folders are the preferred way to do so. Only use SURFfilesender for personal data when workgroup folders are not an option at all, and enable file encryption in that case.

  • If you cannot work in the workgroup folder (e.g., you are collecting data in the field and might not have an internet connection), use an encrypted device. Transfer data to a workgroup folder as soon as possible, and delete the data on the encrypted device.
    You can encrypt your computer or USB stick yourself, or have it encrypted for you at the ICT Helpdesk in the University Library. You can find more information here.
  • Limit the amount of time data are stored outside a workgroup folder as much as possible. 

DON’TS

  • Do not ever use Google Drive, WeTransfer, Dropbox etc. to store or send personal data.
  • Avoid using Google Drive, WeTransfer, Dropbox etc. for non-personal data as well, as they are considered less safe.

For more information about safe storage click here (ICT facilities)

[3] This can be done in the account portal. Your supervisor should choose the option “Workgroup folder (with students): folder request”

Archiving data for scientific integrity

Once your research project is completed, it is important to archive your data for the sake of scientific integrity. Archiving your data makes it verifiable for others (e.g., for your supervisor or during audits). These datasets are not made public or shared with other researchers; they are only accessible to people such as your supervisor.

What should be archived?

You should archive all data that are relevant and necessary for an outsider to be able to reproduce your analysis and conclusions. If you have any doubts about what to archive, you should discuss this with your supervisor.

It is good practice to include documentation with your data. This documentation explains your data and makes sure that your dataset is still understandable in a few years from now. The files that should ideally be included in a dataset are:

  • A readme file: a file that provides some context for the dataset. It also lists the data files that you uploaded by name and briefly describes the content of each file. You can find an example of a readme file in the Appendix.
  • All relevant data files: This may include raw data files (including speech and video recordings), transcriptions, annotations, questionnaires, files to run the experiment, a Qualtrics export (e.g., qsf), stimuli, pre-processed data (e.g., cleaned up data), analyzed data (e.g., SPSS or R files), ethical approval etc.
  • A codebook describing the variables in your research as well as a methodology file can be useful additions, too.
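
Listing every file by name in the readme is easy to get wrong by hand. The following Python sketch shows one possible way to generate a starting point. The function name `readme_skeleton` and the layout of the output are assumptions made for this illustration, loosely following the example readme in the Appendix. The descriptions still have to be written by you.

```python
import tempfile
from pathlib import Path


def readme_skeleton(dataset_dir, title):
    """Return readme text that lists every file in dataset_dir by name."""
    folder = Path(dataset_dir)
    files = sorted(p for p in folder.iterdir() if p.is_file())
    lines = [f"Dataset title: {title}", "", "Dataset structure", ""]
    for path in files:
        lines.append(f"  • {path.name}:")
        lines.append("    TODO: describe the content of this file.")
    return "\n".join(lines) + "\n"


# Demonstration on a temporary folder with two empty, hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Eyedata_clean.txt", "analysis_anova.sps"):
        (Path(tmp) / name).write_text("")
    text = readme_skeleton(tmp, "Example dataset")

print(text)
```

For a real dataset, you would pass the path of your own dataset folder instead of the temporary folder and save the returned text as README.txt after filling in the descriptions.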

As described above, personal data which you did not need for your conclusions should be deleted as soon as possible and should not be archived in order to protect your participants’ privacy.

Archiving in RIS for students

NB: Archiving in RIS for students is not possible for all students yet. Students at Communicatie- en Informatiewetenschappen and International Business Communication at the Faculty of Arts are expected to archive their datasets in RIS for students. You can find a manual on the website. Other students are advised to archive their data in a workgroup folder.

Archiving in a workgroup folder

Alternatively, it is possible to archive your data in a workgroup folder to which your supervisor also has access.

Sharing data publicly for reuse

It is possible to also share your data publicly, for example in the DANS EASY archive. This allows other researchers to reuse your data for their own purposes. Note that sharing data publicly is not standard procedure for student projects and is usually not required. If, however, your research gets published and/or your supervisor and you think that your data could be valuable to others for reuse, there are several options you can explore. Talk to your supervisor about your options.

DO’S

  • Ask your supervisor if they think it’s a good idea to share your data publicly.
  • Ask your supervisor what data to share exactly.
  • Ask your supervisor about an appropriate archive in your field.

DON’TS

  • Do not share personal data publicly.

A dataset that you share for reuse must not contain any personal data (unless this is required for a journal publication, you obtained specific consent from participants to do so, and your local ethics committee approved it). Thus, a dataset that you share publicly will often differ from the dataset you archived for scientific integrity. For example, you usually have to archive raw data, such as audio recordings of participants, for scientific integrity. That is possible because no outsiders will have access to these data. However, you are usually not allowed to share these audio recordings publicly for reuse, because they are considered personal data.

Appendix: example readme file

Dataset title: Good arguments or a charming narrator? Exploring a text’s persuasiveness through eye-tracking

Student:          Sanne Huisman (s123456)
First supervisor: dr. Lisa Begeleider
Second reader:    dr. Ton Lezer

Short summary

This dataset contains all relevant data files for the thesis Good arguments or a charming narrator? Exploring a text’s persuasiveness through eye-tracking, written by Sanne Huisman to obtain the degree of Bachelor of Arts and conclude the bachelor’s programme International Business Communication at Radboud University. This research was conducted at the CLS Lab in the spring of 2019 and supervised by dr. Lisa Begeleider and dr. Ton Lezer.

The goal of this thesis was to explore the persuasiveness of a text as a function of the quality of the presented arguments as well as the likability of the person making these arguments. While persuasiveness is often measured with questionnaires, we explored whether persuasiveness is also reflected in eye movements. A total of 78 participants took part in this study.

Dataset structure

This dataset contains a total of eight files as well as two zip folders:

  • README.txt:
    That is this very readme file.
  • Rawdata_eyetracking.zip:
    This zip folder contains the raw eye-tracking data of 78 participants. The data was collected using an EyeLink 1000+ eye-tracker. The folder structure of the raw output is unchanged.
  • Experiment.zip:
    This zip folder contains all files necessary to run the experiment using the experiment software Experiment Builder as installed in the CLS Lab.
  • Participant_overview.xlsx:
    This Excel file contains a pseudonymized overview of all participants. It includes the participant id (001, 002 etc), gender and age of each participant.
  • Persuasiveness_questionnaire_empty.doc:
    This file contains an empty version of the questionnaire by McCroskey and Teven (1999) which was used to measure the persuasiveness.
  • Persuasiveness_results.xlsx:
    This file contains the results of all 78 participants on McCroskey and Teven’s (1999) questionnaire.
  • Eyedata_clean.txt:
    This text file contains all the pre-processed and cleaned up eye-tracking data. The variable names are chosen in such a way that they are self-explanatory.
  • data_analysis_anova.sav:
    This file contains all the eye-tracking and questionnaire data as it was analyzed in SPSS.
  • analysis_anova.sps:
    This SPSS syntax file contains the full analysis.
  • Thesis_HuismanS_2019.pdf:
    This is the thesis which was written based on these data. It includes a detailed methodology section.

References

McCroskey, J. C., & Teven, J. J. (1999). Goodwill: A reexamination of the construct and its measurement. Communication Monographs, 66(1), 90-103. https://doi.org/10.1080/03637759909376464