About dataset sources

 


A dataset contains structured data arranged in records (rows) and fields (columns). You can create datasets by importing Excel spreadsheets or social media data gathered from Facebook and Twitter using NCapture.

Understand dataset sources

A dataset contains structured data arranged in records (rows) and fields (columns). Datasets are created by importing data, and cannot be edited inside NVivo.

You can create datasets by importing:

  • Spreadsheets

  • NCapture files containing data collected from Facebook and Twitter

When you import NCapture files containing data from Facebook or Twitter, the fields in the dataset are predetermined—based on the type of social media data you are importing.

The table below displays an example of a dataset containing survey responses. Each record (row) represents a single survey respondent. The fields (columns) contain demographic information about the respondent or their responses to the survey questions.

| Respondent | Age | Sex | Question 1 response | Question 2 response |
|---|---|---|---|---|
| Anna | 29 | Female | I think there should be more car-free zones. | Electric buses and taxis would help reduce pollution in the inner city. |
| Jack | 31 | Male | Pedestrians need to feel safe. There should be better lighting and more police. | We should create more green spaces. |

 

If you auto code this dataset, you must select the columns (or values) that you want to use to create your nodes.

Refer to Automatic coding in dataset sources for more detailed information.


Open and navigate dataset sources

You can double-click a dataset in List View to open it in Detail View.

You can view the data as a table—all the records and fields are displayed in a grid:

1 Classifying fields—contain information about your data—for example, the age and sex of survey respondents. Classifying fields have a grey background. Refer to Learn about codable and classifying fields (columns) for more information.

2 Codable fields—contain the information you want to analyze—for example, responses to open-ended survey questions. Codable fields have a white background. Refer to Learn about codable and classifying fields (columns) for more information.

Each row in a dataset has a unique record ID, based on the order in which it is imported. The ID is the first column in the Table pane.

Datasets of more than 100 records are divided into pages. Each page can contain up to 100 records. You can navigate from page to page by using the navigation buttons—you can move to the first, previous, next or last page.

1 Current page

2 Go to first record

3 Go to previous page

4 Go to next page

5 Go to last record

When you click in the Current page box, you can type a page number and then press ENTER to navigate to that page.

To navigate within a page, use the scroll bars to move up and down, or left and right.
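The paging rule described above (up to 100 records per page) comes down to simple arithmetic. The following Python sketch is illustrative only and is not part of NVivo. It assumes that record IDs are sequential and start at 1, and the function names are hypothetical.

```python
import math

PAGE_SIZE = 100  # maximum number of records shown on one page


def page_count(total_records: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to show total_records (never fewer than 1)."""
    return max(1, math.ceil(total_records / page_size))


def page_for_record(record_id: int, page_size: int = PAGE_SIZE) -> int:
    """1-based page on which a record appears (assumes IDs start at 1)."""
    if record_id < 1:
        raise ValueError("record IDs are assumed to start at 1")
    return (record_id - 1) // page_size + 1


print(page_count(250))       # 3
print(page_for_record(100))  # 1
print(page_for_record(101))  # 2
```

Under these assumptions, a dataset of 250 records spans three pages, and record 101 is the first record on page 2.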


What can I do in a dataset?

When working in a dataset source you can:

  • Code or query the text in codable columns

  • Auto code to organize the data—for example, you could gather the survey responses for each participant in a case node

For more ideas and information on how you can work with datasets, refer to Approaches to analyzing survey results.


Import data from spreadsheets

You can create datasets by importing Excel spreadsheets. You can choose which fields to import and decide which fields contain data that you want to code, and which contain classifying information (for example, demographic characteristics of a survey respondent).

Before you create a dataset by importing data from spreadsheets, you should consider how you want to use the data in NVivo.

You cannot change the data after you have imported it into NVivo, so before import, you should check that:

  • You have collected together all the data you need.

  • You have checked the quality and accuracy of the data.

  • You have considered the analysis type you will set for each field—classifying or codable. Refer to Learn about classifying and codable fields for more information.

  • You have considered what data type you will use for each classifying field—for example, text, date or decimal.

If you have survey responses, and you want to create a case node for each respondent, then the dataset must contain a unique identifier that identifies the responses of each individual. A unique identifier could be the respondent's name. However, in a large survey, names may not be unique. For uniqueness and to protect the identity of your respondents, you may prefer to assign each respondent a unique ID number. You can then gather all responses of an individual respondent to a single node. Refer to Approaches to analyzing survey results for more information.
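If you prepare your data outside NVivo, you could check for duplicate names and assign ID numbers before import. The following Python sketch shows one possible approach using only the standard library. It assumes the spreadsheet has been saved as CSV. The sample data, the `Respondent ID` field name, and both function names are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_values(rows, field):
    """Return the values of `field` that occur in more than one row."""
    counts = Counter(row[field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def assign_ids(rows, id_field="Respondent ID"):
    """Return copies of the rows with a sequential ID added as the first field."""
    return [{id_field: str(i), **row} for i, row in enumerate(rows, start=1)]


# Hypothetical sample data standing in for a spreadsheet saved as CSV.
SAMPLE = """Respondent,Age,Sex
Anna,29,Female
Jack,31,Male
Anna,45,Female
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(duplicate_values(rows, "Respondent"))         # ['Anna']

with_ids = assign_ids(rows)
print(duplicate_values(with_ids, "Respondent ID"))  # []
```

In this sample, two different respondents share the name Anna, so the name alone cannot identify a case. The assigned ID numbers are unique, and you could remove the name column before import if you need to protect respondent identity.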

If you have a very large amount of data to import and analyze, it is a good idea to experiment with a subset of the data first. Import a small amount of data and try the various approaches to analyzing a dataset. Once you are confident that you have imported the data in a way that supports your analysis, import all the data and begin coding in earnest. Make sure you delete the sample dataset that you used for experimental purposes.

Refer to Import data from spreadsheets and text files for more information.

NOTE If your data is in a text file or database table, you can:

  • Import a text file into Excel and then import the resulting spreadsheet file into NVivo.

  • Export a database table as an Excel spreadsheet and then import it into NVivo.


Import social media data

You can use NCapture to collect data from Facebook or Twitter as a dataset. For example, you can capture data from Twitter and bring the Tweets and profile information about the users into NVivo. The data is saved to an NCapture file, which you can import into NVivo as a dataset.

For more information on importing social media data, refer to:


Import online survey responses

In this release of NVivo for Mac, you cannot import survey responses directly from SurveyMonkey.

However, you can import your SurveyMonkey survey responses using the following method:

  1. In SurveyMonkey, export the data as an Excel spreadsheet. Refer to SurveyMonkey Help for more information.

  2. In NVivo for Mac, import the spreadsheet into your NVivo project. Refer to Import data from spreadsheets and text files for more information.

The imported data becomes a dataset source that you can auto code.

If you have a large number of responses to import and analyze, it is a good idea to experiment with a subset of the data. You can import a random sample of responses and then experiment with the various approaches to analyzing the dataset.
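One way to draw a random sample of responses before import is sketched below in Python, using only the standard library. This is not an NVivo feature. It assumes the exported responses have been saved as CSV, and the field names, generated sample data, and function names are hypothetical.

```python
import csv
import io
import random


def sample_rows(rows, k, seed=None):
    """Return k randomly chosen rows, kept in their original order."""
    if k >= len(rows):
        return list(rows)
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    chosen = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in chosen]


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text that can be opened in a spreadsheet program."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical export of 500 responses.
rows = [{"ID": str(i), "Response": f"Response {i}"} for i in range(1, 501)]

subset = sample_rows(rows, 50, seed=1)
print(len(subset))  # 50
sample_csv = to_csv(subset, ["ID", "Response"])
```

You could then open the resulting CSV text in Excel, save it as a spreadsheet, and import that smaller file to trial your analysis approach.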


Learn about codable and classifying fields

When you import data from spreadsheets, you can choose the 'analysis type' for each field (column)—you can select 'codable' or 'classifying'.

You cannot change the analysis type (codable or classifying) of a field (column) after import, so you should decide how you want to use your data before you create a new dataset.

Fields that contain data that you intend to code and analyze should be stored as codable fields—for example, responses to open-ended survey questions, such as How do you think we can reduce our carbon emissions?

Fields that describe your data (metadata) should be stored as classifying fields—for example, the ID number, Age, Sex and Annual Income of your survey respondents. Values in classifying fields:

  • Provide context when you view coded dataset content in a node.

  • Can be used to build case structures that group your codable content—for example, by Age or Sex.

  • Can be used to create and classify nodes that represent the subjects (cases) of your research. For example, if you create a 'person' node for a survey respondent, you can use the classifying field values Age or Sex as attribute values on the node.
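To illustrate how classifying values can group codable content, the following Python sketch groups survey responses by a classifying field. It is a conceptual illustration only and does not show how NVivo implements auto coding. The records reuse the survey example from earlier in this topic, and the `group_by` function is hypothetical.

```python
from collections import defaultdict

# Records modeled on the survey example: Age and Sex are classifying
# fields, and the question response is a codable field.
records = [
    {
        "Respondent": "Anna",
        "Age": 29,
        "Sex": "Female",
        "Question 1 response": "I think there should be more car-free zones.",
    },
    {
        "Respondent": "Jack",
        "Age": 31,
        "Sex": "Male",
        "Question 1 response": "Pedestrians need to feel safe. "
        "There should be better lighting and more police.",
    },
]


def group_by(records, classifying_field, codable_field):
    """Collect the codable text of each record under its classifying value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[classifying_field]].append(record[codable_field])
    return dict(groups)


by_sex = group_by(records, "Sex", "Question 1 response")
for value, responses in by_sex.items():
    print(value, responses)
```

Each key in the result corresponds to one value of the classifying field, and each list holds the codable text gathered under that value.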

The following table compares codable and classifying fields:

| Comparison | Codable fields | Classifying fields |
|---|---|---|
| Type of content | Textual content that you want to analyze, for example, survey responses to open-ended questions such as What do you think is the most important environmental issue in your local area? | Values that describe the data, for example, in a set of survey responses, you may have classifying columns that contain the name, age or sex of the survey participants. Scaled responses, for example, your survey might include questions that are answered by choosing a point on a 'strength of agreement scale' containing points ranging from Strongly Disagree to Strongly Agree. |
| Data types | Text | Text, integer, decimal, date, time, date/time, or boolean |
| Background color | White | Grey |
| Edit content | No | No |
| Code content | Yes | No |
| Use cell values to build case and theme node hierarchies | No | Yes |
| Use cell values to populate case classification attribute values | No | Yes. Refer to Setup demographic attributes based on the classifying information in a dataset for more information. |
| Annotate & link | No | No |
| Query | Yes | No |
| Sort & filter | No | Yes |


Setup demographic attributes based on the classifying information in a dataset

You can create case nodes by auto coding a dataset. Any classifying information in the dataset can be used as demographic attributes for the case. For more information, refer to Automatic coding in dataset sources.
