Methods
Participants. Forty-four participants were recruited from the University of Georgia student body and were compensated with course credit. All participants gave informed consent after reading a description of the study approved by the University of Georgia Human Subjects Institutional Review Board. Data from 2 participants were excluded due to excessive EEG artifacts (described below), leaving 42 participants in all subsequent analyses. The mean age of the final sample was 18.9 years (SD = 1), with a range of 18 to 22. Thirty-four participants identified as female. Six participants identified as Asian, 2 as Black, 1 as multiracial, and 33 as White.
Video Stimuli Set. The goal in assembling the video set was to represent a wide variety of emotional and neutral content while holding basic perceptual features of the videos reasonably constant. The videos were clipped to 10 s in duration so that the viewer could recognize the nature of the situation quickly, with the narrative maintained across the interval. The videos featured a single lens perspective that placed the viewer roughly at eye level, and included a natural soundtrack. The videos excluded recognizable actors and the high-production-value elements associated with professional film (e.g., expert lighting and composition), to minimize artificial elements that might undermine the viewer's belief that the clips depict true events. Videos were selected to depict roughly equivalent degrees of loudness and motion across emotional and neutral videos to avoid confounding emotion with action.
Forty-five clips meeting these criteria, depicting a range of pleasant, neutral, and unpleasant situations, were collected from the internet. These videos were divided into 5 groups of 9 videos each. Four of the groups depicted emotional situations, judged by the experimenters to contain highly arousing (roller coasters, passionate couples, graphic surgery, direct threats) or modestly arousing (puppies, cute babies; indirect threats) content. The remaining group of 9 videos depicted active but common life experiences (walking down a busy street, a kitchen staff hard at work). The video clips were quantified on a number of basic perceptual qualities. Sound intensity was measured with a BAFX 3370 Digital Sound Level Meter placed at ear level at the participant's chair. Audio was presented with a Dell A525 3-speaker system with subwoofer, placed directly under the video monitor, and loudness was assessed as decibel (A-weighted) values recorded twice per second and averaged across each video. Brightness was defined by converting the color videos to grayscale and averaging the 0-255 pixel values for each video frame. Movement depicted in the videos was quantified as the average difference in grayscale pixel values between successive frames of each video, using the Magick R package version 2.7.3 (Ooms, 2021). For example, during periods with high levels of movement in a video, many pixels show large changes in brightness from frame to frame; these changes were averaged to yield a score representing the total movement in each video. Lastly, Shannon's entropy was used as an index of perceptual complexity: the entropy value within each frame was quantified to provide a mean and standard deviation across each 10 s video. These quantified video features were correlated with the electrocortical data, along with the ratings of pleasantness and arousal collected from the participant sample.
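For illustration, this feature extraction can be sketched in Python (the published analysis used the Magick R package; the file path, RGB frame format, and 256-bin histogram are illustrative assumptions):

```python
# Minimal sketch of the per-video feature quantification, assuming RGB frames.
import imageio.v2 as imageio
import numpy as np

def video_features(path):
    """Return mean brightness, mean frame-to-frame motion, and the
    mean/SD of per-frame Shannon entropy for one video file."""
    brightness, motion, entropy = [], [], []
    prev = None
    for frame in imageio.get_reader(path):
        gray = frame.mean(axis=2)                  # RGB -> 0-255 grayscale
        brightness.append(gray.mean())
        if prev is not None:                       # motion: mean absolute pixel change
            motion.append(np.abs(gray - prev).mean())
        prev = gray
        # Shannon entropy of the grayscale intensity histogram (complexity index)
        counts, _ = np.histogram(gray, bins=256, range=(0, 255))
        p = counts[counts > 0] / counts.sum()
        entropy.append(-(p * np.log2(p)).sum())
    return dict(brightness=np.mean(brightness), motion=np.mean(motion),
                entropy_mean=np.mean(entropy), entropy_sd=np.std(entropy))
```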
Experimental Design and Procedure. After providing informed consent, each participant was given instructions and seated in a chamber shielded against sound and electromagnetic noise. A 64-channel EEG net (described below) was placed and adjusted over the course of 10-15 minutes. Research assistants then reminded participants to remain still and to maintain fixation on a red cross at the center of the video screen throughout the series. Participants were also asked to avoid blinking during each video clip presentation, to the best of their ability.
The video series started with an acclimation trial, which presented a 10 s static checkerboard surrounded by a flickering border. After a 12 s delay, the 45 experimental video clips were presented in a pseudo-randomized order with an average inter-trial interval (ITI) of 12 s (range 10 to 14 s). Videos were ordered such that no more than two videos from the same category were shown in succession and such that video contents were distributed evenly across the series.
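One simple way to generate such an order is rejection sampling, reshuffling until the run-length constraint is satisfied; a minimal sketch under that assumption (the actual randomization procedure is not specified beyond the constraints above, and the even-distribution constraint is omitted here):

```python
# Shuffle until no more than max_run same-category videos appear in a row.
import random

def constrained_order(videos, category, max_run=2, seed=None):
    """videos: list of IDs; category: dict mapping video ID -> category label."""
    rng = random.Random(seed)
    order = videos[:]
    while True:
        rng.shuffle(order)
        # every window of max_run + 1 consecutive videos must span > 1 category
        if all(len({category[v] for v in order[i:i + max_run + 1]}) > 1
               for i in range(len(order) - max_run)):
            return order
```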
PsychoPy open-source software (Peirce et al., 2019) was used to present the video clips and send triggers to the EEG acquisition computer. The PsychoPy control files and video stimuli are available on the Open Science Framework (final link TBD). A Dell Optiplex 380 computer presented videos on a 60 Hz Westinghouse 32-in LCD monitor, placed 1.6 m from the participant's eyes, at a 960 × 648 pixel resolution, with the video clip shown in the central 720 × 405 pixels (20° × 15° of visual angle). The remaining monitor space displayed a gray border (RGB value of [148, 148, 148]) around the video, which flickered to black (RGB value of [0, 0, 0]) at a 7.5 Hz frequency to evoke a steady-state visual evoked potential. To ensure a precise flash rate, the black border was drawn on every eighth screen refresh of the 60 Hz monitor. The presentation code logged the exact time of each flash of the border, which was consistent at 7.5 Hz for all videos and all participants.
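The frame-locked flicker can be illustrated with a short PsychoPy sketch (the movie file name and object names are placeholders, MovieStim usage varies somewhat by PsychoPy version, and the published control files on OSF are authoritative):

```python
# Illustrative PsychoPy sketch of the 7.5 Hz border flicker on a 60 Hz display.
from psychopy import visual

win = visual.Window(size=(960, 648), color=[148, 148, 148],
                    colorSpace='rgb255', units='pix', fullscr=True)
border = visual.Rect(win, width=960, height=648, fillColor='black')  # flicker layer
movie = visual.MovieStim(win, 'clip01.mp4', size=(720, 405))         # placeholder file

movie.play()
for frame_n in range(600):              # 10 s at the 60 Hz refresh rate
    if frame_n % 8 == 0:                # black border on every 8th refresh = 7.5 Hz
        border.draw()
    movie.draw()                        # movie occupies the central 720 x 405 pixels
    win.flip()                          # flips are locked to the monitor refresh
win.close()
```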
Following the presentation of the video series, there was a brief (~2 min) break while research assistants entered the chamber and checked on the participant and the status of the sensors. The video series was then repeated in a new pseudo-random order. After the second video block, the recording equipment was removed, and participants left the chamber and were given towels and water to clean the electrolyte gel out of their hair. Participants then sat at a computer, viewed the 45 videos again, and rated each video using a computer-based Qualtrics survey. Each video was viewed to completion before participants could give their ratings and advance to the next video. Ratings of experienced valence and arousal were recorded using a computer-based version of the Self-Assessment Manikin (SAM; Bradley & Lang, 1994). Participants input their ratings of pleasantness and arousal on a 1-9 scale in increments of 0.1 by clicking and dragging a cursor. After the ratings were completed, participants filled out a brief post-experimental questionnaire and were debriefed on the rationale for the experiment.
EEG Data Collection. The EEG data were recorded using a BioSemi ActiveTwo 64-channel system (BioSemi, Amsterdam, Netherlands). The electrode cap was positioned according to the 10-20 system. Data were recorded continuously in reference to common mode electrodes (CMS and DRL). Electrode offsets were kept between -50 and +50 millivolts before data collection. The data were sampled at 512 Hz with no online low- or high-pass filters. Triggers corresponding to video onsets were sent from the PsychoPy presentation computer to the EEG system via a BioSemi USB Trigger Interface cable.
EEG preprocessing. The EEG data were preprocessed using the MATLAB-based Electro Magnetic Encephalography Software (EMEGS; emegs.org; Peyk, De Cesarei, & Junghöfer, 2011), which implements an artifact correction procedure designed for dense-array EEG (Junghöfer, Elbert, Tucker, & Rockstroh, 2000). Offline, the EEG data were filtered with a low-pass Butterworth filter (30 Hz passband, 40 Hz stopband). Because the steady-state driven frequency of 7.5 Hz was of primary interest, a high-pass filter with a 3 Hz passband and a 1 Hz stopband was also applied. From the continuously recorded data, each trial was segmented from 100 ms before to 10 s after video onset.
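Comparable offline filtering can be expressed with SciPy (the actual filtering was performed in EMEGS; the 3 dB passband ripple and 40 dB stopband attenuation below are assumptions, since only the edge frequencies are reported):

```python
# Sketch of Butterworth filtering and epoching for a channels x samples array.
import numpy as np
from scipy import signal

fs = 512.0  # sampling rate (Hz)

# Low-pass: 30 Hz passband edge, 40 Hz stopband edge (attenuations assumed)
n_lp, wn_lp = signal.buttord(wp=30, ws=40, gpass=3, gstop=40, fs=fs)
b_lp, a_lp = signal.butter(n_lp, wn_lp, btype='lowpass', fs=fs)

# High-pass: 3 Hz passband edge, 1 Hz stopband edge (preserves the 7.5 Hz signal)
n_hp, wn_hp = signal.buttord(wp=3, ws=1, gpass=3, gstop=40, fs=fs)
b_hp, a_hp = signal.butter(n_hp, wn_hp, btype='highpass', fs=fs)

def preprocess(continuous, onsets):
    """Zero-phase filter, then cut -100 ms to +10 s epochs around each onset."""
    x = signal.filtfilt(b_lp, a_lp, continuous, axis=-1)
    x = signal.filtfilt(b_hp, a_hp, x, axis=-1)
    pre, post = int(0.1 * fs), int(10 * fs)
    return np.stack([x[:, t - pre : t + post] for t in onsets])
```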
To identify trials and sensors contaminated by artifacts, EMEGS computed sensor-by-trial distributions of a composite measure based on the maximum amplitude, the standard deviation, and the maximum first derivative of each epoch (Junghöfer et al., 2000). These distributions were used jointly to locate noisy channels relative to the distribution medians. Trials containing excess artifact were removed, and sensors containing excess artifact were replaced by spherical spline interpolation using a weighted average of all remaining sensors, with the largest weights given to the closest sensors with the smallest standard deviations. The data were then re-referenced to an average reference, and the artifact detection procedure was repeated.
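The composite statistics can be sketched as follows (an illustrative approximation only; the MAD-based cutoff is an assumption and does not reproduce EMEGS's actual decision rules):

```python
# Sketch of sensor-by-trial artifact statistics for a trials x sensors x samples array.
import numpy as np

def trial_sensor_stats(epochs):
    """Return the three statistics used to screen each sensor on each trial."""
    max_amp = np.abs(epochs).max(axis=-1)                     # maximum amplitude
    sd = epochs.std(axis=-1)                                  # standard deviation
    max_diff = np.abs(np.diff(epochs, axis=-1)).max(axis=-1)  # max first derivative
    return max_amp, sd, max_diff

def flag_relative_to_median(stat, k=3.0):
    """Flag cells far above the distribution median (k is an assumed cutoff)."""
    med = np.median(stat)
    mad = np.median(np.abs(stat - med)) + 1e-12
    return stat > med + k * mad
```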
To estimate the elicited ssVEP amplitude, waveforms for each video were created using a custom-built MATLAB function. The function used a Hilbert transform to isolate the 7.5 Hz phase and derive the instantaneous amplitude at this frequency, transforming the data to represent the border-driven 7.5 Hz amplitude over each video presentation. Due to onset/offset transition effects, the first and last second of each video were excluded from analysis, leaving 8 s of ssVEP amplitude.
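A Python sketch of this amplitude estimation (the original is a custom MATLAB function; the ±0.5 Hz band-pass width and filter order here are illustrative assumptions):

```python
# Band-pass around the 7.5 Hz driving frequency, then take the envelope of
# the analytic signal as the instantaneous ssVEP amplitude.
import numpy as np
from scipy import signal

fs = 512.0       # sampling rate (Hz)
f_drive = 7.5    # border flicker frequency (Hz)

def ssvep_envelope(trial, half_bw=0.5):
    """trial: sensors x samples -> instantaneous 7.5 Hz amplitude."""
    b, a = signal.butter(4, [f_drive - half_bw, f_drive + half_bw],
                         btype='bandpass', fs=fs)
    narrow = signal.filtfilt(b, a, trial, axis=-1)
    env = np.abs(signal.hilbert(narrow, axis=-1))  # amplitude envelope
    s = int(fs)
    return env[..., s:-s]   # drop first/last second (onset/offset transients)
```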
Following preprocessing, trials were averaged together by category for each participant. To be included in the final group, each participant was required to retain at least 50% of the trials from each video category; otherwise the participant was excluded from further analysis, which resulted in the exclusion of 2 participants. In the remaining sample of 42, an average of 78.1% of trials (SD = 9%) was retained across participants. Sample-averaged ssVEP amplitude was used for the by-video correlation analyses with emotional ratings and video features.
ssVEP data analyses. The amplitude of the ssVEP signal was sampled from the 14 occipital sensors in which the ssVEP signal is strongest, based on prior studies using similar designs (Müller et al., 2008; Deweese et al., 2014; Bekhtereva et al., 2017) and confirmed with a topographical representation of 7.5 Hz power in the current dataset. These sensors were P1, P3, PO3, PO7, O1, Iz, Oz, POz, Pz, P2, P4, PO4, PO8, and O2. To estimate ssVEP amplitude, the middle 8 s of video presentation (from 1 to 9 s after onset) were averaged and used for all analyses, avoiding onset and offset artifacts. Because there is considerable inter-subject variability in overall ssVEP amplitude (Moratti et al., 2004; Wieser et al., 2016), within-subject data were z-scored across video contents.
Self-reported emotion ratings and ssVEP amplitudes were analyzed across the 5 content categories using repeated measures ANOVAs, corrected for violations of sphericity as needed, with effects broken down by paired t-tests with Tukey correction. Effect sizes were quantified with the generalized eta squared (ηG²) measure, which can be interpreted such that .01 is a small effect, .06 a medium effect, and .14 a large effect (Olejnik & Algina, 2003). Pearson's correlations were used to assess the potential relationships between ssVEP amplitude and by-video estimates of self-reported valence and arousal, luminance, sound intensity, entropy, entropy SD, and pixel motion.
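Assuming a long-format table with columns subject, category, and ssvep (placeholder names), the core analyses can be approximated in Python with the pingouin and SciPy packages:

```python
# Sketch of the group-level statistics; df and video_means are assumed
# pandas DataFrames (42 subjects x 5 categories, and 45 by-video averages).
import pingouin as pg
from scipy.stats import pearsonr

# Repeated measures ANOVA across the 5 content categories, with sphericity
# correction and generalized eta squared as the effect size
aov = pg.rm_anova(data=df, dv='ssvep', within='category', subject='subject',
                  correction=True, effsize='ng2')

# By-video Pearson correlation between mean ssVEP amplitude and, for example,
# rated arousal; the same call applies to the other video features
r, p = pearsonr(video_means['ssvep'], video_means['arousal'])
```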