Data Analytics: Computational Temporal Analysis

Cara Marta Messina
Northeastern University
messina [dot] c [at] husky [dot] neu [dot] edu

This notebook takes data collected from Archive of Our Own (AO3), a popular fanfiction repository, and prepares it for analysis. The data was collected using an AO3 Python scraper. The corpus consists of The Legend of Korra and Game of Thrones fanfics, from the first work published on AO3 in each fandom through 2019.

This notebook is part of the Critical Fan Toolkit, Cara Marta Messina's public + digital dissertation.

In [1]:
#pandas for working with dataframes
import pandas as pd

#regular expression library
import re

#numpy specifically works with numbers
import numpy as np

from nltk import word_tokenize

import string
punctuations = list(string.punctuation)

#has the nice counter feature for counting tags
import collections
from collections import Counter 

#for making a string of elements separated by commas into a list
from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktLanguageVars 

#visualizations
import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

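#Chart Studio (plotly's online service) credentials; the API key is read from an environment variable instead of being written into the notebook (the variable name CHART_STUDIO_API_KEY is an assumption)
import os
import chart_studio
chart_studio.tools.set_credentials_file(username='caramessina', api_key=os.environ['CHART_STUDIO_API_KEY'])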

Reading and Prepping the Data

In [2]:
korra_all = pd.read_csv('./data/group_month/allkorra_months.csv').set_index('month')
korra_all.head(3)
Out[2]:
month | rating | additional tags | category | relationship | body | count
2011-02 | not rated, | original characters - freeform, | multi, | mai/zuko (avatar), sokka/suki (avatar), aang (... | when kato listens to his father's war stories,... | 1
2011-04 | general audiences, | family, angst, one shot, | gen, | aang (avatar)/katara, | his father shows tenzin where the flowers grow... | 1
2011-05 | teen and up audiences, | completely au, written pre-canon, rated for la... | gen, | NaN | \n \nthe earthbender's answer was not what s... | 1
In [3]:
#reading in multiple CSV files, since a single large file crashes my kernel

gotmonth0 = pd.read_csv('data/group_month/got_1.csv')
gotmonth1 = pd.read_csv('data/group_month/got_2.csv')
gotmonth2 = pd.read_csv('data/group_month/got_3.csv')
gotmonth3 = pd.read_csv('data/group_month/got_4.csv')

got_all = pd.concat([gotmonth0, gotmonth1, gotmonth2, gotmonth3]).set_index('month')
got_all.head(5)
Out[3]:
month | rating | additional tags | category | relationship | body | count
2006-08 | teen and up audiences, | incest, dreams, | m/m, | jon snow/robb stark, | up on the wall it is impossible to be warm and... | 1
2007-02 | general audiences, | possible incest, | NaN | NaN | the last time jon had seen his half-sister san... | 1
2007-05 | teen and up audiences, | tragedy, canonical character death, suicide, b... | f/m, | rhaegar targaryen/lyanna stark, robert barathe... | it is far too easy, to slip away. lyanna is kn... | 1
2007-06 | mature, teen and up audiences, | alternate universe, infidelity, unrequited lov... | f/m, f/m, | cersei lannister/oberyn martell, petyr baelish... | the vase shatters beautifully against the wall... | 2
2007-12 | teen and up audiences, general audiences, | romance, action/adventure, incest, maleslash, | f/m, m/m, | brienne/jaime lannister, jaime lannister/rhaeg... | asshai is the end of the world. the valaryians... | 2
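`TagsAnalyzer`, defined below, selects months with label slicing (`df.loc[monthBegin:monthEnd, :]`), which behaves predictably when the month index is sorted. A minimal sketch of loading the part files with an explicit sort, assuming each part has a `month` column of zero-padded `YYYY-MM` strings (which sort in chronological order); `load_month_parts` is a hypothetical helper, and the toy CSV text stands in for the real files:

```python
import io

import pandas as pd


def load_month_parts(sources):
    """Read several CSV parts, stack them, and index them by month.

    `sources` can be file paths or file-like objects. sort_index() puts
    zero-padded 'YYYY-MM' strings in chronological order, which keeps
    label slicing such as df.loc['2013-03':'2015-03'] well defined.
    """
    parts = [pd.read_csv(src) for src in sources]
    combined = pd.concat(parts, ignore_index=True)
    return combined.set_index('month').sort_index()


# toy stand-ins for the real part files (hypothetical data)
part_a = io.StringIO('month,count\n2015-04,3\n2006-08,1\n')
part_b = io.StringIO('month,count\n2013-03,2\n')
got_demo = load_month_parts([part_a, part_b])
print(got_demo.index.tolist())
```

With the real files, this would be called as `load_month_parts(['data/group_month/got_1.csv', 'data/group_month/got_2.csv', 'data/group_month/got_3.csv', 'data/group_month/got_4.csv'])`.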

Counting Metadata

The functions below take the tags in a metadata column (formatted in the data as characterA/characterB, characterA/characterB, etc.), count them, and output the most commonly used tags, such as the most common relationship tags.
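The counting idea can also be sketched without NLTK. Assuming each tag in a cell is followed by a comma (as the tag outputs below suggest) and that missing values have already been replaced with empty strings, splitting each cell on commas and tallying the pieces with `collections.Counter` gives a comparable count. This is a simplified alternative for illustration, not the notebook's Punkt-based method; `count_tags` is a hypothetical helper, and unlike the outputs below it strips the trailing comma from each tag.

```python
from collections import Counter


def count_tags(cells, top_n=50):
    """Count comma-separated tags across a list of cell strings.

    Missing or empty cells (None or '') are skipped, and whitespace around
    each tag is stripped before counting.
    """
    counter = Counter()
    for cell in cells:
        if not cell:
            continue
        for tag in cell.split(','):
            tag = tag.strip()
            if tag:  # skip the empty piece left after a trailing comma
                counter[tag] += 1
    return counter.most_common(top_n)


# toy cells in the same "tag, tag," shape as the relationship column (hypothetical data)
cells = [
    'korra/mako (avatar), bolin/korra (avatar),',
    'korra/asami sato,',
    None,
    'korra/mako (avatar),',
]
print(count_tags(cells))
```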

In [4]:
def column_to_list(df,columnName):
    '''
    this function takes all the values from a specific column and joins them into a single string.
    input: the dataframe and the column name
    output: one string of the column's values separated by spaces, with empty (NaN) cells treated as empty strings
    '''
    #fillna returns a new Series instead of writing into df, which avoids the SettingWithCopyWarning shown in the outputs below
    values = df[columnName].fillna('')
    text = ' '.join(values.tolist())
    return text
In [5]:
def clean_tokens(tokens):
    '''
    this function lowercases a list of word tokens and removes stop words and punctuation.
    input: a list of tokens (for example, the output of nltk's word_tokenize)
    output: the cleaned list of lowercase tokens
    '''
    stopwords = ['i', 'me', 'my', 'myself', "“", "”", 'we', 'our', '’', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", "would", "could", 'won', "won't", 'wouldn', "wouldn't"]
    text_lc = [word.lower() for word in tokens]
    text_tokens_clean = [word for word in text_lc if word not in stopwords]
    text_tokens_clean = [word for word in text_tokens_clean if word not in punctuations]
    return text_tokens_clean
In [6]:
def TagsAnalyzer(df, monthBegin, monthEnd, columnName):
    '''
    input: a dataframe indexed by month, the first and last months to include (such as '2012-04'), and the metadata column to count (such as 'additional tags')
    output: a list of the 50 most common (tag, count) tuples in that month range
    '''
    
    #choose the months to analyze (label slicing includes both monthBegin and monthEnd)
    months_df = df.loc[monthBegin:monthEnd, :]
    
    #replace empty values & join the whole column into one string
    text = column_to_list(months_df, columnName)
    
    #a Punkt tokenizer that treats the comma, rather than the period, as the end of a "sentence"
    class CommaPoint(PunktLanguageVars):
        sent_end_chars = (',',)
    tokenizer = PunktSentenceTokenizer(lang_vars = CommaPoint())
    
    #tokenizing the string based on the COMMA, not the white space, so each tag becomes one element of the list
    ListOfTags = tokenizer.tokenize(text)
    
    #the "Counter" function is from the collections library
    allCounter = collections.Counter(ListOfTags)
    
    return allCounter.most_common(50)
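`TagsAnalyzer` slices with `df.loc[monthBegin:monthEnd, :]`, which, on a sorted string index, includes both the first and the last month. Periods that are meant to be consecutive should therefore neither share months nor leave months out. A small standard-library sketch for checking a list of `(begin, end)` ranges before passing them to `TagsAnalyzer`; `month_number` and `check_periods` are hypothetical helpers, and the ranges are the ones used in the cells below:

```python
def month_number(ym):
    """Convert a 'YYYY-MM' string into a running month count."""
    year, month = map(int, ym.split('-'))
    return year * 12 + (month - 1)


def check_periods(periods):
    """Report overlaps and gaps between consecutive (begin, end) month ranges.

    Both endpoints are treated as inclusive, matching df.loc label slicing.
    """
    problems = []
    for (_, prev_end), (next_begin, _) in zip(periods, periods[1:]):
        step = month_number(next_begin) - month_number(prev_end)
        if step <= 0:
            problems.append(f'overlap: {next_begin} to {prev_end} falls in two periods')
        elif step > 1:
            problems.append(f'gap: months between {prev_end} and {next_begin} are not counted')
    return problems


# the ranges passed to TagsAnalyzer in the cells below
korra_periods = [('2011-02', '2014-07'), ('2014-02', '2014-11'), ('2014-12', '2015-07')]
got_periods = [('2006-08', '2013-02'), ('2013-03', '2015-03'), ('2015-07', '2017-06'),
               ('2017-07', '2019-03'), ('2019-04', '2019-09')]

print(check_periods(korra_periods))
print(check_periods(got_periods))
```

Run on these ranges, the check reports that the first two Korra periods share 2014-02 through 2014-07 and that the Game of Thrones ranges skip April through June 2015.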

Korra Relationship Tags

In [7]:
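#note: the pre-Korrasami range (ending 2014-07) overlaps the subtext range (starting 2014-02), so works from 2014-02 through 2014-07 are counted in both periods
korra_preKArel = TagsAnalyzer(korra_all,'2011-02','2014-07','relationship')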
korra_subKArel = TagsAnalyzer(korra_all,'2014-02','2014-11','relationship')
korra_postKArel = TagsAnalyzer(korra_all,'2014-12','2015-07','relationship')

print('Pre-Korrasami')
print(korra_preKArel)
print('\n Korrasami Subtext')
print(korra_subKArel)
print('\n Post-Korrasami')
print(korra_postKArel)
/Users/caramessina/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:7: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Pre-Korrasami
[('korra/mako (avatar),', 328), ('korra/asami sato,', 119), ('bolin/korra (avatar),', 94), ('amon/lieutenant (avatar),', 69), ('amon/korra (avatar),', 50), ('korra/tahno (avatar),', 48), ('lin bei fong/tenzin,', 42), ('tahno/korra,', 40), ('pema/tenzin (avatar),', 39), ('korra/tarrlok (avatar),', 38), ('aang/katara (avatar),', 35), ('mako/asami sato,', 34), ('korra/asami,', 32), ('bolin/iroh ii,', 31), ('iroh ii/bolin,', 28), ('bolin/asami sato,', 26), ('bolin & mako (avatar),', 22), ('broh - relationship,', 20), ('mako/asami,', 19), ('korrasami,', 19), ('korra/tahno,', 17), ('amon/korra/tarrlok,', 17), ('amorra - relationship,', 16), ('bolin/korra/mako (avatar),', 15), ('tahorra,', 14), ('toph bei fong/sokka (avatar),', 14), ('mai/zuko (avatar),', 12), ('amon/korra,', 11), ('lin/tenzin,', 10), ('asami/korra,', 10), ('tarrlok/korra,', 10), ('bolin/eska (avatar),', 10), ('sokka/suki (avatar),', 9), ('katara/zuko (avatar),', 9), ('korra/lin beifong,', 9), ('iroh/bolin,', 9), ('bolin/iroh,', 9), ('bolin/mako,', 8), ('bolin/jinora,', 8), ('bolin/korra,', 8), ('bolin/iroh ii (avatar),', 8), ('amon | noatak/korra,', 8), ('korrlok,', 8), ('senna/tonraq (avatar),', 8), ('tahnorra - relationship,', 7), ('asami/mako,', 7), ('asami sato/korra,', 7), ('noatak & tarrlok,', 7), ('amorralok,', 7), ('honumi,', 7)]

 Korrasami Subtext
[('korra/asami sato,', 297), ('korra/mako (avatar),', 103), ('amon/korra (avatar),', 20), ('mako/asami sato,', 19), ('varrick/zhu li,', 19), ('bolin/korra (avatar),', 18), ('lin bei fong/tenzin,', 18), ('mako/prince wu,', 18), ('aang/katara (avatar),', 16), ('korra/tahno (avatar),', 13), ('korrasami,', 12), ('amon | noatak/korra,', 12), ('pema/tenzin (avatar),', 9), ('amon/lieutenant (avatar),', 9), ('mako/wu (avatar),', 9), ('bolin/iroh ii (avatar),', 8), ('bolin & mako (avatar),', 7), ('lin bei fong/kya ii,', 7), ('wuko - relationship,', 7), ('toph bei fong/sokka (avatar),', 6), ('lin bei fong/korra,', 6), ('suyin beifong/kuvira,', 6), ('korra & asami sato,', 5), ('korra/kuvira,', 5), ('katara/zuko (avatar),', 4), ('broh - relationship,', 4), ('korra/mako/asami sato,', 4), ('jinora/kai,', 4), ("ming-hua/p'li,", 4), ('bolin/opal,', 4), ('senna/tonraq (avatar),', 3), ('mai/zuko (avatar),', 3), ('bolin/asami sato,', 3), ('mako/tahno (avatar),', 3), ('iroh ii/asami sato,', 3), ('raava/wan,', 3), ('tokka,', 3), ("zaheer/p'li,", 3), ("p'li/zaheer,", 3), ('james "bucky" barnes/steve rogers,', 2), ('marco bott/jean kirstein,', 2), ('lin bei fong & toph bei fong,', 2), ('bolin/eska,', 2), ('derek hale/stiles stilinski,', 2), ('sokka/suki (avatar),', 2), ('sokka/yue (avatar),', 2), ('korra/tarrlok (avatar),', 2), ('bolin/kai,', 2), ('kainora,', 2), ('ming/shaozu/tahno (avatar),', 2)]

 Post-Korrasami
[('korra/asami sato,', 1250), ('bolin/opal (avatar),', 120), ('korra/mako (avatar),', 96), ('baatar jr./kuvira (avatar),', 94), ('korra/kuvira,', 45), ('korrasami,', 44), ('jinora/kai (avatar),', 41), ('lin beifong/kya ii,', 40), ('mako/prince wu (avatar),', 33), ('lin beifong/tenzin,', 33), ('varrick/zhu li moon,', 32), ('mako/asami sato,', 27), ('bolin/opal,', 26), ('aang/katara (avatar),', 24), ('varrick/zhu li,', 22), ('senna/tonraq (avatar),', 20), ('korra & asami sato,', 19), ('korra/kuvira (avatar),', 18), ('suyin beifong/kuvira,', 17), ('korvira - relationship,', 16), ('pema/tenzin (avatar),', 16), ('bolin/iroh ii (avatar),', 13), ('lin bei fong & toph bei fong,', 13), ('korra & kuvira (avatar),', 13), ('jinora/kai,', 12), ('makorra - relationship,', 11), ("p'li/zaheer (avatar),", 11), ('lin bei fong/tenzin,', 10), ('korra & mako (avatar),', 10), ('amon/lieutenant (avatar),', 10), ('ghazan/ming-hua (avatar),', 10), ('azula/ty lee (avatar),', 9), ('kuvorra,', 9), ('bolin/korra (avatar),', 9), ('ambiguous or implied relationship(s),', 9), ('lin bei fong/kya ii,', 8), ('kuvira/korra,', 8), ('lin beifong/korra,', 8), ('mako & asami sato,', 8), ('baatar jr./kuvira,', 7), ('sokka/suki (avatar),', 7), ('korra/tahno (avatar),', 7), ('mako (avatar)/original female character,', 7), ('kovira,', 7), ('kuvira/asami sato,', 7), ('lin beifong/kya,', 6), ('weilin,', 6), ('mako/prince wu,', 6), ('amon/korra (avatar),', 6), ('bolin & asami sato,', 6)]
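The relationship outputs split one pairing across several tag spellings: the pre-Korrasami list has 'korra/asami sato,', 'korra/asami,', 'korrasami,', 'asami/korra,', and 'asami sato/korra,' as separate entries. One way to combine them is to normalize each tag before counting. A minimal sketch, assuming hand-written alias tables (`NAME_ALIASES` and `SHIP_ALIASES` are hypothetical and would need extending for other names); tags joined with '&', which AO3 uses for non-romantic relationships, are kept separate from '/' pairings.

```python
from collections import Counter

# hypothetical alias tables; extend as needed for other characters and ship names
NAME_ALIASES = {'asami': 'asami sato'}
SHIP_ALIASES = {'korrasami': 'asami sato/korra'}


def canonical_pair(tag):
    """Normalize one relationship tag: drop the trailing comma and ' (avatar)',
    apply the alias tables, and sort the names in a '/' pairing."""
    name = tag.strip().rstrip(',').strip()
    name = name.replace(' (avatar)', '')
    if name in SHIP_ALIASES:
        return SHIP_ALIASES[name]
    if '/' not in name:
        return name  # e.g. '&' friendship tags are left unchanged
    people = [NAME_ALIASES.get(p.strip(), p.strip()) for p in name.split('/')]
    return '/'.join(sorted(people))


def merge_variants(tag_counts):
    """Sum (tag, count) pairs whose tags normalize to the same pairing."""
    merged = Counter()
    for tag, count in tag_counts:
        merged[canonical_pair(tag)] += count
    return merged.most_common()


# counts copied from the pre-Korrasami relationship output above
sample = [('korra/mako (avatar),', 328), ('korra/asami sato,', 119), ('korra/asami,', 32),
          ('korrasami,', 19), ('asami/korra,', 10), ('asami sato/korra,', 7)]
print(merge_variants(sample))
```

On these five Korra/Asami variants, the combined count is 187, compared with 119 for the most frequent single spelling.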
In [8]:
korra_preKA_at = TagsAnalyzer(korra_all,'2011-02','2014-07','additional tags')
korra_subKA_at = TagsAnalyzer(korra_all,'2014-02','2014-11','additional tags')
korra_postKA_at = TagsAnalyzer(korra_all,'2014-12','2015-07','additional tags')

print('Pre-Korrasami')
print(korra_preKA_at)
print('\n Korrasami Subtext')
print(korra_subKA_at)
print('\n Post-Korrasami')
print(korra_postKA_at)
Pre-Korrasami
[('romance,', 154), ('angst,', 115), ('friendship,', 94), ('fluff,', 75), ('family,', 72), ('alternate universe,', 69), ('established relationship,', 64), ('smut,', 49), ('au,', 49), ('hurt/comfort,', 46), ('humor,', 42), ('drabble,', 40), ('alternate universe - canon divergence,', 40), ('drama,', 37), ('explicit sexual content,', 31), ('developing relationship,', 28), ('sexual content,', 28), ('oral sex,', 27), ('first time,', 26), ('crossover,', 22), ('one shot,', 20), ('canon compliant,', 20), ('character study,', 19), ('mother-daughter relationship,', 18), ('makorra,', 18), ('crack,', 18), ('brothers,', 17), ('femslash,', 17), ('spoilers,', 16), ('break up,', 16), ('broh week,', 16), ('threesome - f/m/m,', 15), ('love,', 14), ('canon character of color,', 14), ('alternate universe - fusion,', 14), ('unrequited love,', 14), ('first kiss,', 14), ("tahno's love triad,", 14), ('fraternal polyandry,', 14), ('pre-canon,', 13), ('alternate universe - modern setting,', 13), ('threesome,', 13), ('prompt fic,', 13), ('pregnancy,', 13), ('canonical character death,', 13), ('character death,', 12), ('sibling incest,', 12), ('action/adventure,', 11), ('women being awesome,', 11), ('ot3,', 11)]

 Korrasami Subtext
[('fluff,', 63), ('romance,', 62), ('angst,', 48), ('alternate universe,', 42), ('alternate universe - modern setting,', 32), ('friendship,', 29), ('smut,', 25), ('hurt/comfort,', 23), ('au,', 20), ('humor,', 19), ('korrasami - freeform,', 16), ('first kiss,', 15), ('korrasami week,', 14), ('established relationship,', 13), ('drabble,', 13), ('korrasami week 2014,', 13), ('femslash,', 12), ('family,', 12), ('friends to lovers,', 11), ('depression,', 11), ('alternate universe - canon divergence,', 10), ('crossover,', 9), ('character death,', 9), ('alternate universe - college/university,', 9), ('amorra - freeform,', 9), ('sexual content,', 8), ('first meetings,', 8), ('cross-posted on fanfiction.net,', 8), ('first time,', 7), ('emotional hurt/comfort,', 7), ('spirit world,', 7), ('fluff and angst,', 7), ('unresolved sexual tension,', 6), ('canon compliant,', 6), ('alternate universe - high school,', 6), ('grief/mourning,', 6), ('oral sex,', 6), ('legend of korra - freeform,', 6), ('meta,', 6), ('drama,', 6), ('action/adventure,', 5), ('drabble collection,', 5), ('modern au,', 5), ('nsfw,', 5), ('alternate universe - rock band,', 5), ('love,', 5), ('anal sex,', 5), ('bending (avatar),', 5), ('one shot,', 5), ('minor character death,', 5)]

 Post-Korrasami
[('fluff,', 229), ('romance,', 213), ('angst,', 145), ('alternate universe - modern setting,', 126), ('korrasami - freeform,', 126), ('alternate universe,', 89), ('canon compliant,', 81), ('friendship,', 74), ('post-finale,', 62), ('post-canon,', 56), ('humor,', 56), ('alternate universe - canon divergence,', 54), ('family,', 47), ('femslash,', 46), ('hurt/comfort,', 45), ('post-series,', 45), ('cross-posted on fanfiction.net,', 44), ('smut,', 44), ('alternate universe - college/university,', 42), ('fluff and angst,', 41), ('drama,', 35), ('au,', 32), ('one shot,', 31), ('spirit world,', 27), ('action/adventure,', 26), ('emotional hurt/comfort,', 25), ('violence,', 25), ('friends to lovers,', 24), ('love,', 24), ('drabble,', 24), ('canon queer relationship,', 23), ('alternate universe - high school,', 22), ('alternate universe - no bending,', 22), ('canon lesbian relationship,', 21), ('established relationship,', 21), ('crossover,', 21), ('canon bisexual character,', 20), ('post-traumatic stress disorder - ptsd,', 19), ('first kiss,', 19), ('friendship/love,', 18), ('slow burn,', 18), ('kissing,', 17), ('anal sex,', 17), ('eventual romance,', 17), ('original character(s),', 16), ('character death,', 16), ('alpha/beta/omega dynamics,', 16), ('legend of korra - freeform,', 16), ('baavira - freeform,', 16), ('oral sex,', 15)]
In [9]:
korra_preKAcat = TagsAnalyzer(korra_all,'2011-02','2014-07','category')
korra_subKAcat = TagsAnalyzer(korra_all,'2014-02','2014-11','category')
korra_postKAcat = TagsAnalyzer(korra_all,'2014-12','2015-07','category')

print('Pre-Korrasami')
print(korra_preKAcat)
print('\n Korrasami Subtext')
print(korra_subKAcat)
print('\n Post-Korrasami')
print(korra_postKAcat)
Pre-Korrasami
[('f/m,', 863), ('gen,', 661), ('m/m,', 249), ('f/f,', 238), ('multi,', 94), ('other,', 33)]

 Korrasami Subtext
[('f/f,', 350), ('f/m,', 252), ('gen,', 220), ('m/m,', 68), ('multi,', 25), ('other,', 14)]

 Post-Korrasami
[('f/f,', 1351), ('f/m,', 490), ('gen,', 346), ('m/m,', 120), ('multi,', 83), ('other,', 40)]
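The three Korra periods differ in length (42, 10, and 8 months) and in the number of works, so raw category counts are hard to compare directly. Converting each count into a share of that period's category tags makes the shift easier to see. A small sketch using the counts printed above; `category_shares` is a hypothetical helper. Because a work can carry more than one category, these are shares of category tags rather than of works, and the first two periods share the months 2014-02 to 2014-07.

```python
def category_shares(counts):
    """Turn (category, count) pairs into percentages of all category tags, rounded to 0.1."""
    total = sum(count for _, count in counts)
    return {category.rstrip(','): round(100 * count / total, 1) for category, count in counts}


# counts copied from the category output above
pre = [('f/m,', 863), ('gen,', 661), ('m/m,', 249), ('f/f,', 238), ('multi,', 94), ('other,', 33)]
sub = [('f/f,', 350), ('f/m,', 252), ('gen,', 220), ('m/m,', 68), ('multi,', 25), ('other,', 14)]
post = [('f/f,', 1351), ('f/m,', 490), ('gen,', 346), ('m/m,', 120), ('multi,', 83), ('other,', 40)]

for label, counts in [('Pre-Korrasami', pre), ('Korrasami Subtext', sub), ('Post-Korrasami', post)]:
    print(label, category_shares(counts))
```

On these counts, f/f rises from about 11% of category tags before Korrasami to about 38% during the subtext period and about 56% afterward.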

Game of Thrones Metadata

Relationship Tags

In [10]:
#seasons 1 and 2 – season 3 starts March 2013
got1_2relationship = TagsAnalyzer(got_all,'2006-08','2013-02','relationship')
print('\n Seasons 1 and 2')
print(got1_2relationship)

#seasons 3 and 4 – season 5 starts April 2015
got3_4relationship = TagsAnalyzer(got_all,'2013-03','2015-03','relationship')
print('\n Seasons 3 and 4')
print(got3_4relationship)

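#seasons 5 and 6 – season 7 starts July 2017
#note: season 5 starts April 2015, but this range begins at 2015-07, so works posted April through June 2015 are not counted in any period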
got5_6relationship = TagsAnalyzer(got_all,'2015-07','2017-06','relationship')
print('\n Seasons 5 and 6')
print(got5_6relationship)

#season 7 – season 8 starts April 2019
got7relationship = TagsAnalyzer(got_all,'2017-07','2019-03','relationship')
print('\n Season 7')
print(got7relationship)

#season 8 – April 2019 through September 2019
got8relationship = TagsAnalyzer(got_all,'2019-04','2019-09','relationship')
print('\n Season 8')
print(got8relationship)
 Seasons 1 and 2
[('jaime lannister/brienne of tarth,', 147), ('arya stark/gendry waters,', 100), ('sandor clegane/sansa stark,', 76), ('cersei lannister/jaime lannister,', 71), ('theon greyjoy/robb stark,', 68), ('renly baratheon/loras tyrell,', 67), ('jon snow/robb stark,', 61), ('catelyn stark/ned stark,', 60), ('gendry/arya stark,', 49), ('jon snow/sansa stark,', 29), ('jorah mormont/daenerys targaryen,', 28), ('robb stark/jeyne westerling,', 25), ('sansa stark/willas tyrell,', 22), ('lyanna stark/rhaegar targaryen,', 21), ('joffrey baratheon/sansa stark,', 21), ('jon snow/daenerys targaryen,', 20), ('jon snow/ygritte,', 20), ('myrcella baratheon/robb stark,', 19), ('jaime lannister/sansa stark,', 19), ('petyr baelish/sansa stark,', 16), ('robert baratheon/cersei lannister,', 15), ('stannis baratheon/davos seaworth,', 15), ('theon greyjoy/jon snow,', 14), ('jon snow/arya stark,', 14), ('sansa stark/margaery tyrell,', 12), ("jaqen h'ghar/arya stark,", 12), ('cersei lannister/ned stark,', 11), ('tyrion lannister/sansa stark,', 11), ('arya stark/aegon vi targaryen,', 11), ('jon snow/val,', 11), ('robert baratheon/lyanna stark,', 9), ('daenerys targaryen/doreah (asoiaf),', 9), ('khal drogo/daenerys targaryen,', 9), ('theon greyjoy/sansa stark,', 9), ('tywin lannister/arya stark,', 9), ('brandon stark/catelyn stark,', 8), ('dacey mormont/robb stark,', 8), ('jon snow/samwell tarly,', 8), ('ashara dayne/ned stark,', 8), ('shireen baratheon/rickon stark,', 8), ('meera reed/bran stark,', 8), ('cersei lannister/sansa stark,', 7), ('asha greyjoy/theon greyjoy,', 7), ('robb stark/margaery tyrell,', 7), ('stannis baratheon/sansa stark,', 7), ('erik lehnsherr/charles xavier,', 6), ('elia martell/rhaegar targaryen,', 6), ('cersei lannister/lancel lannister,', 6), ('petyr baelish/catelyn stark,', 5), ('robb stark/daenerys targaryen,', 5)]

 Seasons 3 and 4
[('jaime lannister/brienne of tarth,', 541), ('arya stark/gendry waters,', 520), ('sandor clegane/sansa stark,', 433), ('catelyn stark/ned stark,', 263), ('sansa stark/margaery tyrell,', 217), ('cersei lannister/jaime lannister,', 203), ('theon greyjoy/robb stark,', 186), ('renly baratheon/loras tyrell,', 175), ('jon snow/sansa stark,', 161), ('petyr baelish/sansa stark,', 156), ('ramsay bolton/theon greyjoy,', 150), ('sandor clegane & sansa stark,', 142), ('jon snow/ygritte,', 132), ('jon snow/robb stark,', 126), ('lyanna stark/rhaegar targaryen,', 110), ('joffrey baratheon/sansa stark,', 94), ('robb stark/jeyne westerling,', 88), ('tyrion lannister/sansa stark,', 84), ('ramsay bolton/reek,', 84), ('sansa stark/willas tyrell,', 70), ('robert baratheon/cersei lannister,', 65), ('jojen reed/bran stark,', 64), ('myrcella baratheon/robb stark,', 63), ('elia martell/rhaegar targaryen,', 60), ('jon snow/daenerys targaryen,', 52), ('robb stark/sansa stark,', 52), ('stannis baratheon/davos seaworth,', 52), ('shireen baratheon/rickon stark,', 51), ('theon greyjoy/jon snow,', 50), ('robert baratheon/lyanna stark,', 49), ('khal drogo/daenerys targaryen,', 48), ('robb stark/margaery tyrell,', 46), ('oberyn martell/ellaria sand,', 46), ("jaqen h'ghar/arya stark,", 45), ('joffrey baratheon/margaery tyrell,', 44), ('ramsay bolton/theon greyjoy/reek,', 44), ('arya stark/aegon vi targaryen,', 41), ('meera reed/bran stark,', 38), ('jaime lannister/sansa stark,', 37), ('jorah mormont/daenerys targaryen,', 34), ('jon snow/arya stark,', 33), ('arya stark & sansa stark,', 33), ('derek hale/stiles stilinski,', 31), ('theon greyjoy/sansa stark,', 29), ('melisandre of asshai/stannis baratheon,', 28), ('talisa maegyr/robb stark,', 28), ('gilly (asoiaf)/samwell tarly,', 27), ('tywin lannister/sansa stark,', 26), ('ashara dayne/ned stark,', 25), ('theon greyjoy/jeyne poole,', 25)]

 Seasons 5 and 6
[('jon snow/sansa stark,', 1015), ('jaime lannister/brienne of tarth,', 663), ('sandor clegane/sansa stark,', 481), ('arya stark/gendry waters,', 366), ('petyr baelish/sansa stark,', 269), ('catelyn stark/ned stark,', 259), ('sansa stark/margaery tyrell,', 248), ('cersei lannister/jaime lannister,', 173), ('theon greyjoy/robb stark,', 169), ('jon snow/ygritte,', 166), ('lyanna stark/rhaegar targaryen,', 157), ('renly baratheon/loras tyrell,', 152), ('sandor clegane & sansa stark,', 147), ('jon snow/daenerys targaryen,', 144), ('ramsay bolton/theon greyjoy,', 126), ('jon snow/robb stark,', 122), ('minor or background relationship(s),', 102), ('jon snow/arya stark,', 94), ('stannis baratheon/sansa stark,', 89), ('joffrey baratheon/sansa stark,', 82), ('shireen baratheon/rickon stark,', 76), ('jojen reed/bran stark,', 73), ('tyrion lannister/sansa stark,', 72), ('jon snow & sansa stark,', 70), ('elia martell/rhaegar targaryen,', 66), ('robb stark/margaery tyrell,', 65), ('meera reed/bran stark,', 63), ('jon snow & arya stark,', 61), ('robb stark/sansa stark,', 61), ('arya stark & sansa stark,', 60), ('robert baratheon/cersei lannister,', 59), ('khal drogo/daenerys targaryen,', 58), ('tormund giantsbane/brienne of tarth,', 58), ('ramsay bolton/reek,', 52), ('ramsay bolton/sansa stark,', 51), ('theon greyjoy/sansa stark,', 51), ('gilly (asoiaf)/samwell tarly,', 49), ('oberyn martell/ellaria sand,', 49), ('myrcella baratheon/robb stark,', 48), ('robb stark/jeyne westerling,', 48), ('sansa stark/willas tyrell,', 46), ('other relationship tags to be added,', 43), ('arya stark & gendry waters,', 42), ('stannis baratheon/davos seaworth,', 42), ('tyrion lannister/shae,', 40), ('jon snow & robb stark,', 39), ('jorah mormont/daenerys targaryen,', 38), ('robert baratheon/lyanna stark,', 38), ('talisa maegyr/robb stark,', 37), ("jaqen h'ghar/arya stark,", 37)]

 Season 7
[('jon snow/sansa stark,', 2196), ('jon snow/daenerys targaryen,', 1288), ('jaime lannister/brienne of tarth,', 879), ('arya stark/gendry waters,', 690), ('sandor clegane/sansa stark,', 472), ('catelyn stark/ned stark,', 396), ('petyr baelish/sansa stark,', 292), ('cersei lannister/jaime lannister,', 263), ('lyanna stark/rhaegar targaryen,', 253), ('sandor clegane & sansa stark,', 219), ('theon greyjoy/robb stark,', 196), ('minor or background relationship(s),', 184), ('sansa stark/margaery tyrell,', 176), ('robb stark/margaery tyrell,', 161), ('jon snow/ygritte,', 152), ('jon snow & sansa stark,', 148), ('arya stark & sansa stark,', 143), ('jon snow & arya stark,', 142), ('tyrion lannister/sansa stark,', 132), ('jaime lannister/sansa stark,', 130), ('jon snow/arya stark,', 126), ('jon snow & daenerys targaryen,', 112), ('renly baratheon/loras tyrell,', 111), ('sansa stark/daenerys targaryen,', 100), ('theon greyjoy/sansa stark,', 99), ('ashara dayne/ned stark,', 97), ('robert baratheon/cersei lannister,', 94), ('theon greyjoy/jon snow,', 93), ('meera reed/bran stark,', 91), ('elia martell/rhaegar targaryen,', 88), ('gilly (asoiaf)/samwell tarly,', 84), ('jon snow/robb stark,', 84), ('ramsay bolton/theon greyjoy,', 84), ('tormund giantsbane/brienne of tarth,', 81), ('khal drogo/daenerys targaryen,', 77), ('arya stark & gendry waters,', 76), ('grey worm/missandei,', 72), ('other relationship tags to be added,', 70), ('joffrey baratheon/sansa stark,', 67), ('robb stark/original female character(s),', 63), ('talisa maegyr/robb stark,', 62), ('jon snow & robb stark,', 58), ('oberyn martell/ellaria sand,', 58), ('jorah mormont/daenerys targaryen,', 55), ('robb stark/jeyne westerling,', 55), ('ramsay bolton/reader,', 55), ('myrcella baratheon/robb stark,', 54), ('jaime lannister & brienne of tarth,', 54), ('jon snow/original female character(s),', 53), ('ramsay bolton/sansa stark,', 52)]

 Season 8
[('jaime lannister/brienne of tarth,', 1343), ('arya stark/gendry waters,', 1337), ('jon snow/daenerys targaryen,', 841), ('jon snow/sansa stark,', 709), ('theon greyjoy/sansa stark,', 304), ('sandor clegane/sansa stark,', 250), ('cersei lannister/jaime lannister,', 212), ('catelyn stark/ned stark,', 190), ('tormund giantsbane/jon snow,', 184), ('minor or background relationship(s),', 183), ('tyrion lannister/sansa stark,', 169), ('arya stark & sansa stark,', 160), ('arya stark & gendry waters,', 146), ('sansa stark/margaery tyrell,', 141), ('jon snow & arya stark,', 118), ('jon snow/ygritte,', 110), ('theon greyjoy/robb stark,', 108), ('grey worm/missandei,', 107), ('jorah mormont/daenerys targaryen,', 101), ('sansa stark/daenerys targaryen,', 95), ('lyanna stark/rhaegar targaryen,', 90), ('jon snow & daenerys targaryen,', 89), ('jon snow & sansa stark,', 88), ('robb stark/margaery tyrell,', 87), ('jaime lannister & brienne of tarth,', 86), ('jaime lannister/sansa stark,', 74), ('sandor clegane & sansa stark,', 72), ('petyr baelish/sansa stark,', 69), ('meera reed/bran stark,', 67), ('podrick payne/sansa stark,', 59), ('sandor clegane & arya stark,', 59), ('yara greyjoy/daenerys targaryen,', 57), ('talisa maegyr/robb stark,', 53), ('jaime lannister & tyrion lannister,', 50), ('jon snow/arya stark,', 49), ('renly baratheon/loras tyrell,', 49), ('theon greyjoy & sansa stark,', 48), ('gilly (asoiaf)/samwell tarly,', 47), ('robert baratheon/cersei lannister,', 46), ('tyrion lannister & sansa stark,', 44), ('arya stark & bran stark,', 44), ('ramsay bolton/theon greyjoy,', 43), ('sansa stark & brienne of tarth,', 41), ('jon snow/robb stark,', 39), ('joffrey baratheon/sansa stark,', 39), ('jon snow/original female character(s),', 38), ('podrick payne & brienne of tarth,', 38), ('tormund giantsbane/brienne of tarth,', 36), ('elia martell/rhaegar targaryen,', 35), ('jon snow/rhaenys targaryen,', 34)]
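The season cells write each date range by hand. A sketch of deriving contiguous ranges from a list of period start months instead, so that each period ends the month before the next one begins: `build_periods` and `month_label` are hypothetical helpers, the start months follow the comments in the cells (with season 5 starting April 2015, as the comment states), and the final end month, 2019-09, is the one the notebook uses.

```python
def month_number(ym):
    """Convert a 'YYYY-MM' string into a running month count."""
    year, month = map(int, ym.split('-'))
    return year * 12 + (month - 1)


def month_label(n):
    """Convert a running month count back into a 'YYYY-MM' string."""
    year, month0 = divmod(n, 12)
    return f'{year:04d}-{month0 + 1:02d}'


def build_periods(starts, last_month):
    """Turn sorted period start months into contiguous (begin, end) ranges."""
    ends = [month_label(month_number(s) - 1) for s in starts[1:]] + [last_month]
    return list(zip(starts, ends))


got_starts = ['2006-08', '2013-03', '2015-04', '2017-07', '2019-04']
print(build_periods(got_starts, '2019-09'))
```

Each resulting pair could then be passed to `TagsAnalyzer` as `monthBegin` and `monthEnd`. This would start the seasons 5 and 6 period at 2015-04 rather than 2015-07, so its counts would differ from the output shown.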

Additional Tags

In [11]:
#seasons 1 and 2 – season 3 starts March 2013
got1_2AT = TagsAnalyzer(got_all,'2006-08','2013-02','additional tags')
print('\n Seasons 1 and 2')
print(got1_2AT)

#seasons 3 and 4 – season 5 starts April 2015
got3_4AT = TagsAnalyzer(got_all,'2013-03','2015-03','additional tags')
print('\n Seasons 3 and 4')
print(got3_4AT)

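#seasons 5 and 6 – season 7 starts July 2017
#note: season 5 starts April 2015, but this range begins at 2015-07, so works posted April through June 2015 are not counted in any period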
got5_6AT = TagsAnalyzer(got_all,'2015-07','2017-06','additional tags')
print('\n Seasons 5 and 6')
print(got5_6AT)

#season 7 – season 8 starts April 2019
got7AT = TagsAnalyzer(got_all,'2017-07','2019-03','additional tags')
print('\n Season 7')
print(got7AT)

#season 8 – April 2019 through September 2019
got8AT = TagsAnalyzer(got_all,'2019-04','2019-09','additional tags')
print('\n Season 8')
print(got8AT)
 Seasons 1 and 2
[('no english,', 108), ('angst,', 96), ('alternate universe - modern setting,', 95), ('alternate universe,', 91), ('somali,', 85), ('romance,', 67), ('sibling incest,', 56), ('alternate universe - canon divergence,', 50), ('sexual content,', 50), ('fluff,', 47), ('future fic,', 44), ('first time,', 36), ('incest,', 31), ('family,', 31), ('crossover,', 30), ('friendship,', 30), ('alternate universe - canon,', 29), ('oral sex,', 27), ('hurt/comfort,', 26), ('drabble,', 26), ('cunnilingus,', 26), ('somali only,', 24), ('half-sibling incest,', 24), ('au,', 23), ('dubious consent,', 22), ('pre-canon,', 20), ('pov female character,', 20), ('blow jobs,', 18), ('slash,', 18), ('arranged marriage,', 18), ('masturbation,', 16), ('unrequited love,', 15), ('character study,', 15), ('drama,', 15), ('canon compliant,', 15), ('canonical character death,', 14), ('first kiss,', 14), ('kink meme,', 14), ('personal,', 14), ('love,', 14), ('dirty talk,', 14), ('prompt fic,', 14), ('unresolved sexual tension,', 13), ('smut,', 13), ('personal use,', 13), ('hand jobs,', 13), ("don't copy to another site,", 13), ('fluff and angst,', 13), ('action/adventure,', 12), ('pregnancy,', 12)]

 Seasons 3 and 4
[('alternate universe - modern setting,', 980), ('alternate universe - canon divergence,', 397), ('fluff,', 356), ('angst,', 354), ('alternate universe,', 268), ('romance,', 220), ('sexual content,', 146), ('oral sex,', 141), ('sibling incest,', 119), ('au,', 114), ('smut,', 114), ('first time,', 102), ('hurt/comfort,', 102), ('future fic,', 101), ('violence,', 98), ('one shot,', 95), ('explicit sexual content,', 94), ('fluff and angst,', 92), ('alternate universe - college/university,', 91), ('humor,', 89), ('anal sex,', 89), ('incest,', 85), ('alternate universe - high school,', 85), ('masturbation,', 81), ('character death,', 79), ('family,', 79), ('crossover,', 77), ('friendship,', 75), ('arranged marriage,', 70), ('plot what plot/porn without plot,', 70), ('dubious consent,', 70), ('asoiaf kink meme,', 70), ('crack,', 67), ('drabble,', 66), ('modern au,', 64), ('r plus l equals j,', 63), ('older man/younger woman,', 62), ('fluff and smut,', 62), ('torture,', 61), ('slow burn,', 59), ('drama,', 58), ('ramsay is his own warning,', 55), ('post - a dance with dragons,', 55), ('half-sibling incest,', 54), ('canon-typical violence,', 53), ('sansan,', 51), ('love,', 51), ('first kiss,', 50), ('unresolved sexual tension,', 50), ('house stark,', 50)]

 Seasons 5 and 6
[('alternate universe - modern setting,', 1230), ('fluff,', 659), ('alternate universe - canon divergence,', 597), ('angst,', 578), ('romance,', 384), ('smut,', 363), ('alternate universe,', 280), ('r plus l equals j,', 246), ('explicit sexual content,', 220), ('hurt/comfort,', 208), ('slow burn,', 192), ('oral sex,', 190), ('fluff and angst,', 180), ('one shot,', 162), ('sexual content,', 151), ('arranged marriage,', 141), ('love,', 137), ('violence,', 133), ('family,', 132), ('fluff and smut,', 127), ('other additional tags to be added,', 122), ('ramsay is his own warning,', 120), ('humor,', 120), ('canon-typical violence,', 119), ('anal sex,', 119), ('sibling incest,', 116), ('modern au,', 113), ('drama,', 105), ('drabble,', 104), ('plot what plot/porn without plot,', 103), ('older man/younger woman,', 103), ('first time,', 100), ('cunnilingus,', 97), ('friends to lovers,', 97), ('jon snow is a targaryen,', 96), ('character death,', 95), ('cousin incest,', 94), ('au,', 94), ('alternate universe - college/university,', 92), ('emotional hurt/comfort,', 92), ('implied/referenced rape/non-con,', 91), ('incest,', 89), ('falling in love,', 87), ('explicit language,', 85), ('canon compliant,', 83), ('friendship,', 83), ('dubious consent,', 82), ('sex,', 82), ('jealousy,', 79), ('established relationship,', 78)]

 Season 7
[('alternate universe - modern setting,', 1833), ('alternate universe - canon divergence,', 1112), ('fluff,', 1070), ('angst,', 895), ('smut,', 682), ('romance,', 582), ('alternate universe,', 449), ('slow burn,', 446), ('r plus l equals j,', 412), ('explicit sexual content,', 360), ('hurt/comfort,', 320), ('fluff and angst,', 283), ('oral sex,', 276), ('jon snow is a targaryen,', 273), ('arranged marriage,', 268), ('fluff and smut,', 256), ('cunnilingus,', 243), ('one shot,', 240), ('incest,', 237), ('canon compliant,', 223), ('sibling incest,', 208), ('modern au,', 204), ('eventual smut,', 199), ('post-canon,', 199), ('other additional tags to be added,', 197), ('plot what plot/porn without plot,', 196), ('angst with a happy ending,', 195), ('canon-typical violence,', 193), ('drabble,', 191), ('love,', 190), ('established relationship,', 186), ('jonsa,', 174), ('anal sex,', 171), ('humor,', 170), ('rough sex,', 170), ('pregnancy,', 167), ('jealousy,', 165), ('violence,', 157), ('falling in love,', 157), ('cousin incest,', 155), ('jonerys,', 152), ('emotional hurt/comfort,', 150), ('ramsay is his own warning,', 149), ('friends to lovers,', 149), ('family,', 145), ('sex,', 143), ('sexual content,', 142), ('dragons,', 141), ('mutual pining,', 141), ('drama,', 138)]

 Season 8
[('alternate universe - modern setting,', 980), ('alternate universe - canon divergence,', 866), ('angst,', 767), ('fluff,', 766), ('romance,', 420), ('fix-it,', 416), ('smut,', 404), ('slow burn,', 358), ('hurt/comfort,', 352), ('alternate universe,', 287), ('canon compliant,', 248), ('fluff and angst,', 243), ('angst with a happy ending,', 200), ('post-canon,', 196), ('explicit sexual content,', 190), ('fluff and smut,', 180), ('oral sex,', 172), ('one shot,', 167), ('jon snow is a targaryen,', 158), ('r plus l equals j,', 154), ('emotional hurt/comfort,', 152), ('happy ending,', 152), ('season/series 08,', 152), ('other additional tags to be added,', 147), ('canon-typical violence,', 141), ('love,', 138), ('arranged marriage,', 137), ('eventual smut,', 136), ('first time,', 134), ('friends to lovers,', 134), ('established relationship,', 131), ('mutual pining,', 130), ('character death,', 130), ('pregnancy,', 130), ('gendrya - freeform,', 127), ('family,', 119), ('modern au,', 117), ('sex,', 116), ('incest,', 113), ('humor,', 112), ('fix-it of sorts,', 111), ('dragons,', 108), ('light angst,', 104), ('shameless smut,', 103), ('pining,', 103), ('drabble,', 102), ('sibling incest,', 100), ('plot what plot/porn without plot,', 99), ('falling in love,', 98), ('anal sex,', 95)]

Category (sexual pairing)

In [12]:
#seasons 1 and 2 – season 3 starts March 2013
got1_2cat = TagsAnalyzer(got_all,'2006-08','2013-02','category')
print('\n Seasons 1 and 2')
print(got1_2cat)

#seasons 3 and 4 – season 5 starts April 2015
got3_4cat = TagsAnalyzer(got_all,'2013-03','2015-03','category')
print('\n Seasons 3 and 4')
print(got3_4cat)

#seasons 5 and 6 – season 7 starts July 2017
got5_6cat = TagsAnalyzer(got_all,'2015-07','2017-06','category')
print('\n Seasons 5 and 6')
print(got5_6cat)

#season 7 – season 8 starts April 2019
got7cat = TagsAnalyzer(got_all,'2017-07','2019-03','category')
print('\n Season 7')
print(got7cat)

#season 8 – starts April 2019
got8cat = TagsAnalyzer(got_all,'2019-04','2019-09','category')
print('\n Season 8')
print(got8cat)
 Seasons 1 and 2
[('f/m,', 808), ('m/m,', 316), ('gen,', 282), ('f/f,', 68), ('multi,', 65), ('other,', 10)]

 Seasons 3 and 4
[('f/m,', 2920), ('m/m,', 1127), ('gen,', 1020), ('f/f,', 424), ('multi,', 286), ('other,', 87)]

 Seasons 5 and 6
[('f/m,', 4189), ('m/m,', 1096), ('gen,', 893), ('f/f,', 591), ('multi,', 411), ('other,', 124)]

 Season 7
[('f/m,', 7216), ('m/m,', 1234), ('gen,', 1096), ('f/f,', 865), ('multi,', 600), ('other,', 183)]

 Season 8
[('f/m,', 5205), ('gen,', 845), ('m/m,', 764), ('f/f,', 579), ('multi,', 373), ('other,', 158)]
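TagsAnalyzer is defined earlier in this notebook. As a hedged illustration of the same idea, the sketch below counts comma-separated tags in one column for an inclusive month range, and keeps the season boundaries in a single dictionary so every period can be computed in a loop rather than in five copied blocks. The helper `count_tags_in_range` and the `GOT_PERIODS` dictionary are hypothetical; the boundaries are copied from the category cell above. Unlike the output above, this sketch strips the trailing commas from tags. Calling `.copy()` on the filtered rows before modifying them is one common way to avoid pandas' `SettingWithCopyWarning`.

```python
import pandas as pd
from collections import Counter

# Hypothetical period table; boundaries copied from the category cell above
GOT_PERIODS = {
    'Seasons 1 and 2': ('2006-08', '2013-02'),
    'Seasons 3 and 4': ('2013-03', '2015-03'),
    'Seasons 5 and 6': ('2015-07', '2017-06'),
    'Season 7': ('2017-07', '2019-03'),
    'Season 8': ('2019-04', '2019-09'),
}

def count_tags_in_range(df, start, end, column):
    """Hypothetical sketch: count comma-separated tags in `column` for rows
    whose 'YYYY-MM' index label falls between start and end (inclusive)."""
    # 'YYYY-MM' strings sort chronologically, so plain string comparison works
    mask = (df.index >= start) & (df.index <= end)
    # .copy() gives an independent frame, so the assignment below does not
    # trigger SettingWithCopyWarning
    subset = df[mask].copy()
    subset[column] = subset[column].fillna('')
    counts = Counter()
    for cell in subset[column]:
        for tag in cell.split(','):
            tag = tag.strip()
            if tag:  # skip the empty piece left after a trailing comma
                counts[tag] += 1
    # most_common() returns (tag, count) pairs sorted by count, descending
    return counts.most_common()
```

With the notebook's `got_all` dataframe, a call such as `count_tags_in_range(got_all, *GOT_PERIODS['Season 7'], 'category')` would return a list of (tag, count) pairs, and looping over `GOT_PERIODS.items()` would produce one list per period.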

Visualizations and Analytics

For the first portion, I transform the lists of (tag, count) tuples produced above into dataframes so they can be analyzed and plotted. I start with the categories for both GoT and TLoK.

In [13]:
def tuple_to_df(tup):
    #turns a list of (tag, count) tuples into a two-column dataframe:
    #column 0 holds the tag, column 1 holds its count
    newdf = pd.DataFrame(list(tup))
    return newdf
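`tuple_to_df` leaves the columns labelled 0 and 1, which is why the plotting cells refer to `gotcat1[0]` and `gotcat1[1]`. A possible variant, sketched here with a hypothetical function name and hypothetical column labels, names the columns when the dataframe is built. The sample pairs are the Seasons 1 and 2 category counts printed above.

```python
import pandas as pd

def tuples_to_named_df(pairs, label='category', value='count'):
    # Hypothetical variant of tuple_to_df: same conversion, but the two
    # columns get readable names instead of the default labels 0 and 1
    return pd.DataFrame(list(pairs), columns=[label, value])

# Seasons 1 and 2 category counts, copied from the output above
sample = [('f/m,', 808), ('m/m,', 316), ('gen,', 282),
          ('f/f,', 68), ('multi,', 65), ('other,', 10)]
catdf = tuples_to_named_df(sample)
print(catdf)
```

With named columns a trace would read `go.Bar(x=catdf['category'], y=catdf['count'])`; the existing plotting cells would need the same change if this variant were adopted.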
In [14]:
gotcat1 = tuple_to_df(got1_2cat)
gotcat2 = tuple_to_df(got3_4cat)
gotcat3 = tuple_to_df(got5_6cat)
gotcat4 = tuple_to_df(got7cat)
gotcat5 = tuple_to_df(got8cat)
In [17]:
korracat1 = tuple_to_df(korra_preKAcat)
korracat2 = tuple_to_df(korra_subKAcat)
korracat3 = tuple_to_df(korra_postKAcat)
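The periods differ a lot in size (808 f/m tags in Seasons 1 and 2 against 7,216 in Season 7), so bars on a shared y-axis mostly show how much was written in each period. One option, sketched below with a hypothetical helper, is to also compare each category's share within its period. These are shares of category tags, not of works, since a single work can carry several categories.

```python
import pandas as pd

def add_share_column(catdf):
    # Hypothetical helper: takes a frame shaped like tuple_to_df's output
    # (column 0 = category, column 1 = count) and adds each category's
    # share of all category tags counted in that period
    out = catdf.copy()
    out['share'] = out[1] / out[1].sum()
    return out

# Category counts copied from the Seasons 1 and 2 and Season 7 output above
s1_2 = pd.DataFrame([('f/m,', 808), ('m/m,', 316), ('gen,', 282),
                     ('f/f,', 68), ('multi,', 65), ('other,', 10)])
s7 = pd.DataFrame([('f/m,', 7216), ('m/m,', 1234), ('gen,', 1096),
                   ('f/f,', 865), ('multi,', 600), ('other,', 183)])
print(add_share_column(s1_2).round(3))
print(add_share_column(s7).round(3))
```

The same helper could be applied to `gotcat1` through `gotcat5` and `korracat1` through `korracat3`, and the `share` column plotted instead of column 1.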
In [32]:
figGOT = make_subplots(
    rows=2, cols=3,
    shared_yaxes=True,
    subplot_titles=("Seasons 1–2", "Seasons 3–4", "Seasons 5–6", "Season 7", "Season 8 and Beyond"))

figGOT.add_trace(go.Bar(y=gotcat1[1], x=gotcat1[0], name="Seasons 1–2"), row=1, col=1)

figGOT.add_trace(go.Bar(y=gotcat2[1], x=gotcat2[0], name="Seasons 3–4"), row=1, col=2)
figGOT.add_trace(go.Bar(y=gotcat3[1], x=gotcat3[0], name="Seasons 5–6"), row=1, col=3)
figGOT.add_trace(go.Bar(y=gotcat4[1], x=gotcat4[0], name="Season 7"), row=2, col=1)
figGOT.add_trace(go.Bar(y=gotcat5[1], x=gotcat5[0], name="Season 8"), row=2, col=2)

figGOT.update_layout(
    title='Game of Thrones Romantic Pairing Trends'
)

figGOT.write_html('images/GoT-Romantic-Pairings.html', auto_open=True)
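The five `add_trace` calls above repeat one pattern per period. A hedged alternative is to stack the period dataframes into a single long table with a `period` column, which plotly express can facet without one call per panel. The helper `stack_periods` is hypothetical, and the sketch below only builds the table; the sample rows are a few counts copied from the output above.

```python
import pandas as pd

def stack_periods(frames):
    # Hypothetical helper: `frames` maps a period label to a frame shaped
    # like tuple_to_df's output (column 0 = category, column 1 = count)
    parts = []
    for period, df in frames.items():
        # rename returns a new frame, so adding a column here is safe
        part = df.rename(columns={0: 'category', 1: 'count'})
        part['period'] = period
        parts.append(part)
    # one row per (period, category) pair, with a fresh 0..n-1 index
    return pd.concat(parts, ignore_index=True)

frames = {
    'Seasons 1–2': pd.DataFrame([('f/m,', 808), ('m/m,', 316)]),
    'Season 8': pd.DataFrame([('f/m,', 5205), ('gen,', 845)]),
}
long_df = stack_periods(frames)
print(long_df)
```

In the notebook this could be called as `stack_periods({'Seasons 1–2': gotcat1, 'Seasons 3–4': gotcat2, 'Seasons 5–6': gotcat3, 'Season 7': gotcat4, 'Season 8': gotcat5})` and the result passed to `px.bar(long_df, x='category', y='count', facet_col='period', facet_col_wrap=3)`, which draws one panel per period.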
In [33]:
figTLOK = make_subplots(
    rows=1, cols=3,
    shared_yaxes=True,
    subplot_titles=("Before Korrasami", "Korrasami Subtext", "Post Korrasami"))

figTLOK.add_trace(go.Bar(y=korracat1[1], x=korracat1[0], name='Up To July 2014'), row=1, col=1)

figTLOK.add_trace(go.Bar(y=korracat2[1], x=korracat2[0], name='August–November 2014'), row=1, col=2)

figTLOK.add_trace(go.Bar(y=korracat3[1], x=korracat3[0], name='December 2014 and Beyond'), row=1, col=3)

figTLOK.update_layout(
    title='The Legend of Korra Romantic Pairing Trends'
)

figTLOK.write_html('images/TLoK-Romantic-Pairings.html', auto_open=True)