The analysis below is based on the dataset provided by 1 Million Women to Tech at https://github.com/1mwtt-wod1/gender-gap-datathon.
The dataset was compiled by the World Economic Forum (WEF) to measure the Global Gender Gap across countries (http://www3.weforum.org/docs/WEF_GenderGap_Report_2006.pdf). It has four subindexes: Global Gender Gap Economic Participation and Opportunity, Global Gender Gap Educational Attainment, Global Gender Gap Health and Survival, and Global Gender Gap Political Empowerment, in addition to the Overall Global Gender Gap Index. In 2016, a new subindex measuring wage equality between women and men for similar work was also added to the dataset.
In 2006, a new methodology was introduced in the second Global Gender Gap Report to capture the size of the gap more effectively. Here is a short excerpt from that report:
"One particular societal and economic challenge is the persistent gap between women and men in their access to resources and opportunities. This gap not only undermines the quality of life of one half of the world’s population but also poses a significant risk to the long-term growth and well-being of nations: countries that do not capitalize on the full potential of one half of their human resources may compromise their competitive potential." (http://www3.weforum.org/docs/WEF_GenderGap_Report_2006.pdf)
Many entities around the world have taken positive steps. In 2010, the United Nations General Assembly founded UN Women to address issues that women are facing. In 2014, UN Women launched He for She, a global campaign to include men in gender equality discussions as well. For UN Women, please go to http://www.unwomen.org/en/about-us/about-un-women and for the He for She website please go to https://www.heforshe.org/en.
In this report, we will focus on improvements and positive changes in the ranking of countries between the years 2006 and 2016. After providing some information about the ranking in the world, we will look at countries that improved their rank significantly. We will define an improvement in rank in three different ways. The first compares overall gender gap ranks between 2006 and 2016 only. After creating a dataframe, we will look at the top three countries that improved their ranking the most.
Looking at the difference in ranks between 2006 and 2016 is a useful way to identify the countries that improved their ranking. However, a quick examination of the dataset shows that the ranks of many countries fluctuated a lot between 2006 and 2016. A country's rank might improve significantly and then get worse again in recent years. So, to find the factors influencing the ranking, we will look at the maximum positive difference between successive years during 2006-2016. We will compare the ranks of successive years and record the maximum positive difference in a new column. In addition, the pair of years in which the maximum difference occurred is recorded in another new column. After sorting the dataset by the max difference column, we will look at the Top 3 countries that improved their overall rank the most according to this criterion.
Lastly, we will look at how many times a country improved or preserved its rank between 2006 and 2016. This might give us information about countries' efforts to improve or preserve their ranks and whether these changes were temporary or long-lasting.
From now on, we will be working with this dataset including overall rankings only. The following research problems are explored:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import plotly
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, plot, iplot
from plotly import tools
init_notebook_mode(connected=True) #to start offline plotly
gender=pd.read_csv('globalgendergap2016.csv')
gender1=gender[gender["Subindicator Type"]=="Rank"]
ranking=gender1[gender1["Indicator"]=="Overall Global Gender Gap Index"]
ranking.head()
This report is organized as follows: In Section 1, we will explore how ranking changed across years. In addition to comparing datasets related to different years, we will be looking at the average ranking in the world and look at the world map to see the current situation. In Section 2, we will try to answer some specific questions. For this, we will create new datasets with some new comparison tools. After that, we will focus on countries that are on top of those lists and try to find out more about their ranking and factors influencing their ranking. In Section 3, some conclusions will be stated in addition to the problems and issues that we encountered in this analysis. Future directions and interesting related topics will be explored, as well.
In the plots below, we compare the ranking of countries in 2016 (the last year in the dataset) with the rankings in earlier years (2006, 2007, 2014 and 2015) to explore how the datasets differ. We believe that this provides a general overview of the ranking of countries.
colors=['red','darkred','green','darkgreen']
years=['2006','2007','2014','2015']
f=tools.make_subplots(rows=2,cols=2,
                      subplot_titles=('2006 vs 2016','2007 vs 2016','2014 vs 2016','2015 vs 2016'))
for i in range(0,4):
    # One scatter per comparison year; hovering over a point shows the country name.
    scatter=go.Scatter(dict(x=ranking[years[i]],
                            y=ranking['2016'],
                            marker=dict(color=colors[i]),
                            text=ranking['Country Name'],
                            name=years[i]+' vs 2016',
                            mode='markers'))
    # Fill the 2x2 grid row by row: i=0,1 go to row 1 and i=2,3 go to row 2.
    f.append_trace(scatter,i//2+1,i%2+1)
f['layout'].update(dict(title='Overall Global Gender Gap Rank'),showlegend=False)
iplot(f)
As these scatterplots show, the plots 2006 vs 2016 and 2007 vs 2016 look very different from 2014 vs 2016 and 2015 vs 2016. The last plot (2015 vs 2016) shows that the ranking of countries is becoming more stable, i.e., the ranks in 2016 are close to those of 2015. Ranks one year apart are naturally closer than ranks ten years apart, but another possible reason is that countries are getting used to being rated. It might be the case that when WEF started measuring the gap for the first time in 2005, the ranking created awareness and many countries took concrete steps to narrow the gap between women and men. That may explain the difference between the plots for the early years (2006 and 2007) and the later years (2014 and 2015).
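One way to put a number on how "stable" the ranking is would be the Spearman rank correlation between each earlier year and 2016. The sketch below uses a small made-up table (the values are not WEF data) and a hypothetical helper, `rank_agreement`; on the real data the same call could be applied to the year columns of `ranking`.

```python
import pandas as pd

# Toy ranks for five hypothetical countries (NOT the WEF data).
toy = pd.DataFrame({
    "2006": [3, 1, 5, 2, 4],
    "2015": [1, 2, 3, 4, 5],
    "2016": [1, 2, 3, 5, 4],
})

def rank_agreement(df, reference="2016"):
    """Spearman correlation of every year column with the reference year."""
    return df.corr(method="spearman")[reference].drop(reference)

agreement = rank_agreement(toy)
print(agreement)
```

A value close to 1 means the two years order the countries almost identically; a value near 0 means the orderings are largely unrelated.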
Since a lot of changes and fluctuations are observed in the dataset, we decided to look at the average rank to capture any interesting patterns in the data. The starting point is getting a general idea of the global ranking of countries. For this purpose, we first create a new column, Average Rank, and then look at the world map.
overall=ranking.copy()
# Year columns are stored as strings: '2006', ..., '2016'.
years=[str(i) for i in range(2006,2017)]
overall['Average Rank']=overall[years].mean(numeric_only=True,axis=1).astype(int)
overall.sort_values(by='Average Rank', ascending=True).head(5)
When we sort the data by average overall rank, the first seven countries are the same seven countries that led the ranking in 2015 and 2016. Denmark's average overall rank is eight even though its overall rank gradually got worse in 2015 and 2016.
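Because some countries have records for only a few years, an average rank can rest on very different amounts of data. The sketch below, on made-up values, shows how the row-wise mean skips missing years and how a companion column (here called `Years Observed`, a name introduced only for this illustration) could record how many years went into each average.

```python
import numpy as np
import pandas as pd

# Toy data (NOT the WEF data): country B has a rank for 2014 only.
toy = pd.DataFrame({
    "Country Name": ["A", "B"],
    "2014": [3.0, 5.0],
    "2015": [2.0, np.nan],
    "2016": [2.0, np.nan],
})
year_cols = ["2014", "2015", "2016"]

# mean(axis=1) skips NaN by default, so B's average is based on one year.
toy["Average Rank"] = toy[year_cols].mean(axis=1)
# Number of years that actually contributed to each average.
toy["Years Observed"] = toy[year_cols].notna().sum(axis=1)
print(toy)
```

Reading the two columns together makes it easier to tell a consistently good average from one computed over a short record.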
def world_rank_map(rank):
    # Custom color stops; defined here but not passed to the trace below.
    color_scale=[[0.40,'rgb(75, 75, 0)'],[0.30,'rgb(75, 38, 255)'],[0.60,'rgb(0,0,0)'],[0.90,'rgb(255, 238, 255)']]
    data=[dict(type='choropleth',
               text=overall['Country Name'],
               locations=overall['Country ISO3'],
               z=overall[rank].astype(float),
               marker=dict(line=dict(color='rgb(180,180,180)',width=0.5)),
               autocolorscale=False,
               reversescale=True,
               colorbar={'title':'Rank'})]
    #Now, let's plot the values:
    fig={'data': data,'layout':{'title':'Overall Global Gender Gap Index - '+ rank,
                                'geo':{'scope':'world', 'projection':{'type': 'equirectangular'}},
                                'showlegend':True}}
    return iplot(fig)
world_rank_map('Average Rank')
Average Ranking and Continents:
Next, we look at the Top 20 countries in the year 2016.
overall_ranked=overall.sort_values(by="2016", ascending=True)
overall_ranked_top20=overall_ranked[:20]
overall_ranked_top20=overall_ranked_top20.reset_index()
overall_ranked_top20.head(5)
# Bars for the 2006-2016 average rank of each Top 20 country.
bar_average=go.Bar(x=overall_ranked_top20['Country Name'],
                   y=overall_ranked_top20['Average Rank'],
                   marker=dict(color='darkblue'),
                   opacity=0.5,
                   name='Average Rank')
# Bars for the 2016 rank of the same countries.
bar2016=go.Bar(x=overall_ranked_top20['Country Name'],
               y=overall_ranked_top20['2016'],
               marker=dict(color='darkred'),
               name='2016')
bars=[bar_average,bar2016]
fig={'data': bars,'layout':{'title':'Top 20 Ranked Countries in 2016',
'xaxis':{'title':'Country'},
'yaxis':{'title':'Rank'}}}
iplot(fig)
What are the similarities and differences in the Top 20 countries in 2016?
This dataframe and the following barplot show the countries sorted in ascending order of their 2016 rank. In addition, the barplot includes the average rank of each country in the Top 20. Note that the newly created column Average Rank gives the average rank of each country over the years 2006 to 2016. A numerically high average rank means that the country's rank was much worse at some time in the past and improved significantly later, so that the country was in the Top 20 list in 2016.
In 2016, Iceland has rank 1, followed by the other Nordic countries Finland, Norway and Sweden. Since 2006, these four countries have always held ranks between 1 and 4. In 2006 and 2007, Sweden ranked number one but later slipped to four, while Iceland was initially ranked four, rose to one, and stayed there from 2009 to 2016. Rwanda's records are available from 2014 onwards, and its rank gradually improved to five.
When we look at the countries in the Top 20 in 2016, we see that there are thirteen countries in Europe, four in Africa, one in Asia, one in Oceania and one in North America.
It is worth noting that Nicaragua, the only North American country in the Top 20, ranked 62 in 2006 and improved to 10 in 2016 (it was even 6 in 2014, and its average rank is 34).
Another notable example is Slovenia (rank=8 in 2016 and average rank=36), which improved its rank significantly. According to UN Women, Slovenia committed to narrowing the gender gap in the country and took several positive steps, including a call for men and boys to support policies and activities regarding gender gap issues (http://www.unwomen.org/en/get-involved/step-it-up/commitments/slovenia).
In addition to Nicaragua and Slovenia, Namibia (rank=14 in 2016 and average rank=31) and France (rank=17 in 2016 and average rank=36) also took many positive steps to improve their ranking. According to a report published by the European Parliament http://www.europarl.europa.eu/RegData/etudes/IDAN/2015/510024/IPOL_IDA(2015)510024_EN.pdf ......
There are several ways to find out which countries improved their ranking the most. Below, each one is explained in detail.
After creating a new column for each criterion, we will look at the Top 3 countries in detail by using their subindexes measuring economic participation, educational attainment, health and survival, and political empowerment. To get more information about the factors influencing the positive changes in these countries, we will provide two different types of plots.
The following two functions are defined to plot data related to the subindexes. The first function helps us see the change in each subindex with a line graph. Note that the plots use ranks rather than index values because ranks show more clearly how the overall ranking depends on the subindexes. However, ranks do not capture all information about the subindexes. Some steep increases or decreases might be due to other factors, such as several countries sharing the same rank. Therefore, a boxplot is also provided to give a better picture of the ranking and to support the information presented in the line graph.
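To see why shared ranks can produce steep jumps, consider the toy example below. It assumes tied scores share the best available rank ("competition" ranking, `method="min"` in pandas); this is an assumption made for the illustration, not a statement about how WEF breaks ties, and the scores are made up.

```python
import pandas as pd

# Toy subindex scores (NOT WEF data): four countries are tied at 1.0.
scores_before = pd.Series([1.0, 1.0, 1.0, 1.0, 0.98], index=list("ABCDE"))

# Country A's score drops by only 0.01 the following year.
scores_after = scores_before.copy()
scores_after["A"] = 0.99

# Higher score = better rank; ties share the lowest rank number.
rank_before = scores_before.rank(ascending=False, method="min").astype(int)
rank_after = scores_after.rank(ascending=False, method="min").astype(int)

print(pd.DataFrame({"before": rank_before, "after": rank_after}))
```

Here a change of 0.01 in the score moves country A from rank 1 to rank 4, which is the kind of steep movement in a rank line that does not reflect a comparable change in the underlying index.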
def country_scatterplot(country):
    # Line plot of each subindex rank (and the overall rank) across years for one country.
    index_list=['Economic Participation', 'Education', 'Health', 'Political','Overall', 'Wage Equality']
    gender_country=gender[(gender['Country Name']==country) & (gender['Subindicator Type']=='Rank')]
    # Assumes the country's six Rank rows appear in the same order as index_list.
    gender_country.index=index_list
    scatters_list=[]
    colors=['red','blue','green','brown','purple','orange']
    for i in range(0,len(index_list)):
        scat=go.Scatter(x=years,
                        y=gender_country.loc[index_list[i],years],
                        marker=dict(color=colors[i]),
                        name=index_list[i],
                        mode='lines')
        scatters_list.append(scat)
    fig={'data': scatters_list,'layout':{'title':country,
                                         'xaxis':{'title':'Years'},
                                         'yaxis':{'title':'Rank'}}}
    return iplot(fig)
def country_boxplot(country):
    # Boxplot of the distribution of each rank over the years for one country.
    index_list=['Economic Participation', 'Education', 'Health', 'Political','Overall','Wage Equality']
    gender_country=gender[(gender['Country Name']==country) & (gender['Subindicator Type']=='Rank')]
    # Assumes the country's six Rank rows appear in the same order as index_list.
    gender_country.index=index_list
    box_list=[]
    colors=['red','blue','green','black','purple','orange']
    for i in range(0,len(index_list)):
        boxp=go.Box(y=gender_country.loc[index_list[i],years],
                    boxpoints='all',
                    marker=dict(color=colors[i]),
                    name=index_list[i])
        box_list.append(boxp)
    fig={'data': box_list,'layout':{'title':country,
                                    #'xaxis':{'title':'Subindexes'},
                                    'yaxis':{'title':'Rank'}}}
    return iplot(fig)
One way is to find a rough estimate using the data for the years 2006 and 2016. For this, two columns are created with two newly defined functions. The first column records whether a country improved its ranking by 2016: if the 2006 rank is greater than the 2016 rank, the change is positive; if it is smaller, the change is negative. The second column records the numerical difference between the two ranks. Unfortunately, there is no report for some countries in 2006, in 2016, or in both. These are recorded as missing 2006, missing 2016 and missing both.
def changed(row):
    # Label the direction of change between 2006 and 2016, handling missing years first.
    if pd.isna(row["2006"]) and pd.isna(row["2016"]):
        return "missing both"
    elif pd.isna(row["2006"]):
        return "missing 2006"
    elif pd.isna(row["2016"]):
        return "missing 2016"
    elif row["2006"]>row["2016"]:
        return "positive"
    elif row["2006"]<row["2016"]:
        return "negative"
    # Equal ranks fall through and return None.
def change_val(row):
    # Rank improvement from 2006 to 2016 (positive = better); NaN if either year is missing.
    if pd.isna(row["2006"]) or pd.isna(row["2016"]):
        return np.nan
    else:
        return row["2006"]-row["2016"]
def change_val2(row):
    # Rank improvement from 2015 to 2016 (positive = better); NaN if either year is missing.
    if pd.isna(row["2015"]) or pd.isna(row["2016"]):
        return np.nan
    else:
        return row["2015"]-row["2016"]
overall["Change 2006-2016"]=overall.apply(lambda row: changed(row),axis=1)
overall["Numerical Change 2006-2016"]=overall.apply(lambda row: change_val(row),axis=1)
overall=overall.sort_values(by="Numerical Change 2006-2016",ascending=False)
overall_count=overall.groupby('Change 2006-2016').count()
# numeric_only=True skips text columns such as the country name when averaging.
overall_mean=overall.groupby('Change 2006-2016').mean(numeric_only=True)
overall.head()
world_rank_map('Numerical Change 2006-2016')
The world map gives a clearer picture of the overall change in the world. For example, when we look at the difference between the years 2006 and 2016, we see that Bolivia is the country that improved its ranking the most, followed by France and Nicaragua. The world map also shows that for many countries the rank number increased, that is, the rank got worse (brownish colors). [We define a change as positive if a country's 2016 rank is less than its 2006 rank.]
overall_stat=overall_mean.copy()
overall_stat=overall_stat[['2016','Average Rank','Numerical Change 2006-2016']]
overall_stat=overall_stat.rename(columns={'2016':'mean 2016','Average Rank':'mean of Average Rank', 'Numerical Change 2006-2016':'mean change'})
overall_stat['count']=overall_count['Country Name']
overall_stat
The statistics table above confirms our observation: there are 83 countries whose ranks got worse and 30 countries whose ranks got better. The mean change in rank for the countries whose rank changed positively is 17.4, while the mean change for the countries whose rank changed negatively is -22.42.
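The count and mean per group can also be obtained in a single step with named aggregation. The sketch below runs on a small made-up table (not the WEF data), and the column names `change` and `direction` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy ranks (NOT the WEF data); country D has no 2006 record.
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D"],
    "2006": [10, 5, 20, np.nan],
    "2016": [4, 9, 25, 7],
})

# Positive change = the rank number went down, i.e. the rank improved.
toy["change"] = toy["2006"] - toy["2016"]
toy["direction"] = np.select(
    [toy["change"].isna(), toy["change"] > 0, toy["change"] < 0],
    ["missing", "positive", "negative"],
    default="no change",
)

# One row per direction with the number of countries and the mean change.
summary = toy.groupby("direction").agg(
    count=("Country Name", "count"),
    mean_change=("change", "mean"),
)
print(summary)
```

Keeping the count next to the mean makes it clear how many countries each average is based on.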
Next, we look in detail at the Top 3 countries that improved their ranking the most (that is, lowered their rank number the most) and try to find the factors contributing to the positive change in their ranking.
country_scatterplot('Bolivia')
country_boxplot('Bolivia')
For Bolivia, we see that the overall ranking (purple) follows the changes in the political empowerment subindex (brown) most closely. This is clearest between the years 2011 and 2012: only the political empowerment rank improved, while all other subindex ranks either got worse or stayed the same, and the overall rank still got better. This trend continues in the other years. The other subindex affecting the overall rank is economic participation.
country_scatterplot('France')
country_boxplot('France')
The second country that improved its rank the most is France. The line plot shows that its health and education parity is excellent (rank=1) in all years, and that the overall ranking closely follows the political empowerment subindex. For more information about France's progress, please look at the report at http://www.europarl.europa.eu/RegData/etudes/IDAN/2015/510024/IPOL_IDA(2015)510024_EN.pdf.
country_scatterplot('Nicaragua')
country_boxplot('Nicaragua')
For Nicaragua, the line plot is more complicated than for Bolivia and France. It seems that the improvement in the overall ranking is a result of improvements in health and political empowerment. The educational attainment ranking fluctuates a lot, but it may also have had some positive effect on the overall ranking. Nicaragua's overall rank improved and stayed around 10 even though its economic participation rank got worse between the years 2011 and 2015. It seems that the worsening of the overall rank from 2014 to 2015 was due to the worsening of the economic participation rank.
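The visual impression that the overall rank "follows" one subindex can be checked roughly by correlating year-over-year changes. The sketch below uses made-up ranks for one hypothetical country (not the values of any country discussed above); with only about ten yearly points per country, such a correlation is a hint at best and says nothing about causation.

```python
import pandas as pd

# Hypothetical ranks for one country (NOT WEF data); rows are years.
toy = pd.DataFrame(
    {
        "Overall": [60, 55, 40, 42, 30],
        "Political": [50, 44, 30, 33, 20],
        "Education": [80, 80, 81, 80, 82],
    },
    index=["2012", "2013", "2014", "2015", "2016"],
)

# Year-over-year change in each rank (negative = rank improved).
yearly_change = toy.diff().dropna()

# Pearson correlation of each subindex's changes with the overall changes.
co_movement = yearly_change.corr()["Overall"].drop("Overall")
print(co_movement)
```

In this toy case the political rank moves almost in lockstep with the overall rank, while the education rank does not.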
So far, we have focused on changes between the years 2006 and 2016 and looked at the factors leading up to that change in the top 3 countries on that list. However, this gives only a rough picture of the ranking. As we can see from the dataset, there can be fluctuations over the years, i.e., a country's ranking might change significantly from year to year. One possible reason is a change in the ranking of other countries. It might also be due to a significant event or a policy change in that country. To explore sudden changes in rank, we create a new column, max difference, which gives the maximum positive change between two successive years during 2006-2016. In addition, we create a new column recording the years in which it occurred, to find out whether the change is recent or not.
df_max_diff=ranking.copy()
# Year columns are addressed by label; this assumes they are named '2006' ... '2016'.
year_cols=[str(y) for y in range(2006,2017)]
lst=[]
lst2=[]
for index,row in df_max_diff.iterrows():
    max_difference=0
    year='2006'  # default label when no year-over-year improvement is found
    for prev,curr in zip(year_cols[:-1],year_cols[1:]):
        # Skip pairs of years where either rank is missing.
        if pd.notna(row[prev]) and pd.notna(row[curr]):
            x=row[prev]-row[curr]  # positive when the rank improved
            if x>max_difference:
                max_difference=x
                year=prev+'-'+curr
    lst.append(max_difference)
    lst2.append(year)
df_max_diff["max difference"]=lst
df_max_diff["max difference-years"]=lst2
df_max_diff.sort_values(by='max difference', ascending=False).head()
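As a cross-check, the largest single-year improvement can also be computed without an explicit loop. The sketch below uses made-up ranks (not the WEF data) and, unlike the loop, leaves the year label empty when a country never improved; the variable names are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy ranks (NOT the WEF data); country B has no 2015 record.
toy = pd.DataFrame(
    {
        "2013": [90.0, 10.0],
        "2014": [49.0, 12.0],
        "2015": [55.0, np.nan],
        "2016": [50.0, 11.0],
    },
    index=["A", "B"],
)

# Previous year minus current year: positive values mean the rank improved.
improvement = -toy.diff(axis=1).iloc[:, 1:]
improvement.columns = [
    f"{prev}-{curr}" for prev, curr in zip(toy.columns[:-1], toy.columns[1:])
]

# Ignore worsening years and pairs with a missing rank.
gains = improvement.clip(lower=0).fillna(0)
max_gain = gains.max(axis=1)
# Year pair of the largest gain; left empty if there was no gain at all.
max_gain_years = gains.idxmax(axis=1).where(max_gain > 0)

print(pd.DataFrame({"max difference": max_gain, "years": max_gain_years}))
```

Country A's largest improvement is 41 places in 2013-2014, while country B never improves across a pair of years with complete data.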
country_scatterplot('Kenya')
country_boxplot('Kenya')
When the overall ranks are sorted by the newly created maximum difference column, we see that Kenya and Estonia each improved their rank by 41 places in a single year (Kenya in 2013-2014 and Estonia in 2014-2015). Looking at the years since 2006, we see that Kenya's overall rank got worse until 2011 (when its rank was 99). After 2011, its overall rank got better, mainly due to the improvement in its economic participation rank. After 2011, the political empowerment and economic participation rank graphs follow a similar trend, and the graph for the overall rank follows a similar pattern. After 2014, its rank started to get worse again, and it was 63 in 2016.
country_scatterplot('Estonia')
country_boxplot('Estonia')
Estonia's overall rank improved from 62 to 21 in 2015, which was the largest improvement between any two successive years. This improvement might be due to the country's commitment to achieving gender equality by promoting women's rights, taking measures to reduce and prevent violence against women, and increasing its efforts to close the gender pay gap (http://www.unwomen.org/en/get-involved/step-it-up/commitments/estonia). The fact that its rank was still around 21 in 2016 might indicate that it is continuing to address these issues, but more data is needed to support this guess.
country_scatterplot('Ghana')
country_boxplot('Ghana')
For Ghana, the line plot shows that the overall rank is affected by the economic participation and health and survival subindexes more than by the other subindexes. In 2014, all of its subindex ranks got worse: the economic participation rank went from 24 to 64 and the overall rank from 76 to 101. However, after 2014, their rank gradually improved to 38 in 2016. Their rank in 2016 was better than all other ranks in the years 2006-2015.
df_pos_count=ranking.copy()
dict_ch2={}
for index,row in df_pos_count.iterrows():
    count=0
    for i in range(2006,2016):
        # Count a pair of years when both ranks exist and the rank improved or stayed the same.
        if pd.notna(row[str(i)]) and pd.notna(row[str(i+1)]):
            if row[str(i)]>=row[str(i+1)]:
                count+=1
    dict_ch2[index]=count
df_pos_count["count_positive"]=list(dict_ch2.values())
df_pos_count_sorted=df_pos_count.sort_values(by='count_positive',ascending=False)
df_pos_count_sorted.head()
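The same count can be cross-checked with an array comparison. The sketch below uses made-up ranks (not the WEF data) and relies on the fact that any comparison involving NaN evaluates to False, so pairs of years with a missing rank are not counted.

```python
import numpy as np
import pandas as pd

# Toy ranks (NOT the WEF data); country B has no 2015 record.
toy = pd.DataFrame(
    {
        "2013": [4.0, 20.0],
        "2014": [1.0, 25.0],
        "2015": [1.0, np.nan],
        "2016": [1.0, 18.0],
    },
    index=["A", "B"],
)

prev = toy.iloc[:, :-1].to_numpy()  # ranks in years 2013..2015
curr = toy.iloc[:, 1:].to_numpy()   # ranks in years 2014..2016

# True where the rank improved or was preserved from one year to the next.
count_positive = pd.Series((prev >= curr).sum(axis=1), index=toy.index)
print(count_positive)
```

Country A improves once and then holds rank 1 twice, giving a count of 3; country B gets 0 because its only complete pair of years shows a worse rank.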
country_scatterplot('Iceland')
country_boxplot('Iceland')
For the last 8 years, Iceland has had the highest global overall rank. From the line graph above, we see that once the country improved its educational attainment rank, it was able to preserve its rank as number one.
According to President Johannesson's statement, the gender pay gap in Iceland is around 5.7-18.3% (https://www.heforshe.org/en/impact). This shows that even in the number one country, the gender pay gap is still around 10% on average. For more details, please look at https://www.weforum.org/agenda/2017/11/why-iceland-ranks-first-gender-equality/.
country_scatterplot('Philippines')
country_boxplot('Philippines')
The Philippines has always been in the Top 10 of the overall ranking since 2006. Between 2006 and 2016, its overall rank got worse only twice, in 2009 and in 2014. According to the line graph above, the Philippines' overall rank graph is very similar to its political empowerment rank graph. Another factor contributing to its overall rank is economic participation.
Note that its overall rank improved by 2 places in 2015 even though its educational attainment rank got worse by 34 places in the same year. It seems that the improvement in the overall rank is due to the improvement in the economic participation rank (from 24 to 16), since the political empowerment and health and survival ranks stayed the same in that year.
For more details about their efforts, we refer to the State of Filipino Women Report at (https://pcw.gov.ph/sites/default/files/documents/resources/ESTADO%20NI%20JUANA_THE%20STATE%20OF%20FILIPINO%20WOMEN%20REPORT.pdf).
country_scatterplot('Finland')
country_boxplot('Finland')
Finland is one of the countries that are always in the Top 3 of the overall ranking. It seems that its overall rank depends mostly on its political empowerment rank, since the overall rank was not affected by its educational attainment rank, which varied a lot between 2006 and 2016, or by its economic participation rank, which was always between 8 and 22. For more information, we refer the reader to http://www.stat.fi/tup/tasaarvo/index_en.html.
country_scatterplot('United States')
country_boxplot('United States')
The overall Global Gender Gap rank of the United States was 45 in 2016, which was 22 places worse than its overall rank in 2006. The United States' overall rank got worse in 2015 and 2016 even though it had improved several times between 2006 and 2016. The overall rank line graph above looks most similar to the line graph for economic participation, but its values were not as low as the economic participation rank values. It seems that the high political empowerment rank values also affected the overall gender gap ranking. However, more data is needed to support this claim, as the political empowerment ranking fluctuates a lot.
overall[overall["Country Name"]=='United States']
In this report, we focused on positive changes and improvements and explored some factors influencing these changes. A similar study could look at the countries whose ranking got worse and explore the events leading up to that decline.
Since WEF first published its report in 2005, many countries have taken positive steps to narrow the gap at home as well as at a global level. However, even in Iceland, ranked #1 for the last 8 years, the gender pay gap is around 5.7-18.3% (https://www.heforshe.org/en/impact).