Graphs - Matplotlib

Now that we know the basics of pandas, we're going to use the module matplotlib to create graphs visualizing our data.

Data Science Clients



    As a data scientist, you may often find yourself studying areas you don't know a lot about, providing information for people who think they know a lot about their area. So you've never been a pokemon trainer, but by using data you can help pokemon trainers make better decisions.

    We're going to be helping a trainer named Ash. Ash is training to become a pokemon master, but he's run into a few bumps along the way. If he's going to become the pokemon master, he's going to need a little help.

    You talk to Ash about his pokemon training. "I don't understand why I keep losing," he tells you. "My pikachu is the best pokemon!"

    Hmmm...that's a pretty big claim. Maybe let's take a look at pikachu in our data set and see what we can find? Ash may think he understands pokemon, but without looking at the data he may not have the full picture.

Ash's Pikachu



    First, create a new python file called "pokemonGraphs.py", and make sure that the Pokemon.csv you downloaded is in the same folder as that file.

    If you don't have the file, you can download it again here: Pokemon.csv

    We'll do the pandas import like we did last time, and we'll also import the necessary plotting functionality from matplotlib.

    import pandas as pd
    from matplotlib import pyplot as plt
    
    poke = pd.read_csv("Pokemon.csv")
          

    Next, we're going to make a subset of our data with only 5 columns.

    Then, we'll take a look at Pikachu again.

    import pandas as pd
    from matplotlib import pyplot as plt
    
    poke = pd.read_csv("Pokemon.csv")
    
    poke_stats = poke[['Name', 'Type 1', 'Total', 'Generation', 'Legendary']]
    
    print(poke_stats[poke_stats['Name'] == 'Pikachu'])
          

    This gives us a simplified table, poke_stats, that we can use to compare pokemon by their stats, and we can now see that Pikachu's total power is 320.


    We'll use the mean() aggregator to determine what the average pokemon's strength is.

    print(poke_stats[poke_stats['Name'] == 'Pikachu'])
    
    print("\nAverage Pokemon's Total Power")
    print(poke_stats.Total.mean())
          


    You try to break the news gently to Ash that pikachu isn't actually the best pokemon or even above average.
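
    To put a number on how far a single pokemon sits from the average, one option is to subtract the mean from its total. The sketch below uses a small made-up table instead of Pokemon.csv so that it runs on its own; the rows and the names sample, pikachu_total and gap are assumptions for illustration, not part of the lesson's data.

```python
import pandas as pd

# Hypothetical mini-table standing in for poke_stats (values for illustration only)
sample = pd.DataFrame({
    'Name': ['Pikachu', 'Raichu', 'Onix', 'Snorlax'],
    'Total': [320, 485, 385, 540],
})

# .mean() averages every value in the Total column
average_total = sample['Total'].mean()

# Filtering returns a column of matches, so .iloc[0] pulls out the single value
pikachu_total = sample.loc[sample['Name'] == 'Pikachu', 'Total'].iloc[0]

# A negative gap means the pokemon is below the average
gap = pikachu_total - average_total
print(f"Average total: {average_total:.2f}")
print(f"Pikachu's gap from the average: {gap:.2f}")
```

    The same subtraction should work with poke_stats in place of sample once Pokemon.csv is loaded.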

    "But wait!" Ash tells you, "You can't compare pikachu's power to legendary and mega pokemon!"

    Well, he's right on that point. Let's re-do the math only comparing pikachu to other common pokemon, and remove duplicates as well.

    You can decide whether or not you want to break this up into multiple steps, or do it all on one line.

    print(poke_stats[poke_stats['Name'] == 'Pikachu'])
    
    poke_common = poke_stats[(poke_stats['Legendary'] == False) & (~poke_stats['Name'].str.contains('Mega'))].drop_duplicates('Name', keep='first')
    
    print("\nAverage Pokemon's Total Power")
    print(poke_common.Total.mean())
          



    The result is 405.96.
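
    If the one-line filter is hard to follow, it can help to build it in pieces on a tiny table. The sketch below uses made-up rows (not the real Pokemon.csv), and names like sample, has_mega and has_mega_form are just for illustration. It also shows one thing to keep in mind: str.contains('Mega') is True for any name with 'Mega' anywhere in it, so a name like Meganium would be filtered out too. Matching 'Mega ' with a trailing space is one possible workaround, assuming mega forms are always written with a space after 'Mega' in your file.

```python
import pandas as pd

# Hypothetical rows written in the same style as Pokemon.csv (not the real data)
sample = pd.DataFrame({
    'Name': ['Pikachu', 'VenusaurMega Venusaur', 'Meganium', 'Mewtwo', 'Pikachu'],
    'Total': [320, 625, 525, 680, 320],
    'Legendary': [False, False, False, True, False],
})

not_legendary = sample['Legendary'] == False      # True for every non-legendary row
has_mega = sample['Name'].str.contains('Mega')    # True if 'Mega' appears anywhere in the name
print(has_mega.tolist())

# ~ flips True/False, & keeps only rows where both conditions are True
common = sample[not_legendary & ~has_mega].drop_duplicates('Name', keep='first')
print(common['Name'].tolist())

# Assumed alternative: require a space after 'Mega' so Meganium is kept
has_mega_form = sample['Name'].str.contains('Mega ', regex=False)
common_alt = sample[not_legendary & ~has_mega_form].drop_duplicates('Name', keep='first')
print(common_alt['Name'].tolist())
```

    Switching the pattern changes which rows are kept, so the average you get on the real file may differ from the one shown in this lesson.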

    Fine, you've now got the data that shows that pikachu is below average for all the pokemon. But let's compare pikachu with just the pokemon from generation 1, then compare pikachu with just the other electric types from generation 1.

    poke_common = poke_stats[(poke_stats['Legendary'] == False) & (~poke_stats['Name'].str.contains('Mega'))].drop_duplicates('Name', keep='first')
    
    poke_gen1 = poke_common[poke_common['Generation'] == 1]
    poke_electric = poke_gen1[poke_gen1['Type 1'] == 'Electric']
    
    print("Average Total Power: Generation 1")
    print(poke_gen1.Total.mean())
    
    print("\nAverage Total Power: Generation 1 Electric Types")
    print(poke_electric.Total.mean())
                    



    Oh no. By being more specific, we've made pikachu's stats look even worse by comparison. It's time to make a recommendation to Ash. If he wants a more powerful pokemon than pikachu on his team, what pokemon should he pick? We've already shown how to rank pokemon by power in the previous lesson; now we're going to illustrate the power of pokemon in a chart so Ash can make a good decision.

Bar Graphs



    The matplotlib module lets us create graphs which can show pikachu's power relative to the other potential electric types Ash could pick. To show this, we're going to use a sorted bar chart.

    To create a bar chart, we're going to use the plt.bar() function. There are two arguments that we need to specify: x, the labels we want to show along the x-axis, and height, the values we want to measure along the y-axis.

    First, we're going to create a dataframe where the results are sorted by total power.

    Then, we're going to use the plt.bar() function. The x argument will be the names of the different pokemon, and the height of the bars will be determined by the Total power of the pokemon.

    poke_electric_chart = poke_electric.sort_values(by='Total', ascending=False)
    
    plt.bar(x=poke_electric_chart['Name'], height=poke_electric_chart['Total'])
    plt.show()
                    




    After running the script, you will see a chart appear that shows the different electric type pokemon that Ash could choose. And it looks like not only was Ash incorrect in his assumption that pikachu was "the best", but the exact opposite was true! According to the power statistics we have from this dataset, pikachu is the weakest of the generation 1 electric types!
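
    A sorted bar chart is easier to read when the bar we care about stands out. The sketch below is one way to do that, by passing a list of colors (one per bar) to plt.bar(). It uses a small made-up table in place of poke_electric, so the rows, the colors and the names sample, chart_data and colors are assumptions for illustration.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for poke_electric (a few rows, values for illustration only)
sample = pd.DataFrame({
    'Name': ['Pikachu', 'Raichu', 'Voltorb', 'Electabuzz'],
    'Total': [320, 485, 330, 490],
})

# Strongest first, just like the chart in the lesson
chart_data = sample.sort_values(by='Total', ascending=False)

# One color per bar: red for Pikachu, gray for everything else
colors = ['red' if name == 'Pikachu' else 'gray' for name in chart_data['Name']]

bars = plt.bar(x=chart_data['Name'], height=chart_data['Total'], color=colors)
plt.ylabel('Total')
plt.title('Electric Types by Total Power')
plt.xticks(rotation=45)   # tilt the names so long ones do not overlap
plt.tight_layout()
plt.show()
```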

    Although it may be tempting to simply tell Ash that he should replace pikachu, when solving problems through data science it's also important to know our own limits. Ash may have knowledge of factors outside the data that we have that might influence his decision. It's our responsibility to tell the pokemon trainer what we know, but it's ultimately up to them to make the right decision for their team.

The Competition



    Ash has told us that he wants to become the pokemon master, but in order to do that, he's going to need to defeat his rival, Gary. It's time to do some more comparisons of their two teams to see how they stack up.

    Here are the pokemon currently on Ash's team:

    1. Pikachu
    2. Butterfree
    3. Pidgeot
    4. Bulbasaur
    5. Charizard
    6. Squirtle


    Here are the pokemon that are on Gary's team:

    1. Pidgeot
    2. Alakazam
    3. Rhydon
    4. Exeggutor
    5. Arcanine
    6. Blastoise


    Now that we know the two team compositions, let's compose two different dataframes of the teams.

    First, we create two lists, one for each team.

    ash_list = ['Pikachu', 'Butterfree', 'Pidgeot', 'Bulbasaur', 'Charizard', 'Squirtle']
    gary_list = ['Pidgeot', 'Alakazam', 'Rhydon', 'Exeggutor', 'Arcanine', 'Blastoise']
                  

    Next, we can use the isin() function for our filter. It will get all the rows from the dataframe where the values match the values in our lists.
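
    One thing to watch for: isin() only matches names that are exactly equal to an entry in the list, so a misspelled name simply returns no row and no error. Here is a minimal sketch of that behavior on a made-up table (the rows and the names sample, team, found and missing are assumptions for illustration), along with one way to list the names that were not found.

```python
import pandas as pd

# Hypothetical mini-table standing in for poke (not the real data)
sample = pd.DataFrame({
    'Name': ['Pikachu', 'CharizardMega Charizard X', 'Charizard', 'Squirtle'],
    'Total': [320, 634, 534, 314],
})

team = ['Pikachu', 'Charizard', 'Squirtel']   # 'Squirtel' is a deliberate typo

# isin() keeps rows whose Name exactly equals one of the list entries,
# so the mega form of Charizard is not picked up by 'Charizard'
team_rows = sample[sample['Name'].isin(team)]
print(team_rows['Name'].tolist())

# Names from the list that never matched a row
found = set(team_rows['Name'])
missing = [name for name in team if name not in found]
print(missing)
```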

    ash_list = ['Pikachu', 'Butterfree', 'Pidgeot', 'Bulbasaur', 'Charizard', 'Squirtle']
    gary_list = ['Pidgeot', 'Alakazam', 'Rhydon', 'Exeggutor', 'Arcanine', 'Blastoise']
    
    poke_ash = poke[poke['Name'].isin(ash_list)]
    poke_gary = poke[poke['Name'].isin(gary_list)]
                  

    Finally, we create two separate graphs by using the plt.figure() function. This lets us build the graphs one at a time, and then show both of them at the end so we can compare the results. Running the code below will create 2 graphs, not just one.

    ash_list = ['Pikachu', 'Butterfree', 'Pidgeot', 'Bulbasaur', 'Charizard', 'Squirtle']
    gary_list = ['Pidgeot', 'Alakazam', 'Rhydon', 'Exeggutor', 'Arcanine', 'Blastoise']
    
    poke_ash = poke[poke['Name'].isin(ash_list)]
    poke_gary = poke[poke['Name'].isin(gary_list)]
    
    #Creates the bar graph for Ash
    ash_graph = plt.figure(1)
    plt.bar(x=poke_ash['Name'], height=poke_ash['Total'])
    plt.title("Ash's Pokemon Team")
    
    #Creates a separate bar graph for Gary
    gary_graph = plt.figure(2)
    plt.bar(x=poke_gary['Name'], height=poke_gary['Total'])
    plt.title("Gary's Pokemon Team")
    plt.show()
                  




    Oh dear. As hard as Ash might train, the comparison here shows that Ash's team has a lot of work to do to improve. In particular, the weak links of his team are Pikachu, Squirtle, and Bulbasaur.
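
    When two charts open in separate windows, each one picks its own y-axis range, which can make bars look more similar or more different than they really are. One alternative, sketched below, is to draw both teams side by side in a single figure with plt.subplots() and sharey=True so they use the same scale. The two small tables, their values and the names team_a and team_b are made up for illustration.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-ins for poke_ash and poke_gary (values for illustration only)
team_a = pd.DataFrame({'Name': ['Pikachu', 'Squirtle', 'Charizard'],
                       'Total': [320, 314, 534]})
team_b = pd.DataFrame({'Name': ['Alakazam', 'Rhydon', 'Blastoise'],
                       'Total': [500, 485, 530]})

# One figure, two side-by-side charts; sharey=True gives both the same y-axis range
fig, (ax_a, ax_b) = plt.subplots(1, 2, sharey=True, figsize=(10, 4))

ax_a.bar(team_a['Name'], team_a['Total'])
ax_a.set_title('Team A')
ax_a.set_ylabel('Total')

ax_b.bar(team_b['Name'], team_b['Total'])
ax_b.set_title('Team B')

fig.tight_layout()
plt.show()
```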

    You present this data to Ash, and ask him if he really wants to challenge his rival for the pokemon championship.

    "Wait!" he tells you, "What if I changed out the Pikachu for Raichu, changed Squirtle for Blastoise, and changed Bulbasaur for Venusaur?"

    Try to change out the list of the pokemon team to see how strong it is with the changed team members.

    Next, we'll go over some other types of charts that you can use with matplotlib.

Other Chart Types



    There are a large number of different types of charts you can make with matplotlib, but we'll cover a few of them here.

    1. Stacked Bar Chart: Use this chart type when you want to break down one bar into multiple parts.

    2. Boxplot: Helpful when you want to show how wide a range of potential values is.

    3. Scatterplot: Useful when you have a large number of data points across two axes, and you want to find where there are clusters of data.


    Stacked Bar Chart

    Let's start with the Stacked Bar chart. The total stats of a pokemon are composed of multiple parts. Instead of viewing the sum totals of each pokemon's stats, how about we view all the parts individually?

    stat_types = ['HP', 'Attack', 'Defense', 'Speed', 'Sp. Atk', 'Sp. Def']
    hp_stats = poke_ash['HP']
    attack_stats = poke_ash['Attack']
    defense_stats = poke_ash['Defense']
    speed_stats = poke_ash['Speed']
    special_attack_stats = poke_ash['Sp. Atk']
    special_defense_stats = poke_ash['Sp. Def']
    total_stats = poke_ash['Total']
    
    plt.bar(poke_ash['Name'], hp_stats, color='#00ff00')
    plt.bar(poke_ash['Name'], attack_stats, bottom=hp_stats, color='#ff0000')
    plt.bar(poke_ash['Name'], defense_stats, bottom=(hp_stats+attack_stats), color='#0000ff')
    plt.bar(poke_ash['Name'], speed_stats, bottom=(hp_stats+attack_stats+defense_stats), color='#ffff00')
    plt.bar(poke_ash['Name'], special_attack_stats, bottom=(hp_stats+attack_stats+defense_stats+speed_stats), color='#990099')
    plt.bar(poke_ash['Name'], special_defense_stats, bottom=(hp_stats+attack_stats+defense_stats+speed_stats+special_attack_stats), color='#440044')
    plt.title("Ash's Pokemon Team Stats Breakdown")
    plt.legend(stat_types,loc=1)
    plt.show()
                



    This is a Stacked Bar Chart and is useful when showing how a total is broken down into different parts. By using this chart, we can see that some pokemon have higher stats in certain areas than others.
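
    The trick behind a stacked bar chart is the bottom argument: each new segment starts at the height where the previous segments ended. Instead of writing out the running sum by hand, one option is to keep it in a variable and update it in a loop. The sketch below does that on a made-up table with only three stat columns; the rows and the names team and bottom are assumptions for illustration.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for poke_ash with three stat columns (values for illustration only)
team = pd.DataFrame({
    'Name': ['Pikachu', 'Squirtle'],
    'HP': [35, 44],
    'Attack': [55, 48],
    'Defense': [40, 65],
})

stat_types = ['HP', 'Attack', 'Defense']

# bottom tracks the height already drawn for each pokemon, starting at zero
bottom = pd.Series(0, index=team.index)
for stat in stat_types:
    plt.bar(team['Name'], team[stat], bottom=bottom, label=stat)
    bottom = bottom + team[stat]   # the next segment starts where this one ended

plt.title('Stats Breakdown')
plt.legend(loc=1)
plt.show()
```

    Because each bar is given a label, plt.legend() can build the legend without a separate list of names.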

    Boxplot

    For our boxplot we're going to try to answer the question: which pokemon type is generally the strongest? Instead of checking which type has the single strongest pokemon, we'll look at the range of strength across all pokemon of each type to find the type where you are most likely to find a strong pokemon.

    poke_common.boxplot(column='Total', by='Type 1')
    plt.show()
                



    The box plot has 5 different parts. The top line (the upper whisker) shows the highest value for the category, not counting outliers, which are drawn as separate points. The top of the box is the 75th percentile, so 25% of pokemon are stronger than that value. The line inside the box is the median, the middle value for that category: half of the pokemon are stronger and half are weaker. The bottom of the box is the 25th percentile, so 25% of pokemon are weaker than that value. The bottom line (the lower whisker) shows the lowest value for the category, again not counting outliers.
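
    The parts of a box plot can also be computed directly, which is a handy way to check what the picture is showing. The sketch below uses a made-up list of totals for a single type (the values and variable names are assumptions for illustration). It relies on matplotlib's default setting, where the whiskers stop at the most extreme values within 1.5 times the box height and anything further out is drawn as a separate point. It also prints the mean next to the median, the line inside the box, to show that the two can differ.

```python
import pandas as pd

# Hypothetical totals for a single type (values for illustration only)
totals = pd.Series([300, 320, 400, 480, 500, 900])

q1 = totals.quantile(0.25)   # bottom edge of the box
median = totals.median()     # line inside the box
q3 = totals.quantile(0.75)   # top edge of the box
iqr = q3 - q1                # height of the box

# Values outside these limits are drawn as individual outlier points
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = totals[(totals < lower_limit) | (totals > upper_limit)]

print(q1, median, q3)
print(totals.mean())         # pulled upward by the single very high value
print(outliers.tolist())
```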

    Scatterplot

    The Scatterplot allows you to plot individual values along both an x and a y coordinate. We can use this graph to find out interesting information about the relationship between the attack and defense of pokemon.

    poke.plot.scatter(x='Attack', y='Defense', title='Pokemon Attack and Defense')
    plt.show()
                



    By using this chart, you can see that there are some outliers that have high attack and low defense, and some pokemon that have low attack and high defense. If you need a pokemon for a specific purpose, it might be good to use this type of chart to find clusters or outliers.
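
    Once the scatterplot shows that outliers exist, a filter can tell us which pokemon they are. The sketch below is one simple approach: subtract Attack from Defense and keep the rows where the difference is large. The table is made up, and the 50-point cutoff and the column name 'Def minus Atk' are arbitrary choices for this example.

```python
import pandas as pd

# Hypothetical mini-table standing in for poke (values for illustration only)
sample = pd.DataFrame({
    'Name': ['Pikachu', 'Shuckle', 'Deoxys', 'Snorlax'],
    'Attack': [55, 10, 150, 110],
    'Defense': [40, 230, 50, 65],
})

# Positive means defense-heavy, negative means attack-heavy
sample['Def minus Atk'] = sample['Defense'] - sample['Attack']

# The 50-point cutoff is an arbitrary choice for this sketch
defensive = sample[sample['Def minus Atk'] >= 50]
offensive = sample[sample['Def minus Atk'] <= -50]

print(defensive['Name'].tolist())
print(offensive['Name'].tolist())
```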

    Now that you can make some basic graphs, you can look up other types of graphs and see other ways that you can view your data.

    Python Graph Gallery