Intro to Python

Dataframes

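A Dataframe is a table of data arranged in rows and columns. Dataframes come from the pandas library, which has to be imported first: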

import pandas as pd

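The import keyword loads the library, and as gives it a shorter name, so pandas can be written as pd in the rest of the code.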

Importing CSVs

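Download Pokemon.csv: right-click the link, choose Save link as..., and click Save.

Keep Pokemon.csv in the same folder as your code so it can be found by name.

The read_csv() function loads the file into a dataframe: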

import pandas as pd

poke = pd.read_csv('Pokemon.csv')
print(poke)
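
If the Pokemon.csv download is not at hand, read_csv() can be tried on a few lines of CSV text instead. The sketch below is only an illustration: the three rows and the name sample are made up for this example and are not the real file.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for Pokemon.csv
csv_text = """#,Name,Type 1,Type 2,Speed
1,Bulbasaur,Grass,Poison,45
4,Charmander,Fire,,65
25,Pikachu,Electric,,90
"""

# read_csv() accepts a file name or any file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names come from the first line of the CSV
```

Empty fields, like the missing Type 2 values here, are read in as null values.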

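The printed dataframe has a few parts worth knowing:

Column Names: the labels across the top of the table.

index: the row numbers down the left side, starting at 0.

Shape: the number of rows and columns, printed at the bottom.

...: a placeholder for rows or columns that are hidden to keep the output short.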

1. The head() function shows rows of data starting from the top of the dataframe. Pass a number as an argument to choose how many rows to show; without one, it shows the top 5 rows.

2. The tail() function shows rows of data starting from the bottom of the dataframe. Pass a number as an argument to choose how many rows to show; without one, it shows the bottom 5 rows.

print(poke)

print(poke.head())

print(poke.head(20))

print(poke.tail())

print(poke.tail(30))
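
The same calls can be checked on a small table where the expected rows are easy to see. This is a minimal sketch; the numbers dataframe and its column n are made up for the example.

```python
import pandas as pd

# Hypothetical dataframe with 12 rows numbered 1 to 12
numbers = pd.DataFrame({'n': range(1, 13)})

first_five = numbers.head()    # no argument: the top 5 rows
first_three = numbers.head(3)  # the top 3 rows
last_five = numbers.tail()     # no argument: the bottom 5 rows
last_two = numbers.tail(2)     # the bottom 2 rows

print(first_three)
print(last_two)
```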

1. The info() function shows information about the entire dataframe. It lists every column name and the type of each column. It also shows how many non-null values each column has, which tells us how many values are missing.

2. The describe() function summarizes the numeric columns of the table, which is helpful for finding values like the mean (average) of a column. Based on the output of the describe function, we know that the average Speed of a pokemon is around 68.

info()

# info() prints its report itself and returns None, so it does not need print()
poke.info()
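
The non-null counts that info() reports can also be calculated directly, which makes it easier to see where they come from. A minimal sketch, assuming a made-up three-row dataframe named sample:

```python
import pandas as pd

# Hypothetical sample: two of the three pokemon have no secondary type
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Pikachu'],
    'Type 1': ['Grass', 'Fire', 'Electric'],
    'Type 2': ['Poison', None, None],
})

sample.info()  # prints the column names, types, and non-null counts

non_null = sample.count()    # non-null values per column
nulls = sample.isna().sum()  # null (missing) values per column
print(non_null)
print(nulls)
```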

1. In this dataframe, each pokemon's name is present in the "Name" column.

2. The "Type 1" column shows the primary type of the pokemon, and the "Type 2" column shows the secondary type. Since not all pokemon have secondary types, some of these values are blank, or null. There are 414 pokemon that have a value in the Type 2 column.

3. There are 6 columns for the pokemon's individual stats: "HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", and "Speed". The higher these values are, the more powerful the pokemon is.

4. The "Total" column is the sum of the individual stat columns for the pokemon.

5. The "Generation" column shows which version of the pokemon game the pokemon first appeared in. Based on the output of the describe function, we know that the values of this column go from 1 to 6.

6. The "Legendary" column is a boolean value, which means it is either True or False. If this value is True, it indicates a pokemon that is especially rare and powerful.
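
The relationship between "Total" and the six stat columns can be checked with a row-wise sum. The sketch below uses a made-up two-row dataframe with illustrative values; the names sample, stat_columns, and computed are chosen for this example.

```python
import pandas as pd

stat_columns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Hypothetical two-row sample with illustrative stat values
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Pikachu'],
    'HP': [45, 35], 'Attack': [49, 55], 'Defense': [49, 40],
    'Sp. Atk': [65, 50], 'Sp. Def': [65, 50], 'Speed': [45, 90],
    'Total': [318, 320],
})

# axis=1 adds the values across each row instead of down each column
computed = sample[stat_columns].sum(axis=1)
matches = (computed == sample['Total']).all()

print(computed)
print(matches)
```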

describe()

print(poke.describe())
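
The result of describe() is itself a dataframe, so a single number such as the mean can be pulled out of it. A minimal sketch, assuming a made-up four-row dataframe named sample:

```python
import pandas as pd

# Hypothetical sample with one text column and one numeric column
sample = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Speed': [40, 60, 80, 100],
})

summary = sample.describe()  # by default only numeric columns are summarized
print(summary)

# Rows of the summary are named 'count', 'mean', 'std', 'min', and so on
print(summary.loc['mean', 'Speed'])
```

The "Name" column does not appear in the summary because it holds text rather than numbers.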

Showing Specific Rows and Columns

print(poke['Name'])

print(poke[['Name', 'Type 1']])

print(poke[poke['Name'] == 'Pikachu'])

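Filter: the inner expression poke['Name'] == 'Pikachu' checks every row and gives True or False for each one. Wrapping it in poke[] keeps only the rows where the result is True.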

# Returns all rows where the 'Name' column contains the word "Mega"
print(poke[poke['Name'].str.contains('Mega')])

# Adding a '~' in front of a condition reverses it; this shows only pokemon without "Mega" in their name
print(poke[~poke['Name'].str.contains('Mega')])

# Returns all rows where the 'Speed' column is greater than 120
print(poke[poke['Speed'] > 120])

# Returns all legendary pokemon from generation 1. Use the '&' symbol between conditions
# to require more than one, and wrap each condition in parentheses.
print(poke[(poke['Generation'] == 1) & (poke['Legendary'] == True)])
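
It can help to store each condition in its own variable and look at it before using it as a filter. The sketch below assumes a made-up four-row dataframe; the names sample, is_fast, and is_mega are chosen for this example.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    'Name': ['Pikachu', 'Mewtwo', 'Mega Mewtwo X', 'Bulbasaur'],
    'Speed': [90, 130, 130, 45],
})

# A condition gives one True or False value per row
is_fast = sample['Speed'] > 120
print(is_fast)

# Using the condition inside sample[] keeps only the True rows
fast = sample[is_fast]

# Conditions can be combined with '&' and reversed with '~'
is_mega = sample['Name'].str.contains('Mega')
fast_not_mega = sample[is_fast & ~is_mega]

print(fast)
print(fast_not_mega)
```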

Data Manipulation

groupby() collects rows that share the same value into groups, and count() counts the rows in each group:

print(poke['Type 1'].groupby(poke['Type 1']).count())

sort_values() puts the counts in order:

print(poke['Type 1'].groupby(poke['Type 1']).count().sort_values(ascending=False))
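
The counting and sorting steps can be followed one at a time on a small table. A minimal sketch, assuming a made-up six-row dataframe named sample:

```python
import pandas as pd

# Hypothetical sample: 3 Water, 2 Fire, 1 Grass
sample = pd.DataFrame({
    'Name': ['Squirtle', 'Charmander', 'Psyduck', 'Bulbasaur', 'Staryu', 'Vulpix'],
    'Type 1': ['Water', 'Fire', 'Water', 'Grass', 'Water', 'Fire'],
})

# Step 1: count the rows in each 'Type 1' group
type_counts = sample['Type 1'].groupby(sample['Type 1']).count()
print(type_counts)

# Step 2: order the counts from largest to smallest
ranked = type_counts.sort_values(ascending=False)
print(ranked)
```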

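ascending=False sorts from the largest value to the smallest.

sort_values() also works on a whole dataframe: pass by= with the name of the column to sort on.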

poke_power = poke[['Total', 'Name', 'Type 1']].sort_values(by='Total',ascending=False)
# Will show the most powerful pokemon of each type
print(poke_power.groupby('Type 1').first())

# Will show the least powerful pokemon of each type
print(poke_power.groupby('Type 1').last())
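
This works because groupby() keeps the rows of each group in the order they already had, so sorting first decides which row counts as first and last. A minimal sketch, assuming a made-up four-row dataframe with illustrative totals:

```python
import pandas as pd

# Hypothetical sample with two pokemon per type
sample = pd.DataFrame({
    'Name': ['Charmander', 'Charizard', 'Squirtle', 'Blastoise'],
    'Type 1': ['Fire', 'Fire', 'Water', 'Water'],
    'Total': [309, 534, 314, 530],
})

# Sort so the highest Total comes first within every type
by_power = sample.sort_values(by='Total', ascending=False)

strongest = by_power.groupby('Type 1').first()  # first row of each group
weakest = by_power.groupby('Type 1').last()     # last row of each group

print(strongest)
print(weakest)
```

One thing to keep in mind: first() and last() take the first or last non-null value in each column, so in a table with missing values the pieces of a result row could come from different rows.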

Data Cleanup

print(poke[poke['Name'].str.contains('Mewtwo')])

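drop_duplicates() removes rows that repeat a value in the chosen column. Here it keeps only the first row for each '#' value: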

no_duplicates_poke = poke.drop_duplicates('#', keep='first')
print(no_duplicates_poke[no_duplicates_poke['Name'].str.contains('Mewtwo')])
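
The effect is easier to see on a handful of rows. The sketch below assumes a made-up four-row dataframe in which three rows share the same '#' value; the names sample and deduped are chosen for this example.

```python
import pandas as pd

# Hypothetical sample: three rows share the number 150
sample = pd.DataFrame({
    '#': [150, 150, 150, 151],
    'Name': ['Mewtwo', 'Mega Mewtwo X', 'Mega Mewtwo Y', 'Mew'],
})

# keep='first' keeps the first row for each '#' and drops the later repeats
deduped = sample.drop_duplicates(subset='#', keep='first')

print(deduped)
print(len(sample))  # the original dataframe is not changed
```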

Challenge: Pokemon

#Cheat code answers

#Top 6 in power
print(poke.sort_values(by='Total', ascending=False).head(6))

#no legendaries
print(poke[(poke['Legendary'] == False)].sort_values(by='Total', ascending=False).head(6))

#No legendaries or megas
print(poke[(poke['Legendary'] == False) & (~poke['Name'].str.contains('Mega'))].sort_values(by='Total', ascending=False).head(6))

#No legendaries or megas, and each one has a different primary type
print(poke[(poke['Legendary'] == False) & (~poke['Name'].str.contains('Mega'))].sort_values(by='Total', ascending=False).drop_duplicates('Type 1', keep='first', inplace=False).head(6))

Dataframes with Pandas