
Pandas DataFrames



What is a DataFrame?

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array, or a table with rows and columns.

Example

Create a simple Pandas DataFrame:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df) 

Result

   calories  duration
0       420        50
1       380        40
2       390        45

Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the loc attribute to return one or more specified row(s).

Example

Return row 0:

#refer to the row index:
print(df.loc[0])

Result

calories    420
duration     50
Name: 0, dtype: int64

Note: This example returns a Pandas Series.

Example

Return row 0 and 1:

#use a list of indexes:
print(df.loc[[0, 1]])

Result

   calories  duration
0       420        50
1       380        40

Note: When using [], the result is a Pandas DataFrame.
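A quick way to confirm which type loc returns is to check it directly. This small sketch reuses the calories/duration data from above: a single label gives a Series, while a list of labels gives a DataFrame.

```python
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# A single label returns one row as a Series
row = df.loc[0]
# A list of labels (note the double brackets) returns a DataFrame
rows = df.loc[[0, 1]]

print(type(row).__name__)
print(type(rows).__name__)
```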



Named Indexes

With the index argument, you can name your own indexes.

Example

Add a list of names to give each row a name:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df) 

Result

      calories  duration
day1       420        50
day2       380        40
day3       390        45

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

Example

Return "day2":

#refer to the named index:
print(df.loc["day2"])

Result

calories    380
duration     40
Name: day2, dtype: int64

Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.

Example

Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df) 


You will learn more about importing files in the next chapters.
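If you want to try read_csv without a data.csv file on disk, you can pass it any file-like object. This sketch uses io.StringIO with made-up CSV text; the column names simply mirror the earlier example and are not the contents of the real data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """calories,duration
420,50
380,40
390,45
"""

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```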




Source: https://www.w3schools.com/python/pandas/pandas_dataframes.asp


A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). The data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
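The three components can be inspected directly: the index attribute holds the row labels, the columns attribute holds the column labels, and to_numpy() returns the underlying data. The small frame below is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jack"], "Age": [20, 21]})

print(df.index)       # the row labels (a default RangeIndex here)
print(df.columns)     # the column labels
print(df.to_numpy())  # the data as a 2-D NumPy array
```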

This section gives a brief overview of the basic operations that can be performed on a Pandas DataFrame:

Creating a Pandas DataFrame

In the real world, a Pandas DataFrame is usually created by loading datasets from existing storage, such as an SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, dictionaries, a list of dictionaries, and so on. Here are some of the ways to create a DataFrame:

Creating a dataframe using List: DataFrame can be created using a single list or a list of lists.

# import pandas as pd
import pandas as pd

# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

Creating DataFrame from dict of ndarray/lists: To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.

# Python code demonstrating how to create a
# DataFrame from a dict of ndarrays / lists
# (the default index is range(n))
import pandas as pd

# initialise data of lists
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output
print(df)
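The equal-length requirement is enforced: if the lists in the dict differ in length, the constructor raises a ValueError. A small sketch with deliberately mismatched, made-up lists (the exact error message may vary between pandas versions):

```python
import pandas as pd

data = {"Name": ["Tom", "nick", "krish"], "Age": [20, 21]}  # 3 names, only 2 ages

try:
    df = pd.DataFrame(data)
except ValueError as exc:
    error = exc  # keep a reference so it can be inspected after the block
    print("ValueError:", exc)
else:
    error = None
```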

For more details refer to Creating a Pandas DataFrame



Dealing with Rows and Columns

Because a DataFrame is organized in rows and columns, we can perform basic operations on them such as selecting, deleting, adding, and renaming.
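A compact sketch of the four operations named above (select, add, rename, delete), using a small hypothetical employee frame. Note that single brackets return a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

ages = df["Age"]                # select one column -> Series
subset = df[["Name", "Age"]]    # select a list of columns -> DataFrame

df["Senior"] = df["Age"] > 25                 # add a column derived from another
df = df.rename(columns={"Name": "Employee"})  # rename returns a new DataFrame
df = df.drop(columns=["Senior"])              # delete a column (also returns a new DataFrame)

print(df)
```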

Column Selection: To select a column in a Pandas DataFrame, access it by its column name; to select several columns, pass a list of column names.

# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# select two columns
print(df[['Name', 'Qualification']])

Row Selection: Pandas provides the DataFrame.loc[] indexer to retrieve rows from a DataFrame by label. Rows can also be selected by passing an integer position to iloc[].

Note: The examples below use the nba.csv file.

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

Each call returns a Series, because only a single row label was passed each time.

For more Details refer to Dealing with Rows and Columns
 

Indexing and Selecting Data

Indexing in pandas means selecting particular rows and columns of data from a DataFrame. It can mean selecting all of the rows and some of the columns, some of the rows and all of the columns, or some of each. Indexing is also known as subset selection.

Indexing a DataFrame using the indexing operator []:
The indexing operator refers to the square brackets following an object. The .loc and .iloc indexers also use the indexing operator to make selections. In this section, the indexing operator means df[].

Selecting a single column

To select a single column, simply put the name of the column between the brackets.

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving columns by indexing operator
first = data["Age"]

print(first)

Indexing a DataFrame using .loc[]:
This indexer selects data by the labels of the rows and columns. It selects data differently from the plain indexing operator: it can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.
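Since nba.csv may not be at hand, this sketch uses a few made-up rows indexed by a hypothetical player name to show .loc selecting rows and columns at once: the first argument picks rows, the second picks columns, and label slices include both endpoints.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv, indexed by player name
data = pd.DataFrame(
    {"Team": ["A", "B", "C"], "Age": [25, 22, 30], "Salary": [100, 200, 300]},
    index=pd.Index(["P1", "P2", "P3"], name="Name"),
)

# rows by label list, columns by label list
subset = data.loc[["P1", "P3"], ["Team", "Age"]]
# label slices include the end label
sliced = data.loc["P1":"P2", "Age":"Salary"]

print(subset)
print(sliced)
```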

Selecting a single row

To select a single row using .loc[], pass a single row label to it.

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

Each call returns a Series, because only a single row label was passed each time.

 
Indexing a DataFrame using .iloc[]:
This indexer retrieves rows and columns by position. To use it, we specify the positions of the rows we want and, optionally, the positions of the columns we want. The .iloc[] indexer is very similar to .loc[] but uses only integer locations to make its selections.
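The positional counterpart, again on made-up data rather than nba.csv: .iloc takes integer positions for rows and columns, and its slices exclude the end position, like Python lists.

```python
import pandas as pd

# Hypothetical data; only the positions matter for .iloc
data = pd.DataFrame(
    {"Team": ["A", "B", "C"], "Age": [25, 22, 30], "Salary": [100, 200, 300]},
    index=["P1", "P2", "P3"],
)

first_row = data.iloc[0]            # first row as a Series
block = data.iloc[[0, 2], [1, 2]]   # rows 0 and 2, columns 1 and 2
head_two = data.iloc[0:2]           # positions 0 and 1 (end excluded)

print(block)
```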



Selecting a single row

To select a single row using .iloc[], pass a single integer to it.

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving the row at position 3 (the fourth row) by iloc method
row4 = data.iloc[3]

print(row4)


Working with Missing Data

Missing data can occur when no information is provided for one or more items or for a whole unit. Missing data is a very common problem in real-life scenarios. In pandas, missing data is also referred to as NA (Not Available) values.

Checking for missing values using isnull() and notnull()
To check for missing values in a Pandas DataFrame, we use the isnull() and notnull() functions. Both help check whether a value is NaN or not. These functions can also be used on a Pandas Series to find null values in a series.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary
df = pd.DataFrame(data)

# using isnull() function
print(df.isnull())
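notnull() is the inverse of isnull(), and because True counts as 1, summing the isnull() mask gives the number of missing values per column. A short sketch on the same score data:

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

missing_per_column = df.isnull().sum()    # count of NaN in each column
complete = df[df.notnull().all(axis=1)]   # keep rows with no missing value

print(missing_per_column)
print(complete)
```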

Filling missing values using fillna(), replace() and interpolate()
To fill null values in a dataset, we use the fillna(), replace() and interpolate() functions, which replace NaN values with values of their own. The interpolate() function also fills missing values, but instead of a hard-coded value it uses an interpolation technique to compute them.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

# filling missing value using fillna()
print(df.fillna(0))
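The other two methods named above, on the same score data: replace() swaps NaN for a given value, and interpolate() computes values from their neighbours (linear by default). With the default settings, a leading NaN has no earlier value to interpolate from, so it stays NaN.

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

replaced = df.replace(np.nan, -99)   # same effect as fillna(-99) here
interpolated = df.interpolate()      # linear interpolation down each column

print(interpolated)
```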

Dropping missing values using dropna():
To drop null values from a DataFrame, we use the dropna() function, which can drop rows or columns containing null values in different ways.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)
print(df)


Now we drop rows with at least one NaN (null) value:

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

# using dropna() function
print(df.dropna())
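dropna() also takes parameters that control what gets dropped: axis=1 drops columns instead of rows, how='all' drops only rows that are entirely NaN, and thresh keeps rows that have at least that many non-missing values. A sketch on the same four-score data:

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

cols_without_nan = df.dropna(axis=1)     # drop any column containing NaN
no_all_nan_rows = df.dropna(how='all')   # drop rows where every value is NaN
at_least_three = df.dropna(thresh=3)     # keep rows with >= 3 non-NaN values

print(cols_without_nan)
print(at_least_three)
```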

For more Details refer to Working with Missing Data in Pandas
 

Iterating over rows and columns

Iteration is a general term for taking each item of something, one after another. Because a Pandas DataFrame consists of rows and columns, we can iterate over it much like a dictionary, either row by row or column by column.



Iterating over rows:
To iterate over rows, we can use the iterrows() or itertuples() functions. (The related items() method, called iteritems() in older pandas versions, iterates over columns as (column name, Series) pairs.)

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data)
print(df)


Now we apply the iterrows() function to get each row as an (index, Series) pair.

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data)

# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j)
    print()
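itertuples() yields each row as a namedtuple. It is generally faster than iterrows() and keeps each column's dtype, because rows are not converted into Series. Column names become attribute names, as in this sketch on the same student data (the score threshold of 80 is arbitrary):

```python
import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

high_scorers = []
for row in df.itertuples():
    # row.Index is the row label; the other fields are named after the columns
    if row.score >= 80:
        high_scorers.append(row.name)

print(high_scorers)
```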

Iterating over columns:
To iterate over columns, we can create a list of the DataFrame's column names and then iterate through that list to pull out each column.

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data)
print(df)


Now we iterate through the columns: first we create a list of the DataFrame's column names, and then we iterate through that list.

# creating a list of dataframe columns
columns = list(df)

for i in columns:
    # printing the third element of the column
    print(df[i][2])

For more Details refer to Iterating over rows and columns in Pandas DataFrame

DataFrame Methods:

Function            Description
index               Attribute holding the index (row labels) of the DataFrame
insert()            Inserts a column into a DataFrame at a given position
add()               Returns addition of dataframe and other, element-wise (binary operator add)
sub()               Returns subtraction of dataframe and other, element-wise (binary operator sub)
mul()               Returns multiplication of dataframe and other, element-wise (binary operator mul)
div()               Returns floating division of dataframe and other, element-wise (binary operator truediv)
unique()            Series method that returns the unique values of a column
nunique()           Returns the count of unique values in each column of the DataFrame
value_counts()      Counts the number of times each unique value occurs within a Series
columns             Attribute holding the column labels of the DataFrame; assigning a new list to it renames the columns
axes                Attribute holding a list of the DataFrame's axes: [index, columns]
isnull()            Returns a Boolean mask that is True for null values; used to extract rows with null values
notnull()           Returns a Boolean mask that is True for non-null values; used to extract rows with non-null values
between()           Series method returning True where a value falls between two bounds (inclusive by default); used to extract rows in a range
isin()              Extracts rows from a DataFrame where a column value exists in a predefined collection
dtypes              Attribute returning a Series with the data type of each column, indexed by the column names
astype()            Casts a Series or DataFrame to a specified data type
values              Attribute returning a NumPy representation of the DataFrame: only the values, with the axis labels removed
sort_values()       Sorts a DataFrame in ascending or descending order of the passed column(s)
sort_index()        Sorts a DataFrame by its index labels instead of its values, e.g. after combining DataFrames has left the index out of order
loc[]               Indexer that retrieves rows (and columns) by label
iloc[]              Indexer that retrieves rows (and columns) by integer position
ix[]                Old label-or-position indexer; deprecated and removed in pandas 1.0, so use .loc[] or .iloc[] instead
rename()            Changes the names of the index labels or column names
drop()              Deletes rows or columns from a DataFrame
pop()               Removes a column from a DataFrame and returns it as a Series
sample()            Pulls out a random sample of rows or columns from a DataFrame
nsmallest()         Returns the n rows with the smallest values in a column
nlargest()          Returns the n rows with the largest values in a column
shape               Attribute returning a tuple with the dimensionality of the DataFrame (rows, columns)
ndim                Attribute returning an int with the number of axes: 1 for a Series, 2 for a DataFrame
dropna()            Lets the user analyze and drop rows/columns with null values in different ways
fillna()            Lets the user replace NaN values with values of their own
rank()              Ranks the values in a Series (or along an axis of a DataFrame) in order
query()             Alternate string-based syntax for extracting a subset from a DataFrame
copy()              Creates an independent copy of a pandas object
duplicated()        Returns a Boolean Series marking duplicate rows; used to extract rows that have duplicate values
drop_duplicates()   Returns the DataFrame with duplicate rows removed
set_index()         Sets the DataFrame index (row labels) using one or more existing columns
reset_index()       Resets the index to the default integer index (0 to n-1)
where()             Keeps values where a condition is True and replaces the others, with NaN by default
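A few entries from the table in action, on a small hypothetical product frame. shape and ndim are attributes (no parentheses), while between() and isin() are called on a single column.

```python
import pandas as pd

df = pd.DataFrame({"product": ["laptop", "printer", "tablet", "desk"],
                   "price": [1200, 150, 300, 450]})

print(df.shape, df.ndim)                            # attributes, not methods
mid_range = df[df["price"].between(200, 500)]       # bounds are inclusive
chosen = df[df["product"].isin(["desk", "laptop"])] # membership filter
cheap = df.query("price < 400")                     # string-based filter
top_two = df.nlargest(2, "price")                   # two highest prices

print(top_two)
```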

 


Source: https://www.geeksforgeeks.org/python-pandas-dataframe/

In this short guide, you’ll see two different methods to create Pandas DataFrame:

  • By typing the values in Python itself to create the DataFrame
  • By importing the values from a file (such as a CSV file), and then creating the DataFrame in Python based on the values imported

Method 1: typing values in Python to create Pandas DataFrame

To create Pandas DataFrame in Python, you can follow this generic template:

import pandas as pd

data = {'first_column': ['first_value', 'second_value', ...],
        'second_column': ['first_value', 'second_value', ...],
        ....
       }

df = pd.DataFrame(data)
print(df)

Note that you don’t need to use quotes around numeric values (unless you wish to capture those values as strings).
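To see the effect of quoting, compare the column types: unquoted numbers produce a numeric column, while quoted ones are stored as strings and must be converted (here with pd.to_numeric) before doing arithmetic. The values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"as_number": [1200, 150], "as_text": ["1200", "150"]})
print(df.dtypes)

# Quoted values are text, not numbers
text_was_numeric = pd.api.types.is_numeric_dtype(df["as_text"])

# Convert the quoted values before doing arithmetic on them
df["as_text"] = pd.to_numeric(df["as_text"])
print(df["as_text"].sum())
```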

Now let’s see how to apply the above template using a simple example.

To start, let’s say that you have the following data about products, and that you want to capture that data in Python using Pandas DataFrame:

product_name  price
laptop        1200
printer       150
tablet        300
desk          450
chair         200

You may then use the code below in order to create the DataFrame for our example:

import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data)
print(df)

Run the code in Python, and you’ll get a DataFrame with the five products in rows 0 to 4.

You may have noticed that each row is represented by a number (also known as the index) starting from 0. Alternatively, you may assign another value/name to represent each row.

For example, in the code below, index=['product_1','product_2','product_3','product_4','product_5'] was added:

import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data, index=['product_1', 'product_2', 'product_3', 'product_4', 'product_5'])
print(df)

You’ll now see the newly assigned index (product_1 to product_5) in place of the numbers 0 to 4.

Let’s now review the second method of importing the values into Python to create the DataFrame.

Method 2: importing values from a CSV file to create Pandas DataFrame

You may use the following template to import a CSV file into Python in order to create your DataFrame:

import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() call is needed
df = pd.read_csv(r'Path where the CSV file is stored\File name.csv')
print(df)

Let’s say that you have the following data stored in a CSV file (where the CSV file name is ‘products’):

product_name  price
laptop        1200
printer       150
tablet        300
desk          450
chair         200

In the Python code below, you’ll need to change the path name to reflect the location where the CSV file is stored on your computer.

For example, let’s suppose that the CSV file is stored under the following path:

‘C:\Users\Ron\Desktop\products.csv’

Here is the full Python code for our example:

import pandas as pd

# read_csv already returns a DataFrame
df = pd.read_csv(r'C:\Users\Ron\Desktop\products.csv')
print(df)

As before, you’ll get the same Pandas DataFrame in Python.

You can also create the same DataFrame by importing an Excel file into Python using Pandas.

Find the maximum value in the DataFrame

Once you have your values in the DataFrame, you can perform a large variety of operations. For example, you may calculate stats using Pandas.

For instance, let’s say that you want to find the maximum price among all the products within the DataFrame.

Obviously, you can derive this value just by looking at the dataset, but the method presented below would work for much larger datasets.

To get the maximum price for our example, you’ll need to add the following portion to the Python code (and then print the results):

max_price = df['price'].max()

Here is the complete Python code:

import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data)

max_price = df['price'].max()
print(max_price)

Once you run the code, you’ll get 1200, which is indeed the maximum price.
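If you also want to know which product carries that price, idxmax() returns the index label of the maximum, which you can then pass to loc. A short extension of the example above:

```python
import pandas as pd

data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]}
df = pd.DataFrame(data)

max_label = df['price'].idxmax()                 # row label of the maximum price
top_product = df.loc[max_label, 'product_name']  # look up that row's product

print(top_product, df['price'].max())
```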

You may check the Pandas Documentation to learn more about creating a DataFrame.

Source: https://datatofish.com/create-pandas-dataframe/

Python Pandas - DataFrame



A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

Features of DataFrame

  • Columns can potentially be of different types
  • Size-mutable
  • Labeled axes (rows and columns)
  • Can perform arithmetic operations on rows and columns

Structure

Let us assume that we are creating a data frame with students’ data.

Structure Table

You can think of it as an SQL table or a spreadsheet data representation.

pandas.DataFrame

A pandas DataFrame can be created using the following constructor −

pandas.DataFrame( data, index, columns, dtype, copy)

The parameters of the constructor are as follows −

Sr.No  Parameter & Description

1. data
   data takes various forms like ndarray, Series, map, lists, dict, constants and also another DataFrame.

2. index
   Row labels to use for the resulting frame. Optional; if no index is passed, it defaults to a RangeIndex (0 to n-1, like np.arange(n)).

3. columns
   Column labels to use for the resulting frame. Optional; if no column labels are passed and the data does not supply them (for example, as dict keys), it defaults to a RangeIndex (like np.arange(n)).

4. dtype
   Data type to force; only a single dtype is allowed. If it is not given, the type of each column is inferred.

5. copy
   Whether to copy the data from the inputs.
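A short sketch of the index and columns parameters with dict input: index supplies the row labels, and columns selects and orders which dict keys become columns. A label that is not a key (the hypothetical 'City' here) produces a column of NaN.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack'], 'Age': [28, 34]}

# columns reorders the keys; 'City' is not a key, so it is filled with NaN
df = pd.DataFrame(data, index=['rank1', 'rank2'], columns=['Age', 'Name', 'City'])
print(df)
```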

Create DataFrame

A pandas DataFrame can be created using various inputs like −

  • Lists
  • dict
  • Series
  • Numpy ndarrays
  • Another DataFrame

In the subsequent sections of this chapter, we will see how to create a DataFrame using these inputs.

Create an Empty DataFrame

The most basic DataFrame that can be created is an empty DataFrame.

Example

#import the pandas library and aliasing as pd
import pandas as pd

df = pd.DataFrame()
print(df)

Its output is as follows −

Empty DataFrame
Columns: []
Index: []

Create a DataFrame from Lists

The DataFrame can be created using a single list or a list of lists.

Example 1

import pandas as pd

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)

Its output is as follows −

   0
0  1
1  2
2  3
3  4
4  5

Example 2

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

Its output is as follows −

     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

Example 3

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
# dtype=float would apply to every column (including Name), so cast only Age
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})
print(df)

Its output is as follows −

     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0

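Note − Observe, the Age column is now floating point. Passing dtype=float to the constructor applies to every column, so with the string Name column it may raise an error in recent pandas versions; astype() with a dict casts only the listed column.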

Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of same length. If index is passed, then the length of the index should equal to the length of the arrays.

If no index is passed, then by default, index will be range(n), where n is the array length.

Example 1

import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)

Its output is as follows −

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42

Note − Observe the values 0,1,2,3. They are the default index assigned to each row using the function range(n).

Example 2

Let us now create an indexed DataFrame using arrays.

import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

Its output is as follows −

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42

Note − Observe, the index parameter assigns an index to each row.

Create a DataFrame from List of Dicts

List of Dictionaries can be passed as input data to create a DataFrame. The dictionary keys are by default taken as column names.

Example 1

The following example shows how to create a DataFrame by passing a list of dictionaries.

import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

Its output is as follows −

   a   b     c
0  1   2   NaN
1  5  10  20.0

Note − Observe, NaN (Not a Number) is appended in missing areas.

Example 2

The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.

import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

Its output is as follows −

        a   b     c
first   1   2   NaN
second  5  10  20.0

Example 3

The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.

import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])

print(df1)
print(df2)

Its output is as follows −

#df1 output
        a   b
first   1   2
second  5  10

#df2 output
        a   b1
first   1  NaN
second  5  NaN

Note − Observe, df2 is created with a column label (b1) that is not a dictionary key, so that column is filled with NaN. Whereas df1 is created with column labels that match the dictionary keys, so no NaN appears.

Create a DataFrame from Dict of Series

Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the series indexes passed.

Example

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)

Its output is as follows −

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

Note − Observe, series one has no label ‘d’, so in the result the ‘d’ row of column one is filled with NaN.

Let us now understand column selection, addition, and deletion through examples.

Column Selection

We will understand this by selecting a column from the DataFrame.

Example

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])

Its output is as follows −

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

Column Addition

We will understand this by adding a new column to an existing data frame.

Example

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column label by passing new series
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)

print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)

Its output is as follows −

Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN

Column Deletion

Columns can be deleted or popped; let us take an example to understand how.

Example

# Using the previous DataFrame, we will delete a column
# using del function
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)

# using del function
print("Deleting the first column using DEL function:")
del df['one']
print(df)

# using pop function
print("Deleting another column using POP function:")
df.pop('two')
print(df)

Its output is as follows −

Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN
Deleting another column using POP function:
   three
a   10.0
b   20.0
c   30.0
d    NaN
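del and pop() both modify the DataFrame in place, and pop() also returns the removed column as a Series. If you would rather leave the original untouched, drop(columns=...) returns a new DataFrame instead. A sketch on the same data:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

removed = df.pop('two')               # in place; returns the removed column
trimmed = df.drop(columns=['one'])    # returns a new DataFrame; df is unchanged

print(trimmed)
```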

Row Selection, Addition, and Deletion

We will now understand row selection, addition and deletion through examples. Let us begin with the concept of selection.

Selection by Label

Rows can be selected by passing a row label to the loc indexer.

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])

Its output is as follows −

one    2.0
two    2.0
Name: b, dtype: float64

The result is a series with labels as column names of the DataFrame. And, the Name of the series is the label with which it is retrieved.

Selection by integer location

Rows can be selected by passing an integer position to the iloc indexer.

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])

Its output is as follows −

one    3.0
two    3.0
Name: c, dtype: float64

Slice Rows

Multiple rows can be selected using ‘ : ’ operator.

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])

Its output is as follows −

   one  two
c  3.0    3
d  NaN    4
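df[2:4] slices rows by position. The same rows can be selected explicitly by position with .iloc, or by label with .loc, whose slices include the end label. Both are shown on the frame used above:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

by_position = df.iloc[2:4]   # positions 2 and 3 (end excluded)
by_label = df.loc['c':'d']   # labels 'c' through 'd' (end included)

print(by_label)
```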

Addition of Rows

Add new rows to a DataFrame with pd.concat(), which appends the rows of the second DataFrame at the end. (The older DataFrame.append() method was removed in pandas 2.0.)

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

df = pd.concat([df, df2])
print(df)

Its output is as follows −

   a  b
0  1  2
1  3  4
0  5  6
1  7  8
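The combined frame above keeps the duplicate labels 0 and 1 from its two inputs. If you want a fresh 0 to n-1 index instead, pd.concat accepts ignore_index=True:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

combined = pd.concat([df, df2], ignore_index=True)  # relabels rows 0..3
print(combined)
```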

Deletion of Rows

Use index label to delete or drop rows from a DataFrame. If label is duplicated, then multiple rows will be dropped.

If you observe, in the above example, the labels are duplicated. Let us drop a label and see how many rows get dropped.

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])

# Drop rows with label 0
df = df.drop(0)
print(df)

Its output is as follows −

   a  b
1  3  4
1  7  8

In the above example, two rows were dropped because those two contain the same label 0.

Source: https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
