<img alt="Image for post" src="https://miro.medium.com/max/11570/0*aHKnUhdWdBriqfj_" />

Photo by <a target="_blank" href="https://unsplash.com/@thetechnomaid?utm_source=medium&amp;utm_medium=referral">Sophie Elvis</a> on <a target="_blank" href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a>
Data has to be manipulated and cleaned so that it can provide useful insights. Data manipulation is a necessity as there is an increasing amount of data being stored and used.
This article explains some of the data manipulation operations that can help with organizing our data and extracting useful insights.
Pandas, as explained <a target="_blank" href="https://medium.com/better-programming/top-10-python-libraries-for-data-science-21e6cd95ca55">here</a>, is an open-source Python library that provides easy-to-use, high-performance data analysis tools. Pandas makes data <a target="_blank" href="https://medium.com/better-programming/data-wrangling-with-pandas-57f7f72fe73c">wrangling</a>/munging tasks efficient; these tasks are often said to occupy almost 80 percent of a data scientist’s time. There are different ways to store data for analysis; rectangular or tabular data, made up of rows and columns, is the most common form.
Tabular data is represented as a DataFrame object in pandas. Every value within a column of a DataFrame has the same data type, such as text or numeric, but different columns can contain different data types. DataFrames can be created in various ways, such as passing in a dictionary or a list of lists, or reading from a flat file such as a CSV.
<img alt="Image for post" src="https://miro.medium.com/max/1220/1*1V9E9_0qb526aFR43Xi-VA.png" />
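The three creation routes mentioned above can be sketched as follows. The names and values here are hypothetical sample data, not the dataset used in this article, and the CSV is read from an in-memory string so the snippet runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data, not the dataset used in the article.

# 1. From a dictionary: keys become column names, values become the columns.
df_from_dict = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy"],
    "age": [31, 25, 40],
})

# 2. From a list of lists: each inner list is one row,
#    and the column names are passed separately.
df_from_rows = pd.DataFrame(
    [["Ada", 31], ["Ben", 25], ["Cy", 40]],
    columns=["name", "age"],
)

# 3. From a CSV: read_csv accepts a file path or a file-like object.
csv_text = "name,age\nAda,31\nBen,25\nCy,40\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# Each column has a single data type; different columns can differ.
print(df_from_dict.dtypes)
```

With a real file, the last step would typically be `pd.read_csv("some_file.csv")`, where the file name is whatever your data is stored under.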

How to install and import pandas, and how to explore the data, is shown <a target="_blank" href="https://medium.com/better-programming/data-wrangling-with-pandas-57f7f72fe73c">here</a>.

Sorting
Sorting is one of the two most important ways to find interesting parts of a DataFrame. <code>sort_values()</code> sorts rows. When a column name is passed to the method, the data is sorted in ascending order by default; set <code>ascending</code> to <code>False</code> to sort in descending order. To sort by several columns, pass a list of column names to <code>sort_values()</code>; to use a different order for each column, set <code>ascending</code> to a list of booleans, one per column.
<img alt="Image for post" src="https://miro.medium.com/max/1532/1*bHYGc8cAjhBKF5A0hfvx_g.png" />
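As a minimal sketch of the three sorting cases described above, using a small hypothetical table (the column names `name`, `dept`, and `salary` are invented for illustration and are not from the article's dataset):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy", "Dee"],
    "dept": ["ops", "eng", "eng", "ops"],
    "salary": [80, 70, 60, 55],
})

# Single column, ascending by default.
by_salary = df.sort_values("salary")

# Single column, descending.
by_salary_desc = df.sort_values("salary", ascending=False)

# Two columns: dept ascending, then salary descending within each dept.
by_dept_then_salary = df.sort_values(
    ["dept", "salary"], ascending=[True, False]
)

print(by_dept_then_salary)
```

Note that `sort_values()` returns a new DataFrame and leaves `df` unchanged, so the result has to be assigned to a variable to be kept.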


Subsetting
A large part of data science is about finding the interesting bits in your dataset. Simple techniques, sometimes known as filtering or selecting rows, are used to find a subset of rows that match some criteria. We can filter on a single column, on multiple columns, and on text data.
There are many ways to subset a DataFrame: the most common is to use relational operators to produce <code>True</code> or <code>False</code> for each row, then pass the result into square brackets.
<img alt="Image for post" src="https://miro.medium.com/max/1806/1*pUaMZF3dbfOlmakHrm75BA.png" />

We can subset rows by creating a logical condition to filter against; the result is a column of booleans.
<img alt="Image for post" src="https://miro.medium.com/max/1962/1*WKwwQDJM875GBhoLW14-tg.png" />

<img alt="Image for post" src="https://miro.medium.com/max/1172/1*Do4ByNJi7hx_GIiP1lECtA.png" />
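The two steps, building the column of booleans and then passing it into square brackets, might look like this on a small hypothetical table (the data and column names are assumptions made for the example):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy", "Dee"],
    "dept": ["ops", "eng", "eng", "ops"],
    "salary": [80, 70, 60, 55],
})

# Step 1: a relational operator yields one True/False value per row.
is_high_paid = df["salary"] > 60
print(is_high_paid)

# Step 2: passing the boolean column into square brackets
# keeps only the rows where the value is True.
high_paid = df[is_high_paid]

# The same pattern works for text data, usually written in one line.
eng_only = df[df["dept"] == "eng"]

print(high_paid)
print(eng_only)
```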

We can filter on multiple conditions by combining them with the bitwise operators: ‘and’/ampersand (&amp;) and ‘or’/pipe (|). Each condition needs its own pair of parentheses.
<img alt="Image for post" src="https://miro.medium.com/max/1386/1*5GoyhpS5x5qqrlL18jBMHw.png" />
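A short sketch of combining conditions, again on hypothetical data. The parentheses around each condition matter because `&` and `|` bind more tightly than comparison operators such as `==` and `>` in Python.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy", "Dee"],
    "dept": ["ops", "eng", "eng", "ops"],
    "salary": [80, 70, 60, 55],
})

# 'and': rows that satisfy both conditions.
eng_and_high = df[(df["dept"] == "eng") & (df["salary"] > 60)]

# 'or': rows that satisfy at least one condition.
ops_or_low = df[(df["dept"] == "ops") | (df["salary"] < 65)]

print(eng_and_high)
print(ops_or_low)
```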


New columns
We might need to create a new column from the existing columns. Creating a new column is also called mutating a DataFrame, transforming a DataFrame, or feature engineering.
<img alt="Image for post" src="https://miro.medium.com/max/1824/1*npjfBo4YQAq6Gm8OYhqDrQ.png" />

<img alt="Image for post" src="https://miro.medium.com/max/1902/1*Ja2e2xqlee89GeRfMlj1Ug.jpeg" />

We can confirm that the column was added by checking the shape of the DataFrame: the number of columns increased by one, from 12 to 13.
<img alt="Image for post" src="https://miro.medium.com/max/850/1*GHFEt4ASv8ocmWlK0r8J3Q.png" />

<img alt="Image for post" src="https://miro.medium.com/max/1322/1*voGWcz6u4ZRaQl9iuWyMlQ.png" />
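As a rough illustration of adding a column and confirming it through the shape, here is a sketch on a hypothetical three-column table (the article's own dataset has 12 columns; the names `height_cm` and `height_m` are invented for this example):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy"],
    "height_cm": [170, 180, 165],
    "weight_kg": [65, 81, 55],
})

# shape is a (rows, columns) tuple.
shape_before = df.shape

# Assigning to a new column name creates the column
# from a calculation on an existing one.
df["height_m"] = df["height_cm"] / 100

shape_after = df.shape

print(shape_before, shape_after)
```

The row count stays the same and the column count goes up by one, which is the same check the article applies to its own data.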


We have seen the four most common types of data manipulation: sorting rows, subsetting columns, subsetting rows, and adding new columns. What other data manipulation operations do you know?

OLUFUNMILAYO RUTH AFORIJIKU's Blog

Data Manipulation with python pandas