Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data-science libraries in Python.

-3
votes
0answers
13 views

How to group values in pandas dataframe

I want to group values in a dataframe and assign those grouped values as a class in a new column (Group). Here's what I mean: Input File Name Page Number Content Font Class Font Size Top Left Width ...
0
votes
0answers
24 views

How can I save the results from the python script below?

The following script runs just fine. The problem is that with every loop, the results populated into saka_keja.csv keep being appended instead of being saved. A scenario, if I have received 10 ...
0
votes
1answer
18 views

Creating an empty DataFrame as default parameter

I am trying to create a Python function that plots the data from a DataFrame. The parameters should be either just the data, or the data and the standard deviation. As a default parameter for the ...
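One common pattern for an optional DataFrame argument is a `None` sentinel, since default values in a signature are evaluated only once, when the function is defined. A minimal sketch, assuming a hypothetical helper named `resolve_std` (the name and the sample data are illustrative, not taken from the question):

```python
import pandas as pd


def resolve_std(data, std=None):
    """Return the standard deviations to plot, defaulting to an empty DataFrame."""
    # Hypothetical helper: None stands in for "no standard deviation supplied".
    if std is None:
        std = pd.DataFrame()
    return std


data = pd.DataFrame({"y": [1.0, 2.0, 3.0]})
no_std = resolve_std(data)
with_std = resolve_std(data, pd.DataFrame({"y": [0.1, 0.2, 0.1]}))
```

A plotting function could then check `std.empty` to decide whether to draw error bars.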
0
votes
0answers
45 views

Extracting digits from string column

I have multiple columns named titles. I would like to extract a 6-digit figure from each of these columns where such a figure exists and place those digits in a new column named global_id. Some titles ...
0
votes
0answers
11 views

Reading pandas DataFrame with user defined indices

I have a Pandas DataFrame which needs to be processed in chunks, as the file is too big to be loaded entirely into RAM. For this, I used the following code: chunk = pd.read_csv("file_name.csv", ...
0
votes
2answers
19 views

Proper way to merge data in Pandas

I currently have 2 dataframes and would like to merge them into one. But they have common fields. Like the one below: import pandas as pd import numpy as np df1 = pd.DataFrame({'A': [1,2,3], 'B': [4,...
0
votes
0answers
11 views

Pandas: Adding data from multiple rows into extra columns separated by a 3rd column

Yesterday I got some help on how to add data from multiple rows into extra columns (Pandas: Adding data from multiple rows into extra columns for a single row). That benefited me a lot, but I want to ...
1
vote
2answers
16 views

Fill up missing observations at dates in interval in a pandas dataframe

Let us say I have the following pandas dataframe: +---------------------+---------+-------+-----+ | observed_cats_count | year | month | day | +---------------------+---------+-------+-----+ | ...
0
votes
0answers
20 views

Merging two dataframes on multiple columns where entries in columns are mixed up

The problem at hand is as follows: I have two CSV-files which are imported as dataframe. The first one contains odds movements for multiple tennis matches and the second one the outcome (winner) of ...
0
votes
1answer
36 views

Is there a numpy equivalent to pandas.apply?

I have a call which adds some random values to a pandas Series: series = series.apply(lambda x: int(math.ceil(x + x * rand_value(range)))) For performance reasons I can't use the pandas.Series ...
1
vote
1answer
28 views

MemoryError: Unable to allocate array with shape (118, 840983) and data type float64

I'm getting the following error: MemoryError: Unable to allocate array with shape (118, 840983) and data type float64 in my Python code whenever I run the pandas.read_csv() function to read ...
0
votes
0answers
18 views

How to solve __array_function__ internals source error?

I have a problem and can't figure out where the bug is. While running my script I get this error: Could not load source '<array_function internals>': Source unavailable. The point of my ...
1
vote
2answers
18 views

Difference between pd.concat() and pd.merge() and why do I get wrong output?

I am facing a difficulty with two dataframes I need to join. I usually apply pd.merge(), but in this case I get a ValueError and I am recommended to use pd.concat(). So, my case is this: I have two ...
0
votes
1answer
25 views

How to read an Excel file with nested columns with pandas?

I am trying to read an Excel file using pandas but my columns and index are changed: df = pd.read_excel('Assignment.xlsx',sheet_name='Assignment',index_col=0) Excel file: Jupyter notebook:
0
votes
0answers
11 views

Keeping cumulative sums for each ID in pandas dataframe [duplicate]

I have a pandas dataframe set out like this: DATE ID VALUE 10/4 11 0.3 11/4 13 0.5 12/4 11 0.2 13/4 13 0.1 14/4 16 0.2 15/4 13 0.8 16/4 16 ...
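A per-ID running total like this is commonly written with `groupby` followed by `cumsum`. A minimal sketch using only the six complete rows visible in the excerpt; the output column name `CUMSUM` is an assumption, since the desired layout is not shown:

```python
import pandas as pd

# First six rows from the excerpt; the remaining rows are truncated there.
df = pd.DataFrame({
    "DATE": ["10/4", "11/4", "12/4", "13/4", "14/4", "15/4"],
    "ID": [11, 13, 11, 13, 16, 13],
    "VALUE": [0.3, 0.5, 0.2, 0.1, 0.2, 0.8],
})

# Running total of VALUE within each ID, keeping the original row order.
# "CUMSUM" is a hypothetical column name.
df["CUMSUM"] = df.groupby("ID")["VALUE"].cumsum()
```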
0
votes
1answer
23 views

How to classify my dataframe based on the genres I want using pandas

I have a .csv data file with different genres of games. There are games with 3 or 4 genre tags to it in the .csv file. How do I extract the rows which have only 2 of the 4 genre parameters? I want to ...
0
votes
0answers
4 views

Apache, mod_wsgi, and Pandas import error, unable to import dependencies Numpy due to error

I'm able to deploy the flask app without the Pandas library, so I don't think something is wrong with how I set up the server. Additionally, I'm able to run my flask app with pandas with Flask's ...
0
votes
2answers
23 views

Convert date to week and count the dependencies from different columns of a dataframe

I have a dataframe like this: date Company Email 2019-10-07 abc [email protected] 2019-10-07 def [email protected] 2019-10-07 abc [email protected] 2019-10-08 ...
0
votes
1answer
19 views

Can't Install Python Pandas in 3.6.6

I am Mac user and trying to install Pandas in Python 3.6.6. to use in IDLE / VS Code to do my work. Hasans-MacBook-Pro:~ hasan-macbookpro$ python3 --version Python 3.6.6 Hasans-MacBook-Pro:~ hasan-...
0
votes
1answer
27 views

Why do we need to reshape (R,1) to (R,) for plotting?

I am trying to plot a 3D plot. fig = plt.figure() ax = fig.add_subplot(111, projection='3d') ax.plot(xs,zs,targets) ax.set_xlabel('xs') ax.set_ylabel('zs') ax.set_zlabel('Targets') ax.view_init(azim=...
0
votes
3answers
34 views

Compare a column in one dataframe with two other columns in a different dataframe?

I have created two data frames from two tsv files. The data frames are as follows: Dataframe1 (df1) chr position 5 745 7 963 8 1024 Dataframe2 (df2) chr start end 1 10 ...
2
votes
4answers
40 views

What is the use of reset_index() in pandas?

While reading this article, I came across this statement. order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index() Other than the reset_index() method call, everything ...
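The effect of `reset_index()` in a chain like this can be seen by comparing the result before and after the call. A minimal sketch with made-up data; only the column names `order` and `ext price` come from the excerpt:

```python
import pandas as pd

# Hypothetical data; only the column names come from the excerpt.
df = pd.DataFrame({"order": [1, 1, 2], "ext price": [10.0, 5.0, 7.5]})

# groupby(...).sum() returns a Series whose index is the "order" values.
totals = df.groupby("order")["ext price"].sum().rename("Order_Total")

# reset_index() moves "order" out of the index into a regular column,
# giving a two-column DataFrame with a default 0..n-1 index.
order_total = totals.reset_index()
```

Having `order` as an ordinary column is what makes the result convenient to merge back onto the original frame.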
0
votes
1answer
24 views

Getting value error of “The truth value of a Series is ambiguous..” while checking for equality between sub-strings in a string

I'm trying to check if a specific "Route" string starts and ends with the same sub-string but I'm getting an error of "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a....
0
votes
1answer
36 views

Python: float() argument must be a string or a number, not 'Period'

I have the following piece of code through which I am trying to plot a graph: df: date qty 0 2016-01-01 21.523810 1 2016-02-01 20.476190 2 2016-03-01 20.523810 3 2016-04-01 26....
0
votes
0answers
12 views

Pivot table - using python, pandas [duplicate]

I have been trying to convert the following input data to the below given expected outcome using python pandas pivot table but could not do it. Could you please help? Input data: NAME USERID B ...
0
votes
1answer
30 views

Append pandas dataframe to existing table in databricks

I want to append a pandas dataframe (8 columns) to an existing table in databricks (12 columns), and fill the other 4 columns that can't be matched with None values. Here is what I've tried: spark_df = ...
0
votes
2answers
41 views

Use a loop on a dataframe by taking a specific column

I am new to pandas and python. Here I have a data-frame, DID feature 0 1 0 1 0 2 0 22 0 22 0 33 1 11 1 13 1 14 1 2 1 33 2 1 2 22 ...
0
votes
1answer
9 views

z-order of plot in matplotlib [duplicate]

I'm using matplotlib.axes.Axes.twinx to have a shared x-axis in matplotlib for both . I am unable to order the plots using zorder. What I want is to plot the line graphs with ax1 to be on the ...
0
votes
0answers
11 views

Create new column using vlookup equivalent in python [duplicate]

I'm trying to mimic excel's vlookup function with 2 dataframes. I am keen to populate the "Department" column in df1 using the values in df2's "Org" column. The matching can be done from the "Job ...
1
vote
1answer
17 views

Convert Daily Dataframe with Multi Index to quarterly

I would like to convert my daily dataframe of stock data to a quarterly one. However, using resample did not work, because I have a multi index, so I would like my final quarterly dataframe to still ...
0
votes
0answers
21 views

Adjust Datetime column on x-axis of matplotlib in time series dataset

I'm working with a time-series dataset. This dataset has a timestamp column which I set as the index column of the dataset; it has the value of date and time in the form: 15/5/17 7:51 so when I ...
0
votes
1answer
23 views

Pandas stacked bar chart went wrong

I tried to change from normal bar chart to stacked bar chart but there's something wrong with the result. Data: Total Monthly Actual Hours Total Monthly Work Hours Activity Month ...
0
votes
3answers
26 views

Returning DataFrames headers after features selection using Pandas in python

I used Pandas to read the dataset from a .csv file. The size of original_data is (3185,158) as follows: original_data = pd.read_csv("file path") print("Original_data_shape:", original_data.shape)...
0
votes
2answers
20 views

How to select the rows with the same value across columns in pandas?

I have a df with 9 columns. Each column has values 0,1, where 1 means outlier. These are outliers according to 9 different algorithms. I want to select those true outliers, the following query does work. ...
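If "true outliers" means rows that every algorithm flags, a row-wise `all` over a boolean comparison is one compact way to express it. A minimal sketch, assuming that interpretation and using three hypothetical flag columns in place of the nine:

```python
import pandas as pd

# Hypothetical stand-in with three flag columns instead of nine.
df = pd.DataFrame({"algo1": [1, 0, 1], "algo2": [1, 1, 1], "algo3": [1, 0, 0]})

# Keep only the rows where every column flags an outlier (value 1).
unanimous = df[df.eq(1).all(axis=1)]
```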
2
votes
3answers
36 views

Pandas if statement based on matching two different column values

How would I do this: I want to add a new column that tells me if for the same id they have both 'blue' and 'green' associated with it. df = pd.DataFrame({ 'id' : ['a1', 'a1', 'b1', 'b1', 'c1'], ...
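One way to flag ids that carry both values is a grouped set comparison broadcast back to each row. A minimal sketch; the `id` values come from the excerpt, while the `color` column name, its values, and the `has_both` output column are assumptions because the rest of the frame is truncated:

```python
import pandas as pd

# The "id" values come from the excerpt; "color" and its values are hypothetical.
df = pd.DataFrame({
    "id": ["a1", "a1", "b1", "b1", "c1"],
    "color": ["blue", "green", "blue", "blue", "green"],
})

wanted = {"blue", "green"}
# For each id, check whether its set of colors contains both wanted values,
# then broadcast that per-group answer back to every row of the group.
df["has_both"] = df.groupby("id")["color"].transform(lambda s: wanted <= set(s))
```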
1
vote
0answers
17 views

Python Outlier Detection in Customer Sales (preferably by pyod)

I have a big Sales transaction dataset that has multiple customers and each customer buys multiple products. I am struggling to find outliers for each customer and all products at the same time. ...
0
votes
0answers
33 views

My text classifier model doesn't improve with multiple classes

I'm trying to train a model for text classification, and the model takes a list of at most 300 integers embedded from articles. The model trains without problem and all but the accuracy won't go up. ...
-1
votes
0answers
21 views

Improve pandas performance

I want to calculate the average age of the items inside the B column. Some items can be found in the A column but some cannot. The current code I have now is like this from functools import lru_cache ...
-1
votes
1answer
35 views

Python syntax: How to declare the slice label rather than in place

import numpy as np import pandas as pd df = pd.DataFrame(np.random.randn(4, 2), columns=['c1', 'c2'], index=list('abcd')) # valid slice label df.loc['a':'d'] # invalid syntax labels = 'a':'d'; df.loc[...
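The `a:b` syntax is only valid inside square brackets, so a label slice stored in a variable is usually spelled as a `slice` object. A minimal sketch reusing the frame from the excerpt; the variable name `subset` is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 2), columns=["c1", "c2"], index=list("abcd"))

# Outside of square brackets, 'a':'d' is written as a slice object.
labels = slice("a", "d")
subset = df.loc[labels]  # equivalent to df.loc['a':'d']

# pd.IndexSlice builds the same slice object while keeping the colon syntax.
same_labels = pd.IndexSlice["a":"d"]
```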
-3
votes
0answers
39 views

Dropping a column based on a condition

How do I drop a column from a dataframe if another column is missing using python? How should I structure my if, elif, else statement? I want to drop GDP growth rate as well if GDP is chosen and vice ...
1
vote
0answers
25 views

Pandas todense() function causing MemoryError - Python 3.x

I am attempting to train a number of classifiers to test their performance with classifying tweets as being from a political bot, or not (a binary 0 or 1 classifier). My data is being read in via a ...
0
votes
2answers
16 views

Interpolate data in pandas

I have a dataframe of the form: t1 x1 t2 NaN t3 NaN t4 x4 t5 x5 t6 NaN t7 x7 and so on. I want to interpolate the data in the second column, using the first column. So, the number that would be x2 ...
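Interpolating one column against another is often done by making the first column the index and using `interpolate(method="index")`. A minimal sketch, assuming the first column is numeric; the column names `t` and `x` and all values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric values for the two columns.
df = pd.DataFrame({"t": [0.0, 1.0, 3.0, 4.0], "x": [0.0, np.nan, np.nan, 8.0]})

# method="index" interpolates linearly against the index values
# rather than assuming equally spaced rows.
filled = df.set_index("t")["x"].interpolate(method="index")
```

If the first column holds timestamps rather than numbers, `method="time"` on a DatetimeIndex plays the same role.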
0
votes
1answer
43 views

Data Cleaning (Addresses) Python

I'm looking to clean a dataset with 61k rows. I need to clean its street address column. Presently, the addresses are a nightmare. Sometimes full addresses are written out (i.e. 111 Frederick Douglass ...
0
votes
2answers
27 views

Convert Timestamps to month-year

How do I convert the timestamps to month-year, i.e. 9/5/2019 to Sep-2019? I tried this but I'm getting an array. Here is what I did. pd.DatetimeIndex(data['Timestamp']).year pd.DatetimeIndex(data['...
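A formatted month-year string can be produced with `dt.strftime` once the column is parsed as datetimes. A minimal sketch, assuming month/day/year order (inferred from "9/5/2019 to Sep-2019") and a hypothetical output column `month_year`:

```python
import pandas as pd

# Hypothetical sample; month/day/year order is an assumption.
data = pd.DataFrame({"Timestamp": ["9/5/2019", "10/17/2019"]})

parsed = pd.to_datetime(data["Timestamp"], format="%m/%d/%Y")
# strftime returns strings; %b is the locale's abbreviated month name.
data["month_year"] = parsed.dt.strftime("%b-%Y")
```

The result is a column of strings, so it sorts alphabetically rather than chronologically; `parsed.dt.to_period("M")` keeps a sortable monthly period instead.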
0
votes
0answers
16 views

Pandas dataframe has date stored as datetime64[ns] but putting unique values into a list gives a cryptic long integer

Let's say when I type df.dtypes it returns Name int64 UpdateTime datetime64[ns] EnquiryDate datetime64[ns] dtype: object ls_UpdateDate = df['UpdateTime'].sort_values()....
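Long integers of this kind typically appear when `.tolist()` is called on a NumPy `datetime64[ns]` array, which converts each value to nanoseconds since the epoch. A minimal sketch of the difference; the sample dates are made up, and since the rest of the original statement is truncated, this is one plausible cause rather than a diagnosis:

```python
import pandas as pd

# Hypothetical frame; only the column name and dtype come from the excerpt.
df = pd.DataFrame({"UpdateTime": pd.to_datetime(["2019-10-08", "2019-10-07", "2019-10-08"])})
df["UpdateTime"] = df["UpdateTime"].astype("datetime64[ns]")

# Staying in pandas keeps Timestamp objects in the resulting list.
ls_update_date = df["UpdateTime"].drop_duplicates().sort_values().tolist()

# Going through the raw NumPy array yields integer nanoseconds instead.
as_integers = df["UpdateTime"].to_numpy().tolist()
```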
2
votes
2answers
34 views

Perform math operations with a group of data in python

Given a list of numbers, e.g. data = [30.5, 31.01, 30.4, 30.01, 29.5, 29.6, 29.63, 30.5, 30.33, 30.2], I need to create a function which subtracts the second element from the first element, the third ...
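Pairwise differences between neighbouring elements can be written with `zip` over the list and its one-step shift. A minimal sketch; the function name `successive_differences` is hypothetical, and the sign convention (each element minus the one after it) is an assumption because the description is cut off:

```python
data = [30.5, 31.01, 30.4, 30.01, 29.5, 29.6, 29.63, 30.5, 30.33, 30.2]


def successive_differences(values):
    """Return values[i] - values[i + 1] for each adjacent pair."""
    # Sign convention assumed from "subtracts the second element from the first".
    return [first - second for first, second in zip(values, values[1:])]


diffs = successive_differences(data)
```

The result has one element fewer than the input, and the values are subject to ordinary floating-point rounding.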
1
vote
2answers
32 views

How to drop rows that have duplicate pairs of values in two columns?

I currently have a Pandas DataFrame and would like to remove rows that have duplicate pairs in two columns. Here's an example displaying what I mean: col0 col1 col2 0 0 1 0 1 ...
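When a "duplicate pair" means the same ordered combination of values in two columns, `drop_duplicates` with `subset` handles it directly. A minimal sketch under that assumption; the column names follow the excerpt, but the data and the choice of `col1` and `col2` as the pair are hypothetical:

```python
import pandas as pd

# Hypothetical data; the column names follow the excerpt.
df = pd.DataFrame({"col0": [0, 1, 2, 3], "col1": [1, 2, 1, 3], "col2": [0, 5, 0, 5]})

# Keep the first row for each distinct (col1, col2) pair.
deduped = df.drop_duplicates(subset=["col1", "col2"])
```

Passing `keep=False` would instead drop every row whose pair occurs more than once.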
0
votes
1answer
16 views

How to make a for loop in python for ttest_ind

When I try to make the for loop to get ttest results I get TypeError: unsupported operand type(s) for /: 'str' and 'int' But I have no idea what could be wrong with it. When I replace col with a ...
0
votes
0answers
30 views

Join Two Dataframes - Python pandas

I have two Excel files which I loaded into dataframes: In first Frame I have States State1,2,3... as column names: In second Frame I have State1,2,3... as column values I need to merge these two ...
0
votes
2answers
37 views

How to convert a list into a Pandas DataFrame? pd.DataFrame does not work on this list structure

I have the following data : Sample data: pd.DataFrame({'Candidate_id': pd.Series([533334, 533334, 533334, 533334, 533334],dtype='int64',index=pd.RangeIndex(start=0, stop=5, step=1)), 'SkillMatch': ...