The pandas Series Type and Its Basic Operations, Explained

Author: kingov  Date: 2021-03-23 12:06:36

1 Series

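A Series is a linear data structure: a one-dimensional array whose elements carry labels.

By default pandas uses the integers 0 through n-1 as the index of a Series, but you can also specify the index yourself (think of the index labels as the keys of a dict).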

1.1 Creating a Series


import pandas as pd
import numpy as np
s = pd.Series([9, 'zheng', 'beijing', 128])
print(s)

Output:

0 9
1 zheng
2 beijing
3 128
dtype: object

Accessing part of the data with a slice:


print(s[1:2])

# Output
1 zheng
dtype: object

Basic operations on the Series type:

A Series consists of two parts: index and values.


In [14]: a = pd.Series({'a':1,'b':5})

In [15]: a.index
Out[15]: Index(['a', 'b'], dtype='object')

In [16]: a.values # returns a NumPy ndarray (one-dimensional for a Series)
Out[16]: array([1, 5], dtype=int64)
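
As a small sketch of the index/values split described above, the snippet below pulls both parts out of the same Series. The variable names `labels` and `data` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

a = pd.Series({"a": 1, "b": 5})

labels = a.index.tolist()   # plain Python list of the index labels
data = a.to_numpy()         # one-dimensional NumPy array of the values

print(labels)       # ['a', 'b']
print(data.ndim)    # 1
print(data.shape)   # (2,)
```

`to_numpy()` is an alternative to the `.values` attribute used above; both give the underlying array for a plain numeric Series like this one.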

Series operations resemble those of the ndarray type


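# Positional (automatic) indexing and custom-label indexing coexist, but cannot be mixed
In [17]: a[0] # positional index (a.iloc[0] is the explicit form; recent pandas deprecates integer positions through [])
Out[17]: 1
# custom label index
In [18]: a['a']
Out[18]: 1
# the two cannot be mixed (the output below is from an older pandas; recent versions raise KeyError for the missing label 1)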
In [20]: a[['a',1]]
Out[20]:
a 1.0
1 NaN
dtype: float64
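
One way to avoid the ambiguity between positions and labels is to spell out which one is meant. The following is a minimal sketch using `.iloc` (position) and `.loc` (label); the label `"z"` is a made-up example of a label that does not exist.

```python
import pandas as pd

a = pd.Series({"a": 1, "b": 5})

first_by_position = a.iloc[0]   # position 0
first_by_label = a.loc["a"]     # label "a"
subset = a.loc[["a", "b"]]      # list of labels, all of which exist

print(first_by_position, first_by_label)   # 1 1
print(subset.tolist())                     # [1, 5]

# reindex accepts labels that may be missing; absent ones become NaN.
padded = a.reindex(["a", "z"])
print(padded.isna().tolist())              # [False, True]
```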

Series operations resemble those of a Python dict


# Access elements through the custom index labels
# The `in` keyword tests the index labels, not the values
In [21]: 'a' in a
Out[21]: True

In [22]: 1 in a
Out[22]: False
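
Since `in` only looks at the index, testing for a value needs a different route. The sketch below shows a few options that exist in pandas; the default value `-1` passed to `get` is arbitrary.

```python
import pandas as pd

a = pd.Series({"a": 1, "b": 5})

print("a" in a)            # True: "a" is an index label
print(1 in a)              # False: 1 is a value, not a label
print(1 in a.values)       # True: searches the values instead
print(a.isin([1]).any())   # True: the same check with a pandas method
print(a.get("z", -1))      # -1: dict-style lookup with a default
```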

In arithmetic, Series automatically align data by index label; a label present in only one operand yields NaN


In [29]: a = pd.Series([1,3,5],index = ['a','b','c'])

In [30]: b = pd.Series([2,4,5,6],index = ['c,','d','e','b'])

In [31]: a+b
Out[31]:
a  NaN
b  9.0
c  NaN
c, NaN
d  NaN
e  NaN
dtype: float64

A Series object can be modified at any time, and the change takes effect immediately


In [32]: a.index = ['c','d','e']

In [33]: a
Out[33]:
c 1
d 3
e 5
dtype: int64

In [34]: a+b
Out[34]:
b  NaN
c  NaN
c,  NaN
d  7.0
e  10.0
dtype: float64
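
If the NaN results from non-matching labels are not wanted, two common approaches are to supply a fill value or to drop the unmatched labels afterwards. This is a sketch under the assumption that treating a missing operand as 0 is acceptable for the data at hand.

```python
import pandas as pd

a = pd.Series([1, 3, 5], index=["c", "d", "e"])
b = pd.Series([2, 4, 5, 6], index=["c,", "d", "e", "b"])

# Labels present in only one operand are treated as 0 in the other.
total = a.add(b, fill_value=0)
print(total)

# Alternatively, keep only the labels that both Series share.
common = (a + b).dropna()
print(common.to_dict())   # {'d': 7.0, 'e': 10.0}
```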

1.2 Specifying the index


import pandas as pd
import numpy as np
s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990], index=[1,2,3,'e','f','g'])
print(s)

Output:

1 9
2 zheng
3 beijing
e 128
f usa
g 990
dtype: object

Look up a value by its index label:


print(s['f']) # usa

1.3 Building a Series from a dictionary


import pandas as pd
import numpy as np
s = {"ton": 20, "mary": 18, "jack": 19, "car": None}
sa = pd.Series(s, name="age")
print(sa)

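Output (from an older pandas version that sorted the dict keys; recent versions keep the dict's insertion order):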

car NaN
jack 19.0
mary 18.0
ton 20.0
Name: age, dtype: float64

Check the type:


print(type(sa)) # <class 'pandas.core.series.Series'>
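
When building from a dict, an explicit `index=` argument can also be passed to choose which keys to keep and in what order. The sketch below assumes a made-up key, `"bob"`, that is not in the dict, to show what happens to unknown labels.

```python
import pandas as pd

ages = {"ton": 20, "mary": 18, "jack": 19, "car": None}

# index= selects and orders the dict's keys; keys not in the dict get NaN.
sa = pd.Series(ages, index=["jack", "ton", "bob"], name="age")
print(sa)

print(sa.index.tolist())    # ['jack', 'ton', 'bob']
print(sa.isna().tolist())   # [False, False, True]
```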

1.4 Building a Series from a NumPy ndarray

Generate an array of random numbers and wrap it in a Series:


import pandas as pd
import numpy as np

num_abc = pd.Series(np.random.randn(5), index=list('abcde'))
num = pd.Series(np.random.randn(5))

print(num)
print(num_abc)

# Output (the values differ on every run)
0   -0.102860
1   -1.138242
2    1.408063
3   -0.893559
4    1.378845
dtype: float64
a   -0.658398
b    1.568236
c    0.535451
d    0.103117
e   -1.556231
dtype: float64
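
Because the numbers above change on every run, a seeded generator can help when reproducible output is needed. This is a sketch using NumPy's `default_rng`; the seed value 0 is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)   # the seed value is arbitrary
values = rng.standard_normal(5)

num_abc = pd.Series(values, index=list("abcde"))
print(num_abc)

# The Series holds the same numbers as the source array.
print(np.array_equal(num_abc.to_numpy(), values))   # True
print(num_abc.dtype)                                # float64
```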

1.5 Selecting data


import pandas as pd
import numpy as np

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990], index=[1,2,3,'e','f','g'])

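# .iloc selects by position; plain [] is ambiguous here because the labels 1, 2, 3 are themselves integers
print(s.iloc[1:3])  # positions 1 and 2 (start included, end excluded): zheng, beijing
print(s.iloc[[1, 3]])  # positions 1 and 3: zheng, 128
print(s.iloc[:-1])  # everything except the last element: 9, zheng, beijing, 128, usa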
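
With an index that mixes integers and strings, the same integer can mean a position or a label. The following sketch contrasts the two explicit accessors on the Series from this section.

```python
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990],
              index=[1, 2, 3, 'e', 'f', 'g'])

print(s.iloc[1])                 # zheng: position 1 (the second element)
print(s.loc[1])                  # 9: the element whose label is 1
print(s.loc[[1, 3]].tolist())    # [9, 'beijing']: labels 1 and 3
print(s.iloc[[1, 3]].tolist())   # ['zheng', 128]: positions 1 and 3
```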

1.6 Operating on data


import pandas as pd
import numpy as np
s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990], index=[1,2,3,'e','f','g'])

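# .iloc slices by position; names other than `sum` avoid shadowing the built-in
total = s.iloc[1:3] + s.iloc[1:3]
total1 = s.iloc[1:4] + s.iloc[1:4]
total2 = s.iloc[1:3] + s.iloc[1:4]
total3 = s.iloc[:3] + s.iloc[1:]
print(total)
print(total1)
print(total2)
print(total3)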

Output (labels found in only one operand give NaN):

2        zhengzheng
3    beijingbeijing
dtype: object
2        zhengzheng
3    beijingbeijing
e               256
dtype: object
2        zhengzheng
3    beijingbeijing
e               NaN
dtype: object
1               NaN
2        zhengzheng
3    beijingbeijing
e               NaN
f               NaN
g               NaN
dtype: object

1.7 Searching

Membership test:


'usa' in s.values # True: `in` on the Series itself tests index labels, so search the values instead

Range lookup (filtering by a condition):


import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa[sa>19])
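
A single comparison can be extended to a true range by combining conditions. The sketch below uses the same data; the bounds 19, 22 and 24 are picked only for illustration.

```python
import pandas as pd

ages = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(ages, name="age")

# Combine conditions with & (and) or | (or); each side needs parentheses.
in_range = sa[(sa > 19) & (sa < 24)]
print(in_range.sort_index().to_dict())     # {'jim': 22.0, 'ton': 20.0}

# between() is inclusive on both ends by default.
print(sorted(sa[sa.between(19, 22)].index))   # ['jack', 'jim', 'ton']
```

The missing value for "car" compares as False in every condition, so it never appears in the filtered result.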

Median:


import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa.median()) # 20.0 (the NaN entry is ignored)
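
The median above is computed over the five non-missing ages only. A short sketch of how the missing value is handled, using the `skipna` parameter of `median`:

```python
import pandas as pd

ages = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(ages, name="age")

print(sa.count())                # 5: non-missing values only
print(sa.median())               # 20.0: NaN is skipped by default
print(sa.median(skipna=False))   # nan: the missing value propagates
print(sa.dropna().sort_values().tolist())   # [18.0, 19.0, 20.0, 22.0, 24.0]
```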

Test whether each value is greater than the median (this yields a Boolean Series):


import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa>sa.median())

Select the values greater than the median:


import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(sa[sa > sa.median()])

Median, with the Boolean mask stored in a variable:


import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

more_than_median = sa > sa.median()

print(more_than_median)

print('---------------------')

print(sa[more_than_median])
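
A stored Boolean mask can be reused for more than selection. As a sketch, the snippet below counts the matches and inverts the mask; the name `mask` is chosen here for illustration.

```python
import pandas as pd

ages = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(ages, name="age")

mask = sa > sa.median()

# True counts as 1, so summing the mask counts the matching entries.
print(mask.sum())   # 2

# ~ inverts the mask; notna() keeps the missing entry out of the result.
at_or_below = sa[~mask & sa.notna()]
print(sorted(at_or_below.index))   # ['jack', 'mary', 'ton']
```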

1.8 Assigning to a Series


import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(s)

print('----------------')

sa['ton'] = 99

print(sa)
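
Assignment by label can also go through `.loc`, and assigning to a label that does not exist yet appends it. A minimal sketch; the name "amy" is made up for the example.

```python
import pandas as pd

ages = {"ton": 20, "mary": 18, "jack": 19}
sa = pd.Series(ages, name="age")

sa.loc["ton"] = 99   # overwrite an existing label
sa.loc["amy"] = 30   # a new label is appended ("amy" is a made-up name)

print(sa.to_dict())   # {'ton': 99, 'mary': 18, 'jack': 19, 'amy': 30}
print(ages["ton"])    # 20: the source dict is not modified
```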

1.9 Assigning one value to every element that satisfies a condition


import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}

sa = pd.Series(s, name="age")

print(s) # print the original dict

print('---------------------') # separator

sa[sa>19] = 88 # set every value greater than 19 to 88

print(sa) # print the modified data

print('---------------------') # separator

print(sa / 2) # divide every value by 2
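
The assignment above changes `sa` in place. Where the original should be kept, `mask` offers a non-mutating variant; the sketch below assumes the same threshold (19) and replacement value (88).

```python
import pandas as pd

ages = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(ages, name="age")

# mask() replaces values where the condition is True and returns a new Series.
capped = sa.mask(sa > 19, 88)

print(capped[["ton", "mary"]].tolist())   # [88.0, 18.0]
print(sa["ton"])                          # 20.0: the original is unchanged
print(capped.isna().sum())                # 1: the missing value stays missing
```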

Source: https://blog.csdn.net/kingov/article/details/79513322
