Sampling instances of a specific class from training data with pandas in Python
Author: Yan456jie  Date: 2021-03-02 02:56:14
I won't waste any words; let's go straight to the code!
```python
# -*- coding: utf-8 -*-
import pandas as pd

# Columns in data1.csv:
# creativeID,userID,positionID,clickTime,conversionTime,connectionType,
# telecomsOperator,appPlatform,sitesetID,positionType,age,gender,
# education,marriageStatus,haveBaby,hometown,residence,appID,appCategory,label

def test():
    # read_csv uses "," as the separator by default
    df = pd.read_csv("/var/lib/mysql-files/data1.csv")
    df1 = df[["connectionType", "telecomsOperator", "appPlatform", "sitesetID",
              "positionType", "age", "gender", "education", "marriageStatus",
              "haveBaby", "hometown", "residence", "appCategory", "label"]]
    # Class distribution before sampling
    print(df1["label"].value_counts())

    # Split by class: negatives (label 0) and positives (label 1)
    N_data = df1[df1["label"] == 0]
    P_data = df1[df1["label"] == 1]

    # Downsample the negatives to the number of positives, without replacement
    # (requires at least as many negatives as positives)
    N_data = N_data.sample(n=P_data.shape[0], replace=False, random_state=2, axis=0)
    print(P_data.shape)
    print(N_data.shape)

    # Put the two classes back together
    data = pd.concat([N_data, P_data])
    print(data.shape)

    # Shuffle the rows (frac=1 returns every row in random order) and renumber the index
    data = data.sample(frac=1).reset_index(drop=True)
    print(data[["label"]])
    return data
```
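The same idea extends to any number of classes: cut every class down to the size of the rarest one, then shuffle. Below is a minimal sketch, assuming only pandas; the function name `balance_by_downsampling` and the toy `age`/`label` data are hypothetical illustrations, not part of the original post.

```python
import pandas as pd

def balance_by_downsampling(df: pd.DataFrame, label_col: str = "label",
                            random_state: int = 2) -> pd.DataFrame:
    """Hypothetical helper: downsample every class to the size of the smallest class."""
    # Size of the rarest class; every class is reduced to this many rows.
    n_min = df[label_col].value_counts().min()
    # Draw n_min rows without replacement from each class separately.
    parts = [
        group.sample(n=n_min, replace=False, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Concatenate the per-class samples, shuffle the rows, and renumber the index.
    balanced = pd.concat(parts)
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)

# Hypothetical toy data: 8 negatives (age 0-7) and 2 positives (age 8-9).
toy = pd.DataFrame({"age": range(10), "label": [0] * 8 + [1] * 2})
balanced = balance_by_downsampling(toy)
print(balanced["label"].value_counts())
```

Because each class is sampled on its own, the minority class is kept whole while only the larger classes lose rows.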
Further notes: sampling a DataFrame with pandas
Random sampling
```python
import pandas as pd
# Randomly draw 2000 rows from the DataFrame df
df.sample(n=2000)
```
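`DataFrame.sample` can take either a row count (`n`) or a fraction (`frac`), and `random_state` makes the draw reproducible. A small sketch, assuming a hypothetical 10,000-row DataFrame in place of real data:

```python
import pandas as pd

# Hypothetical 10,000-row DataFrame standing in for real data.
df = pd.DataFrame({"x": range(10_000), "label": [i % 2 for i in range(10_000)]})

# Exactly 2000 rows, without replacement (the default).
sample_n = df.sample(n=2000, random_state=42)

# Alternatively, give a fraction instead of a count: 20% of 10,000 rows is 2000 rows.
sample_frac = df.sample(frac=0.2, random_state=42)

# The same seed and the same call select the same rows.
again = df.sample(n=2000, random_state=42)
print(sample_n.index.equals(again.index))  # True
```

Omitting `random_state` gives a different sample on each run, which is what the shuffle step in the main code relies on.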
Stratified sampling
Use scikit-learn's functions for flexible sampling
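```python
from sklearn.model_selection import train_test_split
# X: the features; y: the attribute column of X to stratify on (typically the label).
# stratify=y keeps each class's proportion the same in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
```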
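If you prefer to stay within pandas, stratified sampling can also be done by sampling the same fraction from each class with `GroupBy.sample` (available in pandas 1.1 and later). This is a swapped-in pandas-only technique, not the scikit-learn approach above; the 900/100 toy data is hypothetical.

```python
import pandas as pd

# Hypothetical imbalanced data: 900 negatives and 100 positives.
df = pd.DataFrame({"x": range(1000), "label": [0] * 900 + [1] * 100})

# Draw 20% of each class separately, so the sample keeps the 9:1 class ratio.
test_df = df.groupby("label").sample(frac=0.2, random_state=0)
# Every row that was not drawn goes into the training split.
train_df = df.drop(test_df.index)

print(test_df["label"].value_counts())   # label 0: 180 rows, label 1: 20 rows
print(train_df["label"].value_counts())  # label 0: 720 rows, label 1: 80 rows
```

`GroupBy.sample` keeps the original index, so `df.drop(test_df.index)` recovers the remaining rows; this assumes the index has no duplicates.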
Source: https://blog.csdn.net/Yan456jie/article/details/72239395
Tags: python, pandas, training, data classes