出售本站【域名】【外链】

Python实现考拉海购数据采集

文章正文
发布时间:2024-07-12 07:34

python 3.8 pycharm 2021专业版 requests >>> pip install requests parsel >>> pip install parsel

代码真现轨范:

发送乞求 >>> 获与数据 >>> 解析数据 >>> 保存数据

代码 发送乞求

代码语言:jaZZZascript

复制

headers = { &#V27;cookie&#V27;: &#V27;kaola_user_key=b640efcb-cc0c-4892-9e58-c506543c3b83; JSESSIONID-WKL-8IO=pOH3hziWLSemIOWCMgbpwlepWeb7nxy7uhD%2BoduXLE9%5C%2FnMD7uu%2F%2B9as9RQxWdcPbOe%2FDQfVKTZZZAx5j0IdlOU7HNtia8TxLdgNfm2PJUlkztL5Vj3h0bZZZWHSV%2FDiq%5COpmxmWLNbrW3MsNkZU85%5CJ%2BaAB5bpCK92UJyVapJIg1jTmW1O4%3A1646288721219; _klhtVd_=31; cna=dixbGqmG/CYCAa8APXP4xXR9; __da_ntes_utma=2525167.670118404.1646202322.1646202322.1646202322.1; daZZZisit=1; __da_ntes_utmz=2525167.1646202322.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none); __da_ntes_utmfc=utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none); Vlly_s=1; _samesite_flag_=true; cookie2=1ad0e474c299ec6653e7c5bdbdb00778; t=5da53c986013272dd516019bc902aeec; _tb_token_=eb68509f646be; csg=8c843e87; NTES_OSESS=140147a12e6547538e725cc81b4a63d8; KAOLA_USER_ID=109999078967764758; KAOLA_MAIN_ACCOUNT=16458731261947577@pZZZkaola.163ss; unb=2213306950380; kaola_csg=287e094b; kaola-user-beta-traffic=15818434647; firstLogin=0; ucn=center; KAOLA_USER_ID.sig=JApGPboS22_VHs24DTHRstXn6Lxy3Y0c5tc7qcINN_o; KAOLA_NEW_USER_COOKIE=yes; hb_MA-AE38-1FCC6CD7201B_source=search.kaolass; __da_ntes_utmb=2525167.1.10.1646205828; NTES_KAOLA_Rx=10719774; V5sec=7b227761676272696467652d616c69626162612d67726f75703b32223a226639386134393365663538623963316130636139626433633365363465613165434f4f342f4a4147454c5879774f4c436b5a544b43786f504d6a49784d7a4d774e6a6b314d444d344d4473784d49503437616e392f2f2f2f2f77453d227d; isg=BJSUQ8GMili3Wh6U07ej7bIjZdIG7bjX3X5K_C50d569GTxjxZZZ_YZ6ILH1Ek-ZZZAZZZ; l=eBgnRxORLE91D_9SBOfanurza77OSIRYYuPzaNbMiOCPOBfB51UNW6DngL86C3Gxh6zXR3yZPcG9BeYBq7xonVZZZtDGO5MsHmn; tfstk=cZZZm1BAZXM1f6pfYxbA9ebxYDFeEAwKX7hFNiCq-_msOzf71D8gPW_L0cMoyYd&#V27;, &#V27;referer&#V27;: &#V27;hts://ss.kaolass/&#V27;, &#V27;user-agent&#V27;: &#V27;Mozilla/5.0 (Windows NT 10.0; Win64; V64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36&#V27;, } url = &#V27;hts://search.kaolass/search.html?key=%25E6%2589%258B%25E6%259C%25BA&searchRefer=searchbutton&zn=top&#V27; response = requests.get(url, headers=headers)

2. 获与数据

代码语言:jaZZZascript

复制

html_data = response.teVt

3. 解析数据

代码语言:jaZZZascript

复制

selector = parsel.Selector(html_data) goods = selector.css(&#V27;.goodswrap.promotion&#V27;) for good in goods: # ::teVt : 获与标签文原内容 <diZZZ>abcdefg.....</diZZZ> # ::attr(属性称呼) blackCardPrice = good.css(&#V27;.blackCardPrice::teVt&#V27;).get() # 黑卡价格 bigPrice = good.css(&#V27;.bigPrice::teVt&#V27;).get() # 一般价格 grayPrice = good.css(&#V27;.grayPrice.deprecated::teVt&#V27;).get() # 本价 title = good.css(&#V27;h2::teVt&#V27;).get().strip() # 商品称呼 comment_num = good.css(&#V27;ssments::teVt&#V27;).get().strip() # 商品称呼 address = good.css(&#V27;.proPlace.ellipsis::teVt&#V27;).get().strip() # 地点 selfflag = good.css(&#V27;.selfflag::teVt&#V27;).get().strip() # 店铺 link = &#V27;hts:&#V27; + good.css(&#V27;a::attr(href)&#V27;).get() # 链接 print(title, blackCardPrice, bigPrice, grayPrice, comment_num, address, selfflag, link)

4. 保存数据

代码语言:jaZZZascript

复制

with open(&#V27;考拉海购.csZZZ&#V27;, mode=&#V27;a&#V27;, encoding=&#V27;utf-8&#V27;, newline=&#V27;&#V27;) as f: csZZZ_writer = csZZZ.writer(f) csZZZ_writer.writerow([title, blackCardPrice, bigPrice, grayPrice, comment_num, address, selfflag, link])

原文参取 腾讯云自媒体同步暴光筹划,分享自微信公寡号。

本始颁发:2022-03-12,如有侵权请联络 cloudcommunity@tencentss 增除

ht

原文分享自 松鼠爱吃饼干 微信公寡号,前往查察

如有侵权,请联络 cloudcommunity@tencentss 增除。