Scraping web pages in Java with Jsoup: a step-by-step walkthrough
Author: 蜀山鸭梨大 Date: 2021-12-20 03:24:10
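This article shows how to download a web page in Java with Apache HttpClient and then parse it with Jsoup to pull out specific elements. It walks through one complete example class, which you can use as a starting point for your own crawlers.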
1. Add the dependencies
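```xml
<!-- Java web crawler (jsoup) -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
<!-- Apache HttpClient -->
<!-- No version is given here: this assumes a parent POM or dependencyManagement supplies it; otherwise add one explicitly -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>
```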
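In the demo below, HttpClient only downloads the page. If all you need is a plain GET, jsoup can fetch the page on its own with `Jsoup.connect(...)`. This makes the httpclient dependency optional for simple cases. The following is a minimal sketch, not the article's method. The class name, the User-Agent string and the 10-second timeout are illustrative assumptions.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

public class JsoupConnectSketch {
    // Builds the request only; nothing is sent over the network until get() is called.
    // The User-Agent value and 10 s timeout are hypothetical choices for illustration.
    static Connection connection(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (demo)")
                .timeout(10_000);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: JsoupConnectSketch <url>");
            return;
        }
        Document doc = connection(args[0]).get(); // performs the GET and parses the response body
        System.out.println(doc.title());
    }
}
```

Run it with a URL argument, for example `java JsoupConnectSketch https://www.cnblogs.com/`. HttpClient is still the better choice when you need connection pooling or fine-grained control over the request.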
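2. Write the demo class
Make sure you import the right classes. `Document` and `Element` must come from `org.jsoup.nodes`, and `Elements` from `org.jsoup.select`. Do not take them from another package such as `org.w3c.dom`.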
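Here is why the package matters. `Jsoup.parse(...)` returns an `org.jsoup.nodes.Document`, so assigning the result to an `org.w3c.dom.Document` will not compile. The sketch below shows the correct imports in isolation. It parses an inline HTML string, so no network access is needed. `ImportCheck` and `firstText` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;   // jsoup's Document, NOT org.w3c.dom.Document
import org.jsoup.nodes.Element;    // jsoup's Element, NOT org.w3c.dom.Element
import org.jsoup.select.Elements;

public class ImportCheck {
    // Returns the text of the first element with the given tag, or null if there is none.
    static String firstText(String html, String tag) {
        Document doc = Jsoup.parse(html);           // Jsoup.parse returns org.jsoup.nodes.Document
        Elements found = doc.getElementsByTag(tag); // a list of org.jsoup.nodes.Element
        Element first = found.first();              // null when nothing matched
        return first == null ? null : first.text();
    }

    public static void main(String[] args) {
        System.out.println(firstText("<html><head><title>Hello</title></head><body></body></html>", "title"));
    }
}
```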
```java
package com.taotao.entity;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Author: TaoTao 2019/9/26
 */
public class intefaceTest {
    public static void main(String[] args) throws IOException {
        HttpGet httpGet = new HttpGet("http://www.cnblogs.com/");          // create the GET request
        String content;
        // try-with-resources closes both the response and the client, releasing system resources
        try (CloseableHttpClient httpClient = HttpClients.createDefault();  // create the HttpClient
             CloseableHttpResponse response = httpClient.execute(httpGet)) { // execute the GET request
            HttpEntity entity = response.getEntity();                      // get the response entity
            content = EntityUtils.toString(entity, "utf-8");               // page content as a string
        }

        Document doc = Jsoup.parse(content);                 // parse the page into a Document
        Elements elements = doc.getElementsByTag("title");   // all elements whose tag is <title>
        if (!elements.isEmpty()) {
            Element element = elements.get(0);               // take the first one
            String title = element.text();                   // text(); html() would return the inner HTML
            System.out.println("Page title: " + title);
        }

        Element element1 = doc.getElementById("site_nav_top"); // element with id="site_nav_top"
        if (element1 != null) {                                  // getElementById returns null if the id is absent
            String str = element1.text();
            System.out.println("str: " + str);
        }
    }
}
```
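The demo needs network access, and its output depends on the live cnblogs.com markup. To try the same parsing calls offline, you can feed Jsoup an inline HTML string. The markup below is hypothetical and is not the real cnblogs.com page. Besides `getElementById`, the sketch shows three more calls. `doc.title()` is a shortcut for the first `<title>`. `text()` and `html()` differ on the same element. `select(...)` takes a CSS selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class OfflineParseDemo {
    // Hypothetical markup for illustration only; it is NOT the real cnblogs.com page.
    static final String HTML =
            "<html><head><title>Demo Page</title></head><body>"
          + "<div id=\"site_nav_top\"><b>Code</b> changes the world</div>"
          + "<ul class=\"nav\"><li><a href=\"/a\">A</a></li><li><a href=\"/b\">B</a></li></ul>"
          + "</body></html>";

    // Collects "text -> href" for every <a> inside <ul class="nav">, using a CSS selector.
    static List<String> navLinks(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("ul.nav a")) {
            out.add(a.text() + " -> " + a.attr("href")); // attr returns the raw attribute value
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.title());              // text of the first <title> element
        Element nav = doc.getElementById("site_nav_top");
        if (nav != null) {
            System.out.println(nav.text());           // text only, tags stripped
            System.out.println(nav.html());           // inner HTML, tags kept
        }
        navLinks(doc).forEach(System.out::println);
    }
}
```

Selectors such as `ul.nav a` are usually more robust than chains of `getElementsByTag` calls when a page's structure is nested.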
Source: https://www.cnblogs.com/book-mountain/p/11595018.html
Tags: java, jsoup, scraping, web pages
You may also like
Android development: creating a home-screen shortcut for a mobile app (2023-04-01 06:55:49)
SpringBoot2.0 ZipKin example code (2022-11-25 00:24:40)
Implementing a virtual joystick for mobile games in Unity (2021-11-23 07:16:44)
The most complete collection of Java random-number generation algorithms (2023-10-17 15:22:33)
A brief look at MMAP zero-copy in RocketMQ (2021-11-21 01:59:47)
Three ways to implement cloning in Java, with examples (2021-11-21 15:26:14)
IntelliJ IDEA: changing the shortcut for method-parameter hints (2022-11-19 08:08:37)
How Spring Validation method validation works (2023-09-04 17:11:55)
Spring Security permission configuration and usage guide (2022-03-05 15:37:21)
Spring Boot WebFlux filters (implemented with RouterFunction) (2022-12-12 21:28:44)
Implementing a drop-down menu in Java Swing (2023-07-12 04:55:30)
Fixing precision loss when converting BigDecimal to long (2022-07-16 13:44:22)
Implementing minimum spanning tree algorithms in Java (2023-11-25 04:51:22)
WinForms: validating yyyy-mm-dd date input in a TextBox (2023-04-14 21:35:05)
Paginating a Java List: code examples (2022-06-02 13:56:14)
Common lock mechanisms in Java thread concurrency (2023-07-04 05:33:33)
Pull-to-refresh for a custom Android ListView control (2023-04-07 23:51:06)
Java LRU (Least Recently Used) explained, with example code (2022-10-08 10:42:43)
Deduplicating a List with the built-in Distinct() method (2023-01-31 00:45:30)
Asynchronous mechanisms for inter-thread communication in C# (2023-10-13 16:45:23)