Dependency
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
Usage example
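import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<div><p>this is a text</p></div>";

// Parse a complete HTML string
Document doc = Jsoup.parse(html);
System.out.println(doc);

// Parse an HTML fragment (jsoup wraps it in a generated <body>)
Document fragment = Jsoup.parseBodyFragment(html);
System.out.println(fragment);

// Fetch and parse a page from a URL (get() throws IOException)
Document page = Jsoup.connect("https://www.baidu.com/").get();
System.out.println(page.title());

// Select an element with a CSS selector
Element element = page.selectFirst("title");
System.out.println(element.text());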
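The CSS selector call above returns a single element. A common next step is selecting several elements at once, so here is a minimal sketch that collects every link in a document. The class name `LinkExtractor`, the method `extractLinks`, and the base URI `https://example.com/` are illustrative assumptions, not part of the original example. It uses `Jsoup.parse(html, baseUri)` so that relative links can be resolved with `absUrl`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class LinkExtractor {

    // Hypothetical helper: returns the absolute URL of every <a href> in the HTML.
    // baseUri is what relative links are resolved against.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        // "a[href]" matches anchor elements that carry an href attribute
        Elements anchors = doc.select("a[href]");
        List<String> links = new ArrayList<>();
        for (Element a : anchors) {
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<div><p>this is a text</p>"
                + "<a href=\"/docs\">Docs</a>"
                + "<a href=\"https://jsoup.org/\">jsoup</a></div>";
        // example.com is an assumed base URI for this sketch
        for (String link : extractLinks(html, "https://example.com/")) {
            System.out.println(link);
        }
    }
}
```

Parsing without a base URI would make `absUrl` return an empty string for relative links such as `/docs`, which is why the two-argument `parse` is used here.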
References
Jsoup: an HTML parsing tool for Java web crawlers
https://www.open-open.com/jsoup/parsing-a-document.htm