# xsoup
**Repository Path**: codingfd/xsoup
## Basic Information
- **Project Name**: xsoup
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-03-27
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Xsoup
----
XPath selector based on Jsoup.
## Get started:
```java
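@Test
public void testSelect() {
    // Sample markup: one link and one two-cell table row.
    String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
            "<table><tr><td>a</td><td>b</td></tr></table></html>";

    Document document = Jsoup.parse(html);

    // Select an attribute value; get() returns the first match.
    String result = Xsoup.compile("//a/@href").evaluate(document).get();
    Assert.assertEquals("https://github.com", result);

    // Select text nodes; list() returns every match.
    List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
    Assert.assertEquals("a", list.get(0));
    Assert.assertEquals("b", list.get(1));
}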
```
## Performance:
Xsoup uses Jsoup as its HTML parser.
Compared with [**`HtmlCleaner`**](http://htmlcleaner.sourceforge.net/), another widely used XPath selector for HTML, Xsoup is much faster:
- Input: normal HTML, size 44 KB
- XPath: `"//a"`
- Runs: 2000
- Environment: MacBook Air MD231CH/A
- CPU: 1.8 GHz Intel Core i5

| Operation | Xsoup | HtmlCleaner |
| --- | --- | --- |
| parse | 3,207 ms | 7,999 ms |
| select | 95 ms | 380 ms |

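
The totals above come from repeating one operation 2000 times. As a rough illustration of that kind of measurement (not the benchmark actually used for this table), the sketch below times a repeated task with the JDK only and derives speedup ratios from the reported totals. The class name `RepeatTimer`, both method names, and the stand-in workload are hypothetical.

```java
class RepeatTimer {
    /** Runs the task the given number of times and returns the total elapsed milliseconds. */
    static long timeMillis(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Ratio of two totals: how many times faster the second measurement is. */
    static double speedup(long slowerMillis, long fasterMillis) {
        return (double) slowerMillis / fasterMillis;
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption); replace it with a parse or select call.
        long elapsed = timeMillis(() -> "<a href='#'>x</a>".indexOf("href"), 2000);
        System.out.println("2000 runs took " + elapsed + " ms");

        // Ratios derived from the totals reported in the table.
        System.out.printf("parse speedup:  %.2fx%n", speedup(7999, 3207));
        System.out.printf("select speedup: %.2fx%n", speedup(380, 95));
    }
}
```

By these totals, Xsoup is roughly 2.5 times faster at parsing and 4 times faster at selecting. A loop this simple does not account for JIT warm-up, so treat its output as indicative only.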
## Syntax supported:
### XPath 1.0:

| Name | Expression | Support |
| --- | --- | --- |
| nodename | `nodename` | yes |
| immediate parent | `/` | yes |
| parent | `//` | yes |
| attribute | `[@key=value]` | yes |
| nth child | `tag[n]` | yes |
| attribute | `/@key` | yes |
| wildcard in tagname | `/*` | yes |
| wildcard in attribute | `/[@*]` | yes |
| function | `function()` | partial |
| or | `a \| b` | yes since 0.2.0 |
| parent in path | `.` or `..` | no |
| predicates | `price>35` | no |
| predicates logic | `@class=a or @class=b` | yes since 0.2.0 |

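
To show what several of the standard expressions in this table mean, the sketch below evaluates them with the JDK's built-in `javax.xml.xpath` engine rather than with Xsoup. It only illustrates standard XPath 1.0 semantics and does not show how Xsoup implements them; Xsoup's results may differ in details. The class name `StandardXPathDemo`, the `select` helper, and the sample markup are hypothetical, and the markup is well-formed XML because the JDK engine does not parse loose HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StandardXPathDemo {
    // Hypothetical sample document (well-formed XML).
    static final String XML =
            "<html><body>"
          + "<div class='nav'><a href='https://github.com'>GitHub</a></div>"
          + "<table><tr><td>a</td><td>b</td></tr></table>"
          + "</body></html>";

    /** Evaluates the expression against XML and returns the text of each matched node. */
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//a/@href"));              // attribute value
        System.out.println(select("//div[@class='nav']/a"));  // attribute predicate
        System.out.println(select("//tr/td[2]"));             // nth child
        System.out.println(select("/html/body/table/tr/td")); // step-by-step path
        System.out.println(select("//a | //td"));             // union ("or")
    }
}
```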
### Function supported:
Xsoup provides several functions, some of which are not part of standard XPath 1.0:

| Expression | Description | Standard XPath |
| --- | --- | --- |
| `text(n)` | nth text content of the element (0 for all) | `text()` only |
| `allText()` | text including children | not supported |
| `tidyText()` | text including children, well formatted | not supported |
| `html()` | inner HTML of the element | not supported |
| `outerHtml()` | outer HTML of the element | not supported |
| `regex(@attr,expr,group)` | use a regex to extract content | not supported |

### Extended syntax supported:
The following syntax extensions exist only in Xsoup. They are provided for convenience when extracting from HTML and are modelled on the Jsoup CSS selector:

| Name | Expression | Support |
| --- | --- | --- |
| attribute value not equals | `[@key!=value]` | yes |
| attribute value starts with | `[@key^=value]` | yes |
| attribute value ends with | `[@key$=value]` | yes |
| attribute value contains | `[@key*=value]` | yes |
| attribute value matches regex | `[@key~=value]` | yes |

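
The attribute operators follow the Jsoup CSS selector operators (`!=`, `^=`, `$=`, `*=`, `~=`). As a rough mental model only, the sketch below expresses each operator as a plain Java string check. It is not Xsoup or Jsoup code: the class name `AttributeOperatorSketch` and the `matches` helper are hypothetical, and the comparisons here are case-sensitive, whereas a real selector engine may normalise case or whitespace.

```java
import java.util.regex.Pattern;

class AttributeOperatorSketch {
    /** Returns true if the attribute value satisfies the operator applied to the operand. */
    static boolean matches(String actual, String operator, String operand) {
        switch (operator) {
            case "=":  return actual.equals(operand);      // equals
            case "!=": return !actual.equals(operand);     // not equals
            case "^=": return actual.startsWith(operand);  // starts with
            case "$=": return actual.endsWith(operand);    // ends with
            case "*=": return actual.contains(operand);    // contains
            case "~=": return Pattern.compile(operand).matcher(actual).find(); // regex match
            default:
                throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }

    public static void main(String[] args) {
        String href = "https://github.com";
        System.out.println(matches(href, "^=", "https"));     // true
        System.out.println(matches(href, "$=", ".org"));      // false
        System.out.println(matches(href, "*=", "github"));    // true
        System.out.println(matches(href, "~=", "git\\w+"));   // true
    }
}
```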
## License
MIT License, see file `LICENSE`