Java
The non-obvious difference between the split method in the String class and the default split method in StringUtils
Programmers who have worked extensively with String manipulation in Java, and who have used the split method of Java's String class and the one offered by the StringUtils class from the Apache Commons library interchangeably, have probably come across the subtle difference described below at some point.
The difference lies in how the two split methods treat an empty token before the first occurrence of the delimiter. The 'split' method in the String class includes the empty token, while the one offered by StringUtils does not.
E.g.
",1,2,3,".split(",") - returns the array [<>, <1>, <2>, <3>] (the '<>' is used intentionally to illustrate the empty token). Trailing empty tokens are discarded by this split method as well (refer to the Javadoc).
StringUtils.split(",1,2,3,", ",") - returns the array [<1>, <2>, <3>] (note the empty token is not returned here). The separator has to be passed explicitly, because the single-argument StringUtils.split(String) splits on whitespace.
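The String.split half of the example above can be checked with the JDK alone. The following is a minimal sketch: the class name and the withoutEmptyTokens helper are made up for illustration, and the helper only approximates the StringUtils result for this input by filtering out empty tokens. It is not the Commons Lang implementation.

```java
import java.util.Arrays;

public class LeadingTokenDemo {

    // Hypothetical helper (not part of Commons Lang): drops every empty token,
    // which approximates what StringUtils.split returns for this input.
    static String[] withoutEmptyTokens(String[] tokens) {
        return Arrays.stream(tokens)
                     .filter(token -> !token.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] jdkTokens = ",1,2,3,".split(",");

        // The leading empty token is kept, the trailing one is dropped.
        System.out.println(Arrays.toString(jdkTokens));                     // [, 1, 2, 3]
        System.out.println(jdkTokens.length);                               // 4

        // With the empty tokens filtered out, only the values remain.
        System.out.println(Arrays.toString(withoutEmptyTokens(jdkTokens))); // [1, 2, 3]
    }
}
```

Printing the array length alongside the contents makes the empty leading token easier to spot than Arrays.toString alone.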
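The difference lies in the way 'split' in the String class works. It internally uses Pattern and Matcher, and the one-line comment below, from the split method in Pattern (not Matcher), explains the reason: a segment is added before each match found, so a delimiter matched at the very start of the input contributes the empty segment that precedes it.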
    public String[] split(CharSequence input, int limit) {
        ...
        ...
        Matcher m = matcher(input);
        // Add segments before each match found
        while(m.find()) {
            ...
            ...
        }
        ...
    }
The StringUtils class also has a mechanism to preserve all tokens, but the default split method passes false for this parameter, hence the difference.
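To make the loop in the excerpt concrete, here is a simplified sketch of the same idea. It is not the JDK source: the class and method names are hypothetical, it only mimics the default behaviour (limit 0), and it assumes the delimiter regex never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplifiedSplit {

    // Hypothetical helper: approximates input.split(regex) for delimiters
    // that always match at least one character.
    static String[] simplifiedSplit(String regex, String input) {
        List<String> segments = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0;
        boolean found = false;

        // Add the segment before each match found. A match at position 0
        // therefore adds the empty string as the first segment.
        while (m.find()) {
            found = true;
            segments.add(input.substring(index, m.start()));
            index = m.end();
        }

        // No delimiter at all: the whole input is the single token.
        if (!found) {
            return new String[] { input };
        }

        // Add whatever remains after the last match (possibly empty).
        segments.add(input.substring(index));

        // Default behaviour: strip trailing empty segments only.
        int size = segments.size();
        while (size > 0 && segments.get(size - 1).isEmpty()) {
            size--;
        }
        return segments.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints 4: the leading empty segment survives, the trailing one does not.
        System.out.println(simplifiedSplit(",", ",1,2,3,").length);
    }
}
```

The asymmetry comes from the last step: only trailing empty segments are stripped, and nothing removes the leading one.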
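P.S.: The StringUtils class does, however, have the 'splitPreserveAllTokens' method and its overloaded variants, which retain leading as well as trailing empty tokens (retaining the trailing ones is the main difference between this set of methods and the default 'split' in the String class).
So, for example:
StringUtils.splitPreserveAllTokens(",1,2,3,", ",") - returns the array [<>, <1>, <2>, <3>, <>] (as with split, the separator is passed explicitly because the single-argument variant splits on whitespace).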
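For comparison, the String class can also keep the trailing empty token without Commons Lang, by passing a negative limit to the two-argument split. The sketch below uses a made-up class and helper name. Note that String.split takes a regular expression, while the StringUtils methods take separator characters, so the two are interchangeable only for simple delimiters like the comma used here.

```java
import java.util.Arrays;

public class PreserveTrailingDemo {

    // Hypothetical helper: a negative limit tells String.split to keep
    // trailing empty tokens instead of discarding them.
    static String[] splitKeepingAll(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        String[] tokens = splitKeepingAll(",1,2,3,", ",");

        // Both the leading and the trailing empty token are present.
        System.out.println(Arrays.toString(tokens)); // [, 1, 2, 3, ]
        System.out.println(tokens.length);           // 5
    }
}
```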
PS: Please leave your comment on this post (as long as it is not abusive ;)), as it will help me understand the usefulness.