Guava String Processing Utilities

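String is one of the most frequently used data structures in everyday development, and Apache Commons Lang and other libraries ship StringUtils-style toolkits for it. The JDK also has a built-in set of String methods that make development much easier, but operations such as join and split are still awkward to use. To address this, Guava provides four string utilities: Joiner, Splitter, CharMatcher, and CaseFormat. This article covers the basic usage of each in its own section.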
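
To make the complaint about join and split concrete, here is a small JDK-only sketch (the class and method names are made up for illustration). It shows two behaviors that often surprise people: String.join renders a null element as the text "null", and String.split drops trailing empty strings while keeping interior ones and leaving whitespace untrimmed.

```java
import java.util.Arrays;
import java.util.List;

public class JdkStringPitfalls {
    /** Joins with the JDK; a null element ends up as the literal text "null". */
    public static String jdkJoin(List<String> parts) {
        return String.join("-", parts);
    }

    /** Splits with the JDK; trailing empty strings are dropped, interior ones are kept. */
    public static List<String> jdkSplit(String input) {
        return Arrays.asList(input.split(","));
    }

    public static void main(String[] args) {
        // Returns "aaa-null-ccc": there is no built-in way to skip or replace nulls
        System.out.println(jdkJoin(Arrays.asList("aaa", null, "ccc")));

        // Returns three elements: " a", "" and "b " (the two trailing empties are gone)
        System.out.println(jdkSplit(" a,,b ,,"));
    }
}
```

Joiner and Splitter make each of these choices explicit through methods such as skipNulls, useForNull, omitEmptyStrings, and trimResults.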

  • Joiner

No. Method and description
1 static Joiner on(char separator)
static Joiner on(String separator)
Creates a Joiner that places separator between consecutive elements.
2 <A extends Appendable> A appendTo(A appendable, Iterable<?> parts) throws IOException
<A extends Appendable> A appendTo(A appendable, Iterator<?> parts) throws IOException
<A extends Appendable> A appendTo(A appendable, Object[] parts) throws IOException
<A extends Appendable> A appendTo(A appendable, @Nullable Object first, @Nullable Object second, Object... rest) throws IOException
Joins parts with the separator and appends the result to appendable.
3 StringBuilder appendTo(StringBuilder builder, Iterable<?> parts)
StringBuilder appendTo(StringBuilder builder, Iterator<?> parts)
StringBuilder appendTo(StringBuilder builder, @Nullable Object first, @Nullable Object second, Object... rest)
StringBuilder appendTo(StringBuilder builder, Object[] parts)
Joins parts with the separator, appends the result to builder, and returns that StringBuilder.
4 String join(Iterable<?> parts)
String join(Iterator<?> parts)
String join(@Nullable Object first, @Nullable Object second, Object... rest)
String join(Object[] parts)
Joins parts with the separator and returns the resulting String. Unless skipNulls or useForNull is configured, a null element causes a NullPointerException.
5 Joiner skipNulls()
Returns a Joiner that skips null elements when joining.
6 Joiner useForNull(final String nullText)
Returns a Joiner that substitutes nullText for null elements when joining.
7 Joiner.MapJoiner withKeyValueSeparator(char keyValueSeparator)
Joiner.MapJoiner withKeyValueSeparator(String keyValueSeparator)
Returns a MapJoiner for joining Map entries; keyValueSeparator is placed between each key and its value.

Example code:

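// Assumed test dependency: JUnit 4 (the original listing omits its imports)
import com.google.common.base.Joiner;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import org.junit.Test;

import java.util.List;
import java.util.Map;

import static org.junit.Assert.assertEquals;

public class JoinerTest {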
    @Test
    public void joinTest(){
        List<String> list = Lists.newArrayList("aaa", "bbb", null, "ccc");
        String joinStr = Joiner.on("-").skipNulls().join(list);
        assertEquals("aaa-bbb-ccc", joinStr);
    }

    @Test
    public void useForNullTest(){
        List<String> list = Lists.newArrayList("aaa", "bbb", null, "ccc");
        String joinStr = Joiner.on("-").useForNull("null").join(list);
        assertEquals("aaa-bbb-null-ccc", joinStr);
    }

    @Test
    public void appendToTest(){
        List<String> list = Lists.newArrayList("aaa", "bbb", null, "ccc");
        StringBuilder sb = new StringBuilder("this is: ");
        StringBuilder result = Joiner.on("-").skipNulls().appendTo(sb, list);
        assertEquals("this is: aaa-bbb-ccc", result.toString());
    }

    @Test
    public void withKeyValueSeparatorTest(){
        /* A LinkedHashMap keeps insertion order, which makes the joined result predictable */
        Map<Integer, String> idNameMap = Maps.newLinkedHashMap();
        idNameMap.put(1, "Michael");
        idNameMap.put(2, "Mary");
        idNameMap.put(3, "Jane");

        String result = Joiner.on("\n").withKeyValueSeparator(":").join(idNameMap);
        assertEquals("1:Michael\n2:Mary\n3:Jane", result);
    }
}
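
The tests above do not exercise the varargs form of join or the generic Appendable form of appendTo, so here is a minimal sketch of both. The class and method names are illustrative only. A StringWriter stands in for any Appendable, and the checked IOException declared by appendTo is wrapped because a StringWriter never actually throws it.

```java
import com.google.common.base.Joiner;

import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class JoinerAppendableSketch {
    /** Uses join(first, second, rest...) so no collection is needed. */
    public static String joinVarargs() {
        return Joiner.on(", ").join("a", "b", "c", "d");
    }

    /** Appends the joined parts to an arbitrary Appendable (here a StringWriter). */
    public static String appendToWriter() {
        StringWriter writer = new StringWriter();
        writer.append("parts: ");
        try {
            // The null element is rendered as "?" because of useForNull
            Joiner.on('|').useForNull("?").appendTo(writer, "x", null, "z");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinVarargs());     // a, b, c, d
        System.out.println(appendToWriter());  // parts: x|?|z
    }
}
```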
  • Splitter

No. Method and description
1 static Splitter on(char separator)
static Splitter on(final CharMatcher separatorMatcher)
static Splitter on(Pattern separatorPattern)
static Splitter on(final String separator)
static Splitter onPattern(String separatorPattern)
Creates a Splitter; the argument defines the separator.
2 static Splitter fixedLength(final int length)
Creates a Splitter that cuts the input into pieces of length characters; the last piece may be shorter.
3 Splitter omitEmptyStrings()
Returns a Splitter that drops empty strings from its results.
4 Splitter trimResults()
Returns a Splitter that trims leading and trailing whitespace from every piece.
5 Splitter trimResults(CharMatcher trimmer)
Returns a Splitter that removes the leading and trailing characters matched by trimmer from every piece.
6 Iterable<String> split(final CharSequence sequence)
Splits the String and returns the pieces as an Iterable<String>.
7 List<String> splitToList(CharSequence sequence)
Splits the String and returns the pieces as an immutable List.
8 Splitter.MapSplitter withKeyValueSeparator(char separator)
Splitter.MapSplitter withKeyValueSeparator(Splitter keyValueSplitter)
Splitter.MapSplitter withKeyValueSeparator(String separator)
Returns a MapSplitter that splits a String into Map entries; separator is the delimiter between each key and its value.

Example code:

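// Assumed test dependencies: JUnit 4, Hamcrest, and AssertJ (the original listing omits its imports)
import com.google.common.base.CharMatcher;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import org.junit.Test;

import java.util.List;
import java.util.Map;

import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.contains;
import static org.junit.Assert.assertEquals;

public class SplitterTest {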
    @Test
    public void splitStringToIterableWithDelimiter() {
        /* Create the Splitter from a char and split the String into an Iterable */
        String str = "this, is  , , random , text,";
        List<String> result = Lists.newArrayList(Splitter.on(',').omitEmptyStrings().trimResults().split(str));
        assertThat(result, contains("this", "is", "random", "text"));

        String str1 = "~?~this, is~~ , , random , text,";
        result = Splitter.on(',').omitEmptyStrings().trimResults(CharMatcher.anyOf("~? ")).splitToList(str1);
        System.out.println(result);
        assertThat(result, contains("this", "is", "random", "text"));
    }

    @Test
    public void splitStringToListWithDelimiter() {
        /* Create the Splitter from a char and split the String directly into a List */
        String str = "this, is  , , random , text,";
        List<String> result = Splitter.on(',').omitEmptyStrings().trimResults().splitToList(str);
        assertThat(result, contains("this", "is", "random", "text"));

        /* The returned list is immutable: add and remove are not supported */
        assertThatThrownBy(() -> result.add("haha"))
                .isInstanceOf(UnsupportedOperationException.class)
                .hasNoCause();
    }

    @Test
    public void splitStringToListWithCharMatcher() {
        /* Create the Splitter from a CharMatcher */
        String str = "a,b;c.d,e.f),g,h.i;j.1,2.3;";

        List<String> result = Splitter.on(CharMatcher.anyOf(";,.)")).omitEmptyStrings().trimResults().splitToList(str);
        assertEquals(13, result.size());
    }

    @Test
    public void splitStringToListWithRegularExpression() {
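        /* Create the Splitter from a regular expression */
        String str = "apple.banana,,orange,,.";

        /* Inside a character class '|' is a literal, so "[.,]" is enough to split on '.' or ',' */
        List<String> result = Splitter.onPattern("[.,]").omitEmptyStrings().trimResults().splitToList(str);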
        assertEquals(3, result.size());
    }

    @Test
    public void splitStringToListWithFixedLength() {
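        /* Cut the string into pieces of a fixed length; the last piece may be shorter */
        String str = "Hello world";
        List<String> result = Splitter.fixedLength(3).splitToList(str);

        /* No trimResults() here, so the second piece keeps its trailing space */
        assertThat(result, contains("Hel", "lo ", "wor", "ld"));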
    }

    @Test
    public void splitStringToMap() {
        /* Split a String into a Map */
        String str = "John=first,Adam=second";
        Map<String, String> result = Splitter.on(",")
                .withKeyValueSeparator("=")
                .split(str);

        assertEquals("first", result.get("John"));
        assertEquals("second", result.get("Adam"));
    }
}
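
The withKeyValueSeparator(Splitter keyValueSplitter) overload is not covered by the tests above. The following sketch, with hypothetical class, constant, and input names, parses a loosely formatted settings string. The outer splitter trims the entries and drops empty ones, while the inner splitter trims keys and values around the '=' sign.

```java
import com.google.common.base.Splitter;

import java.util.Map;

public class MapSplitterSketch {
    // Outer splitter: entries separated by ';', trimmed, empty entries dropped.
    // Inner splitter: key and value separated by '=', both trimmed.
    private static final Splitter.MapSplitter SETTINGS =
            Splitter.on(';')
                    .trimResults()
                    .omitEmptyStrings()
                    .withKeyValueSeparator(Splitter.on('=').trimResults());

    /** Parses text such as "host = localhost ; port= 8080;;" into a Map. */
    public static Map<String, String> parse(String settings) {
        return SETTINGS.split(settings);
    }

    public static void main(String[] args) {
        Map<String, String> parsed = parse("host = localhost ; port= 8080;;");
        System.out.println(parsed.get("host")); // localhost
        System.out.println(parsed.get("port")); // 8080
    }
}
```

Note that MapSplitter.split throws an IllegalArgumentException when an entry does not contain exactly one key and one value, or when a key appears twice, so untrusted input may need validation first.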
  • CharMatcher

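CharMatcher is Guava's own matcher abstraction. A CharMatcher instance can be thought of as representing a class of characters. It can be used to test the characters of a CharSequence and to apply operations to the matched characters, such as trim, collapse, remove, and retain. At the time of writing Guava is at version 25, and many CharMatcher methods and static constants are deprecated. The deprecated members and their alternatives are listed first: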

Deprecated static constant | Corresponding static method | Possible alternative
CharMatcher.ANY | CharMatcher.any() | -
CharMatcher.ASCII | CharMatcher.ascii() | -
CharMatcher.BREAKING_WHITESPACE | CharMatcher.breakingWhitespace() | -
CharMatcher.DIGIT | CharMatcher.digit() | CharMatcher.forPredicate(Character::isDigit)
CharMatcher.INVISIBLE | CharMatcher.invisible() | -
CharMatcher.JAVA_DIGIT | CharMatcher.javaDigit() | CharMatcher.forPredicate(Character::isDigit)
CharMatcher.JAVA_ISO_CONTROL | CharMatcher.javaIsoControl() | -
CharMatcher.JAVA_LETTER | CharMatcher.javaLetter() | CharMatcher.forPredicate(Character::isLetter)
CharMatcher.JAVA_LETTER_OR_DIGIT | CharMatcher.javaLetterOrDigit() | CharMatcher.forPredicate(Character::isLetterOrDigit)
CharMatcher.JAVA_LOWER_CASE | CharMatcher.javaLowerCase() | CharMatcher.forPredicate(Character::isLowerCase)
CharMatcher.JAVA_UPPER_CASE | CharMatcher.javaUpperCase() | CharMatcher.forPredicate(Character::isUpperCase)
CharMatcher.NONE | CharMatcher.none() | -
CharMatcher.SINGLE_WIDTH | CharMatcher.singleWidth() | -
CharMatcher.WHITESPACE | CharMatcher.whitespace() | -

The static methods that have an alternative listed are themselves deprecated in recent Guava releases. any(), ascii(), breakingWhitespace(), javaIsoControl(), none(), and whitespace() are not deprecated and directly replace their constants.
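
One caveat when choosing a replacement for the deprecated digit matchers: Character::isDigit accepts every Unicode decimal digit that Java knows about, not only '0' to '9'. The sketch below (class, constant, and method names are illustrative and not part of Guava) contrasts forPredicate(Character::isDigit) with inRange('0', '9'), which may be the better fit when only ASCII digits are wanted.

```java
import com.google.common.base.CharMatcher;

public class DigitMatcherSketch {
    // Matches every char for which Character.isDigit returns true
    public static final CharMatcher UNICODE_DIGIT = CharMatcher.forPredicate(Character::isDigit);

    // Matches only the ten ASCII digits
    public static final CharMatcher ASCII_DIGIT = CharMatcher.inRange('0', '9');

    public static String unicodeDigits(String input) {
        return UNICODE_DIGIT.retainFrom(input);
    }

    public static String asciiDigits(String input) {
        return ASCII_DIGIT.retainFrom(input);
    }

    public static void main(String[] args) {
        // U+0663 is ARABIC-INDIC DIGIT THREE, a digit outside the ASCII range
        String input = "room 42, floor \u0663";
        System.out.println(unicodeDigits(input).length()); // 3: '4', '2' and U+0663
        System.out.println(asciiDigits(input));            // 42
    }
}
```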

Commonly used methods:

No. Method and description
1 static CharMatcher any()
Returns a matcher that matches any character.
2 static CharMatcher anyOf(CharSequence sequence)
Returns a matcher that matches any character contained in sequence.
3 static CharMatcher ascii()
Returns a matcher that matches ASCII characters.
4 static CharMatcher breakingWhitespace()
Returns a matcher that matches whitespace at which a line may be broken (non-breaking whitespace such as "\u00a0" is excluded).
5 static CharMatcher forPredicate(Predicate<? super Character> predicate)
Returns a matcher that matches every character for which the predicate's apply method returns true.
6 static CharMatcher inRange(char startInclusive, char endInclusive)
Returns a matcher that matches every character from startInclusive to endInclusive, both ends included.
7 static CharMatcher is(char match)
Returns a matcher that matches only the single character match.
8 static CharMatcher isNot(char match)
Returns a matcher that matches every character except match.
9 static CharMatcher javaIsoControl()
Returns a matcher that matches ISO control characters as defined by Character.isISOControl(char), for example "\t", "\n", and "\b".
10 static CharMatcher none()
Returns a matcher that matches no character; the opposite of any().
11 static CharMatcher noneOf(CharSequence sequence)
Returns a matcher that matches every character not contained in sequence.
12 static CharMatcher whitespace()
Returns a matcher that matches all whitespace characters, not only the space character.
13 CharMatcher and(CharMatcher other)
Returns a matcher that matches a character only if both this matcher and other match it.
14 CharMatcher negate()
Returns a matcher that matches exactly the characters this matcher does not match.
15 CharMatcher or(CharMatcher other)
Returns a matcher that matches a character if either this matcher or other matches it.
16 CharMatcher precomputed()
Returns an equivalent matcher that may be faster to query than the original. Precomputation itself takes time, so it is only worthwhile when the matcher will be queried many thousands of times.
17 String collapseFrom(CharSequence sequence, char replacement)
Replaces each run of consecutive matched characters with a single replacement.
18 int countIn(CharSequence sequence)
Returns the number of characters in sequence that the matcher matches.
19 int indexIn(CharSequence sequence)
Returns the index of the first matched character in sequence, or -1 if there is none.
int indexIn(CharSequence sequence, int start)
Returns the index of the first matched character in sequence at or after index start, or -1 if there is none.
20 int lastIndexIn(CharSequence sequence)
Returns the index of the last matched character in sequence, or -1 if there is none.
21 boolean matchesAllOf(CharSequence sequence)
Returns true if every character in sequence is matched.
22 boolean matchesAnyOf(CharSequence sequence)
Returns true if at least one character in sequence is matched.
23 boolean matchesNoneOf(CharSequence sequence)
Returns true if no character in sequence is matched.
24 String removeFrom(CharSequence sequence)
Removes every matched character from sequence.
25 String replaceFrom(CharSequence sequence, char replacement)
String replaceFrom(CharSequence sequence, CharSequence replacement)
Replaces every matched character in sequence with replacement.
26 String retainFrom(CharSequence sequence)
Keeps only the matched characters of sequence.
27 String trimAndCollapseFrom(CharSequence sequence, char replacement)
Behaves like collapseFrom, except that runs of matched characters at the start and end of sequence are removed instead of being replaced.
28 String trimFrom(CharSequence sequence)
Removes the matched characters from the start and end of sequence.
String trimLeadingFrom(CharSequence sequence)
Removes the matched characters from the start of sequence.
String trimTrailingFrom(CharSequence sequence)
Removes the matched characters from the end of sequence.

Example code:

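// Assumed dependencies: JUnit 4, plus JSR-305 for @Nullable (the original listing omits its imports)
import com.google.common.base.CharMatcher;
import com.google.common.base.Predicate;
import org.junit.Test;

import javax.annotation.Nullable;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CharMatcherTest {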
    @Test
    public void retainFromTest() {
        String input = "H*el.lo,}12";

        /* The following constant and method are deprecated and should not be used
        CharMatcher matcher = CharMatcher.JAVA_LETTER_OR_DIGIT;
        matcher = CharMatcher.javaLetterOrDigit();*/

        /* Use one of the following initializations instead */
        /*CharMatcher matcher = new CharMatcher() {
            @Override
            public boolean matches(char c) {
                return Character.isLetterOrDigit(c);
            }
        };*/

        /*matcher = CharMatcher.forPredicate(Predicates.compose(Predicates.containsPattern("\\w"), Functions.toStringFunction()));*/

        /*Predicate<Character> isLetterOrDigit = new Predicate<Character>() {
            @Override
            public boolean apply(@Nullable Character character) {
                return Character.isLetterOrDigit(character);
            }
        };
        matcher = CharMatcher.forPredicate(isLetterOrDigit);*/

        CharMatcher matcher = CharMatcher.forPredicate(Character::isLetterOrDigit);
        String result = matcher.retainFrom(input);

        assertEquals("Hello12", result);
    }

    @Test
    public void andTest() {
        /* Returns a matcher that is the logical AND of two matchers */
        String input = "H*el.lo,}12";

        CharMatcher matcher0 = CharMatcher.forPredicate(Character::isLetter);
        CharMatcher matcher1 = CharMatcher.forPredicate(Character::isLowerCase);

        String result = matcher0.and(matcher1).retainFrom(input);
        assertEquals("ello", result);
    }

    @Test
    public void anyTest() {
        /* Matches any character */
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.any();

        String result = matcher.retainFrom(input);
        assertEquals("H*el.lo,}12", result);
    }

    @Test
    public void anyOfTest() {
        /* Matches any character contained in the CharSequence */
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.anyOf("Hel");

        String result = matcher.removeFrom(input);
        assertEquals("*.o,}12", result);
    }

    @Test
    public void asciiTest() {
        /* Matches ASCII characters */
        String input = "あH*el.lo,}12";
        CharMatcher matcher = CharMatcher.ascii();

        String result = matcher.retainFrom(input);
        assertEquals("H*el.lo,}12", result);
    }

    @Test
    public void breakingWhitespaceTest() {
        /* Matches all breaking whitespace (non-breaking whitespace such as "\u00a0" is excluded) */
        String input = " this is test ";
        CharMatcher matcher = CharMatcher.breakingWhitespace();

        String result = matcher.removeFrom(input);
        assertEquals("thisistest", result);
    }

    @Test
    public void collapseTest() {
        /* Replaces each run of consecutive matched characters with a single replacement */
        String input = "       hel    lo      ";

        String result = CharMatcher.is(' ').collapseFrom(input, '-');
        assertEquals("-hel-lo-", result);

        /* Trims first (removes the matched runs at both ends of the CharSequence), then collapses */
        result = CharMatcher.is(' ').trimAndCollapseFrom(input, '-');
        assertEquals("hel-lo", result);
    }

    @Test
    public void countInTest() {
        /* Counts the characters matched by the CharMatcher */
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.forPredicate(Character::isLetterOrDigit);
        int count = matcher.countIn(input);
        assertEquals(7, count);
    }

    @Test
    public void forPredicateTest() {
        /* Create a CharMatcher from a predicate */
        CharMatcher matcher = CharMatcher.forPredicate(Character::isLetterOrDigit);

        Predicate<Character> isLetterOrDigit = new Predicate<Character>() {
            @Override
            public boolean apply(@Nullable Character character) {
                return Character.isLetterOrDigit(character);
            }
        };
        CharMatcher matcher1 = CharMatcher.forPredicate(isLetterOrDigit);
    }

    @Test
    public void indexInTest() {
        /* Index of the first character matched by the CharMatcher */
        String input = "**el.lo,}12";
        CharMatcher matcher = CharMatcher.forPredicate(Character::isLetterOrDigit);
        int index = matcher.indexIn(input);
        assertEquals(2, index);

        index = matcher.indexIn(input, 4);
        assertEquals(5, index);
    }

    @Test
    public void inRangeTest() {
        /* Create a range matcher */
        String input = "a, c, z, 1, 2";

        int result = CharMatcher.inRange('a', 'h').countIn(input);
        assertEquals(2, result);

    }

    @Test
    public void isTest(){
        /* Create a CharMatcher from a char; it matches that single character */
        String input = "a, c, z, 1, 2";
        int result = CharMatcher.is(',').countIn(input);
        assertEquals(4, result);
    }

    @Test
    public void isNotTest(){
        /* Matches every character except the argument; the opposite of is() */
        String input = "a, c, z, 1, 2";
        String result = CharMatcher.isNot(',').removeFrom(input);
        assertEquals(",,,,", result);
    }

    @Test
    public void javaIsoControlTest(){
        /* Matches ISO control characters such as tab, newline and backspace */
        String input = "ab\tcd\nef\bg";
        String result = CharMatcher.javaIsoControl().removeFrom(input);
        assertEquals("abcdefg", result);
    }

    @Test
    public void lastIndexInTest(){
        /* Index of the last character matched by the CharMatcher */
        String input = "**e,l.lo,}12";
        CharMatcher matcher = CharMatcher.is(',');
        int index = matcher.lastIndexIn(input);
        assertEquals(8, index);
    }

    @Test
    public void matchesAllOfTest(){
        /* Checks whether every character of the CharSequence is matched */
        String input = "**e,l.lo,}12";
        CharMatcher matcher = CharMatcher.is(',');
        assertFalse(matcher.matchesAllOf(input));
    }

    @Test
    public void matchesAnyOfTest(){
        /* Checks whether at least one character of the CharSequence is matched */
        String input = "**e,l.lo,}12";
        CharMatcher matcher = CharMatcher.is(',');
        assertTrue(matcher.matchesAnyOf(input));
    }

    @Test
    public void matchesNoneOfTest(){
        /* Checks whether no character of the CharSequence is matched */
        String input = "**e,l.lo,}12";
        CharMatcher matcher = CharMatcher.is('?');
        assertTrue(matcher.matchesNoneOf(input));
    }

    @Test
    public void negateTest(){
        /* Returns the CharMatcher that is the opposite of the current one */
        String input = "あH*el.lo,}12";
        /* This matcher matches non-ASCII characters */
        CharMatcher matcher = CharMatcher.ascii().negate();

        String result = matcher.retainFrom(input);
        assertEquals("あ", result);
    }

    @Test
    public void noneTest(){
        /* Matches no character; the opposite of any() */
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.none();

        String result = matcher.retainFrom(input);
        assertEquals("", result);
    }

    @Test
    public void noneOfTest(){
        /* Matches any character not contained in the CharSequence; the opposite of anyOf() */
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.noneOf("Hel");

        String result = matcher.removeFrom(input);
        assertEquals("Hell", result);
    }

    @Test
    public void orTest(){
        /* Returns a matcher that is the logical OR of two matchers */
        String input = "H*el.lo,}12";

        CharMatcher matcher0 = CharMatcher.forPredicate(Character::isLetter);
        CharMatcher matcher1 = CharMatcher.forPredicate(Character::isDigit);

        String result = matcher0.or(matcher1).retainFrom(input);
        assertEquals("Hello12", result);
    }

    @Test
    public void trimFromTest(){
        String input = "---hello,,,";

        /* Remove the matched characters at the start */
        String result = CharMatcher.is('-').trimLeadingFrom(input);
        assertEquals("hello,,,", result);

        /* Remove the matched characters at the end */
        result = CharMatcher.is(',').trimTrailingFrom(input);
        assertEquals("---hello", result);

        /* Remove the matched characters at both ends */
        result = CharMatcher.anyOf("-,").trimFrom(input);
        assertEquals("hello", result);
    }

    @Test
    public void whitespaceTest(){
        /* Matches all whitespace characters */
        String input = "       hel    lo      ";

        String result = CharMatcher.whitespace().collapseFrom(input, '-');
        assertEquals("-hel-lo-", result);

    }
}
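
The tests above exercise one method at a time. As a sketch of how the pieces compose, the following example (class, constant, and method names are hypothetical) builds an ASCII letter-or-digit matcher from inRange and or, precomputes it because it is kept in a constant and reused, and then uses negate together with trimAndCollapseFrom to turn free text into a hyphenated slug.

```java
import com.google.common.base.CharMatcher;

public class CharMatcherComposeSketch {
    // ASCII letters and digits, combined with or() and precomputed for repeated use
    private static final CharMatcher WORD_CHAR =
            CharMatcher.inRange('a', 'z')
                    .or(CharMatcher.inRange('A', 'Z'))
                    .or(CharMatcher.inRange('0', '9'))
                    .precomputed();

    /** Drops non-word characters at both ends and collapses each interior run into one '-'. */
    public static String slug(String input) {
        return WORD_CHAR.negate().trimAndCollapseFrom(input, '-');
    }

    /** Keeps only the ASCII digits of the input. */
    public static String digitsOnly(String input) {
        return CharMatcher.inRange('0', '9').retainFrom(input);
    }

    public static void main(String[] args) {
        System.out.println(slug("  Hello,   World!! "));          // Hello-World
        System.out.println(digitsOnly("tel: +1 (555) 010-9999")); // 15550109999
    }
}
```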
  • CaseFormat

CaseFormat converts strings between different ASCII case conventions. It supports the following formats:

Format Example
LOWER_CAMEL lowerCamel
LOWER_HYPHEN lower-hyphen
LOWER_UNDERSCORE lower_underscore
UPPER_CAMEL UpperCamel
UPPER_UNDERSCORE UPPER_UNDERSCORE

Commonly used methods:

No. Method and description
1 Converter<String,String> converterTo(CaseFormat targetFormat)
Returns a Converter that converts Strings from this (source) format to targetFormat.
2 String to(CaseFormat format, String str)
Converts str from this (source) format to the target format.

Example code:

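// Assumed test dependency: JUnit 4 (the original listing omits its imports)
import com.google.common.base.CaseFormat;
import com.google.common.base.Converter;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class CaseFormatTest {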
    @Test
    public void converterToTest(){
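        /* Returns a Converter that converts Strings from the source format to targetFormat */
        Converter<String, String> camelConverter = CaseFormat.LOWER_CAMEL.converterTo(CaseFormat.UPPER_UNDERSCORE);
        /* The input has to be in the source format, which is lowerCamel here */
        String input = "inputCamel";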
        String result = camelConverter.convert(input);
        assertEquals("INPUT_CAMEL", result);
    }

    @Test
    public void toTest(){
        /* Converts str from the source CaseFormat to the target format */
        String result = CaseFormat.LOWER_HYPHEN.to(CaseFormat.LOWER_CAMEL,"foo-bar");
        assertEquals("fooBar", result);
    }
}
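
A Converter obtained from converterTo can also be run backwards with reverse(), which is handy when the same pair of formats is needed in both directions. The sketch below uses hypothetical names and assumes a typical use case, mapping database column names to Java field names and back.

```java
import com.google.common.base.CaseFormat;
import com.google.common.base.Converter;

public class CaseFormatSketch {
    // lower_underscore (column name) to lowerCamel (field name)
    private static final Converter<String, String> COLUMN_TO_FIELD =
            CaseFormat.LOWER_UNDERSCORE.converterTo(CaseFormat.LOWER_CAMEL);

    public static String toFieldName(String column) {
        return COLUMN_TO_FIELD.convert(column);
    }

    /** Uses the reversed converter: lowerCamel back to lower_underscore. */
    public static String toColumnName(String field) {
        return COLUMN_TO_FIELD.reverse().convert(field);
    }

    /** One-off conversion with to(): UPPER_UNDERSCORE to UpperCamel. */
    public static String toClassName(String constant) {
        return CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.UPPER_CAMEL, constant);
    }

    public static void main(String[] args) {
        System.out.println(toFieldName("user_id"));           // userId
        System.out.println(toColumnName("userId"));           // user_id
        System.out.println(toClassName("HTTP_STATUS_CODE"));  // HttpStatusCode
    }
}
```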
