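Java Streams too awkward? Take a look at JDFrame

I can never remember parts of the Stream API, so I end up copying and pasting the same long, clunky chains again and again. I wanted a more semantic API, which reminded me of the DataFrame-style APIs I used when writing big-data code with Spark and pandas. It turns out that DataFrame models also exist on the JVM for Java, such as tablesaw and joinery.

However, those libraries require field names to be hard-coded as strings, which is hard to put up with if you care about clean code. Besides, I only need some simple statistics, and in those cases I wanted to select the fields to process with lambdas or method references instead. That is how this tool came about.

JDFrame is a DataFrame-like tool for the JVM that gives Java 8 stream processing a simpler, more semantic API.

1.1 Add the dependency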

<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.4</version>
</dependency>

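1.2 Example

For each school, sum the scores of the students whose age is not null and lies between 9 and 16, then take the two schools with the highest totals. (In the sample data, 一中, 二中 and 三中 are school names, and 一年级, 二年级 and 三年级 are grade levels.)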

    static List<Student> studentList = new ArrayList<>();

    static {
        studentList.add(new Student(1,"a","一中","一年级",11, new BigDecimal(1)));
        studentList.add(new Student(2,"a","一中","一年级",11, new BigDecimal(1)));
        studentList.add(new Student(3,"b","一中","三年级",12, new BigDecimal(2)));
        studentList.add(new Student(4,"c","二中","一年级",13, new BigDecimal(3)));
        studentList.add(new Student(5,"d","二中","一年级",14, new BigDecimal(4)));
        studentList.add(new Student(6,"e","三中","二年级",14, new BigDecimal(5)));
        studentList.add(new Student(7,"e","三中","二年级",15, new BigDecimal(5)));
    }

SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge,9,16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(2);

sdf2.show();

Output:

c1      c2
三中    10
二中    7

The Student class used in the examples (the annotations are Lombok's):

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Student {

    private int id;
    private String name;
    private String school;
    private String level;
    private Integer age;
    private BigDecimal score;

    private Integer rank;

    public Student(String level, BigDecimal score) {
        this.level = level;
        this.score = score;
    }

    public Student(int id, String name, String school, String level, Integer age, BigDecimal score) {
        this.id = id;
        this.name = name;
        this.school = school;
        this.level = level;
        this.age = age;
        this.score = score;
    }
}
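
For comparison, here is a rough sketch of the same query written with plain java.util.stream, to show the kind of boilerplate the chain above replaces. It does not use JDFrame at all. The class name StreamTopSchools and the trimmed-down Student are made up for this illustration, and treating the age range as the closed interval [9, 16] is an assumption.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamTopSchools {
    // Minimal stand-in for the article's Student: only the fields this query needs.
    static class Student {
        final String school;
        final Integer age;
        final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(
            new Student("一中", 11, new BigDecimal(1)),
            new Student("一中", 11, new BigDecimal(1)),
            new Student("一中", 12, new BigDecimal(2)),
            new Student("二中", 13, new BigDecimal(3)),
            new Student("二中", 14, new BigDecimal(4)),
            new Student("三中", 14, new BigDecimal(5)),
            new Student("三中", 15, new BigDecimal(5)));
    }

    // Filter by age, sum the scores per school, sort by total descending, keep the first n.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
            .filter(s -> s.age != null && s.age >= 9 && s.age <= 16) // assumed closed range
            .collect(Collectors.toMap(s -> s.school, s -> s.score, BigDecimal::add));
        return totals.entrySet().stream()
            .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        topSchools(sample(), 2)
            .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```

The two-stage shape (collect into a map, then stream the entries again to sort and limit) is what the single JDFrame chain hides.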

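2.1 Viewing the frame

        void show(int n);                        // print the first n rows to the console
        List<String> columns();                  // column (field) names of the frame
        List<R> col(Function<T, R> function);    // values of a single column
        T head();                                // first element
        List<T> head(int n);                     // first n elements
        T tail();                                // last element
        List<T> tail(int n);                     // last n elements
        List<T> page(int page,int pageSize);     // elements of one page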

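2.2 Filtering

SDFrame.read(studentList)
        .whereBetween(Student::getAge,3,6)                 // age between 3 and 6
        .whereBetweenR(Student::getAge,3,6)                // half-open variant of whereBetween (one bound excluded)
        .whereBetweenL(Student::getAge,3,6)                // half-open variant of whereBetween (the other bound excluded)
        .whereNotNull(Student::getName)                    // name is not null
        .whereGt(Student::getAge,3)                        // age > 3
        .whereGe(Student::getAge,3)                        // age >= 3
        .whereLt(Student::getAge,3)                        // age < 3
        .whereIn(Student::getAge, Arrays.asList(3,7,8))    // age is 3, 7 or 8
        .whereNotIn(Student::getAge, Arrays.asList(3,7,8)) // age is none of 3, 7, 8
        .whereEq(Student::getAge,3)                        // age == 3
        .whereNotEq(Student::getAge,3)                     // age != 3
        .whereLike(Student::getName,"jay")                 // fuzzy match, like '%jay%' in SQL
        .whereLikeLeft(Student::getName,"jay")             // one-sided fuzzy match
        .whereLikeRight(Student::getName,"jay");           // one-sided fuzzy match, other side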
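
To make the idea of selecting a field with a method reference concrete, the following is a small sketch of how such filters can be built on top of java.util.function.Predicate. It is not JDFrame's implementation. FieldPredicates, between and contains are names invented here, and the closed-interval and null-handling behaviour are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FieldPredicates {
    // Matches when the selected field lies in the closed range [lo, hi]; null never matches.
    static <T, V extends Comparable<V>> Predicate<T> between(Function<T, V> field, V lo, V hi) {
        return t -> {
            V v = field.apply(t);
            return v != null && v.compareTo(lo) >= 0 && v.compareTo(hi) <= 0;
        };
    }

    // Substring match on the selected field, comparable to SQL LIKE '%part%'.
    static <T> Predicate<T> contains(Function<T, String> field, String part) {
        return t -> {
            String v = field.apply(t);
            return v != null && v.contains(part);
        };
    }

    // Keeps names that are 3 to 4 characters long and contain "jay".
    static List<String> shortJayNames(List<String> names) {
        return names.stream()
            .filter(between(String::length, 3, 4))
            .filter(contains(Function.<String>identity(), "jay"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(shortJayNames(Arrays.asList("jay", "jayden", "bob", "ajay")));
    }
}
```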

2.3 Aggregation

JDFrame<Student> frame = JDFrame.read(studentList);
Student s1 = frame.max(Student::getAge);                 // the oldest student
Integer s2  = frame.maxValue(Student::getAge);           // the largest age
Student s3 = frame.min(Student::getAge);                 // the youngest student
Integer s4  = frame.minValue(Student::getAge);           // the smallest age
BigDecimal s5 = frame.avg(Student::getAge);              // average age
BigDecimal s6 = frame.sum(Student::getAge);              // sum of all ages
MaxMin<Student> s7 = frame.maxMin(Student::getAge);      // oldest and youngest student in one call
MaxMin<Integer> s8 = frame.maxMinValue(Student::getAge); // largest and smallest age in one call
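
As a point of reference, this is roughly what two of these aggregations look like with plain streams. It is only a sketch: StreamSummary and its methods are hypothetical names, and skipping null ages and rounding the average HALF_UP to a caller-supplied scale are choices made for this example, not documented JDFrame behaviour.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

class StreamSummary {
    static class Student {
        final String name;
        final Integer age;

        Student(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(
            new Student("a", 11), new Student("a", 11), new Student("b", 12),
            new Student("c", 13), new Student("d", 14), new Student("e", 14),
            new Student("e", 15), new Student("f", null));
    }

    // The student with the largest non-null age, or null if there is none.
    static Student oldest(List<Student> list) {
        return list.stream()
            .filter(s -> s.age != null)
            .max(Comparator.comparing((Student s) -> s.age))
            .orElse(null);
    }

    // Mean of the non-null ages rounded to the given scale, or null when there is no data.
    static BigDecimal averageAge(List<Student> list, int scale) {
        long count = list.stream().map(s -> s.age).filter(Objects::nonNull).count();
        if (count == 0) {
            return null;
        }
        long sum = list.stream().map(s -> s.age).filter(Objects::nonNull)
            .mapToLong(Integer::longValue).sum();
        return BigDecimal.valueOf(sum)
            .divide(BigDecimal.valueOf(count), scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(oldest(sample()).name + " " + averageAge(sample(), 2));
    }
}
```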

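2.4 Deduplication

The native stream distinct() only deduplicates whole objects (via equals/hashCode); it cannot deduplicate by a specific field.

List<Student> std = null;
std = SDFrame.read(studentList).distinct().toLists();                                                // deduplicate whole objects
std = SDFrame.read(studentList).distinct(Student::getSchool).toLists();                              // deduplicate by school
std = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();               // deduplicate by school + level
std = SDFrame.read(studentList).distinct(Student::getSchool).distinct(Student::getLevel).toLists();  // by school first, then by level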
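
A minimal sketch of deduplicating by a key with the standard library, for readers who want to see what the native-stream workaround involves. It is not JDFrame's code. DistinctByField and the "school/level" string rows are invented for the example, and keeping the first element seen for each key is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DistinctByField {
    // Keeps the first element seen for each key, preserving encounter order.
    static <T, K> List<T> distinctBy(List<T> items, Function<? super T, ? extends K> key) {
        Map<K, T> firstByKey = new LinkedHashMap<>();
        for (T item : items) {
            firstByKey.putIfAbsent(key.apply(item), item);
        }
        return new ArrayList<>(firstByKey.values());
    }

    // Rows are "school/level" strings to keep the example short.
    static List<String> sample() {
        return Arrays.asList("一中/一年级", "一中/三年级", "二中/一年级", "三中/二年级");
    }

    static String school(String row) {
        return row.split("/")[0];
    }

    static String level(String row) {
        return row.split("/")[1];
    }

    public static void main(String[] args) {
        List<String> bySchool = distinctBy(sample(), DistinctByField::school);
        System.out.println(bySchool);
        // Chaining a second pass by level removes more rows than one composite key would.
        System.out.println(distinctBy(bySchool, DistinctByField::level));
        System.out.println(distinctBy(sample(), row -> row));
    }
}
```

Note that two chained passes (school, then level) and a single composite key (school plus level) are different operations: the composite key keeps all four sample rows, while the chained version keeps only two.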

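2.5 Grouping and aggregation

These methods follow the semantics of SQL's GROUP BY and simplify grouping and aggregation, which can take a long chain of collectors with native streams.

JDFrame<Student> frame = JDFrame.read(studentList);

// select school, sum(age) ... group by school
List<FI2<String, BigDecimal>> a = frame.groupBySum(Student::getSchool, Student::getAge).toLists();

// select school, max(age) ... group by school
List<FI2<String, Integer>> a2 = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();

// the oldest student of each school
List<FI2<String, Student>> a3 = frame.groupByMax(Student::getSchool, Student::getAge).toLists();

// select school, min(age) ... group by school
List<FI2<String, Integer>> a4 = frame.groupByMinValue(Student::getSchool, Student::getAge).toLists();

// select school, count(*) ... group by school
List<FI2<String, Long>> a5 = frame.groupByCount(Student::getSchool).toLists();

// select school, avg(age) ... group by school
List<FI2<String, BigDecimal>> a6 = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();

// select school, sum(age), count(age) ... group by school
List<FI3<String, BigDecimal, Long>> a7 = frame.groupBySumCount(Student::getSchool, Student::getAge).toLists();

// two-level grouping: select school, level, sum(age) ... group by school, level
List<FI3<String, String, BigDecimal>> a8 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getAge).toLists();

// three-level grouping: select school, level, name, sum(age) ... group by school, level, name
List<FI4<String, String, String, BigDecimal>> a9 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getName, Student::getAge).toLists();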
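
For contrast, here is a sketch of three of these group-by queries using Collectors.groupingBy from the standard library. StreamGroupBy is a made-up class, the results are plain maps rather than FI2/FI3 rows, and using a List of the key parts as the composite key for two-level grouping is just one possible choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamGroupBy {
    static class Student {
        final String school;
        final String level;
        final int age;

        Student(String school, String level, int age) {
            this.school = school;
            this.level = level;
            this.age = age;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(
            new Student("一中", "一年级", 11), new Student("一中", "一年级", 11),
            new Student("一中", "三年级", 12), new Student("二中", "一年级", 13),
            new Student("二中", "一年级", 14), new Student("三中", "二年级", 14),
            new Student("三中", "二年级", 15));
    }

    // select school, sum(age) ... group by school
    static Map<String, Integer> sumAgeBySchool(List<Student> list) {
        return list.stream().collect(Collectors.groupingBy(
            (Student s) -> s.school, LinkedHashMap::new,
            Collectors.summingInt((Student s) -> s.age)));
    }

    // select school, count(*) ... group by school
    static Map<String, Long> countBySchool(List<Student> list) {
        return list.stream().collect(Collectors.groupingBy(
            (Student s) -> s.school, LinkedHashMap::new, Collectors.counting()));
    }

    // select school, level, sum(age) ... group by school, level
    static Map<List<String>, Integer> sumAgeBySchoolAndLevel(List<Student> list) {
        return list.stream().collect(Collectors.groupingBy(
            (Student s) -> Arrays.asList(s.school, s.level), LinkedHashMap::new,
            Collectors.summingInt((Student s) -> s.age)));
    }

    public static void main(String[] args) {
        System.out.println(sumAgeBySchool(sample()));
        System.out.println(countBySchool(sample()));
        System.out.println(sumAgeBySchoolAndLevel(sample()));
    }
}
```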

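2.6 Sorting

Sorting is simpler than with native streams: you name the field and pick the direction through the method name, without building a Comparator and working out whether it sorts ascending or descending.

        // order by age desc
        SDFrame.read(studentList).sortDesc(Student::getAge);

        // two sort keys: age descending and level ascending
        SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);

        // order by age asc
        SDFrame.read(studentList).sortAsc(Student::getAge);

        // a custom Comparator can still be passed in
        SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));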
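
The Comparator bookkeeping mentioned above looks roughly like this with plain streams. This sketch assumes the intent is ORDER BY age DESC, level ASC; StreamSort and its trimmed-down Student are hypothetical, and levels are compared as plain strings.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class StreamSort {
    static class Student {
        final int id;
        final String level;
        final Integer age;

        Student(int id, String level, Integer age) {
            this.id = id;
            this.level = level;
            this.age = age;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(
            new Student(1, "一年级", 11), new Student(2, "一年级", 11),
            new Student(3, "三年级", 12), new Student(4, "一年级", 13),
            new Student(5, "一年级", 14), new Student(6, "二年级", 14),
            new Student(7, "二年级", 15));
    }

    // Age descending first; ties are broken by level ascending (string order).
    static List<Integer> idsByAgeDescThenLevelAsc(List<Student> list) {
        return list.stream()
            .sorted(Comparator.comparing((Student s) -> s.age)
                .reversed()
                .thenComparing(s -> s.level))
            .map(s -> s.id)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(idsByAgeDescThenLevelAsc(sample()));
    }
}
```

The easy mistake here is the position of reversed(): placed after thenComparing, it would flip both keys instead of only the age.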

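2.7 Joining frames

API list

        append(T t);                                                      // append an element to the end
        union(IFrame<T> other);                                           // merge another frame into this one
        join(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);          // inner join
        leftJoin(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);      // left join
        rightJoin(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);     // right join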

Inner join example:

        System.out.println("======== Frame 1 =======");

        SDFrame<Student> sdf = SDFrame.read(studentList);

        sdf.show(20);


        SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
                .whereNotNull(Student::getAge)
                .whereBetween(Student::getAge,9,16)
                .groupBySum(Student::getSchool, Student::getScore)
                .sortDesc(FI2::getC2)
                .cutFirst(10);

        System.out.println("======== Frame 2 =======");
        sdf2.show();

        // UserInfo is a plain result bean with key1..key4 properties (its definition is not shown here)
        SDFrame<UserInfo> frame = sdf.join(sdf2, (a, b) -> a.getSchool().equals(b.getC1()), (a, b) -> {
            UserInfo userInfo = new UserInfo();
            userInfo.setKey1(a.getSchool());
            userInfo.setKey2(b.getC2().intValue());
            userInfo.setKey3(String.valueOf(a.getId()));
            return userInfo;
        });

        System.out.println("======== Join result =======");
        frame.show(5);

Printed output:

======== Frame 1 =======
id    name    school    level    age    score    rank    
1     a       一中        一年级      11     1                
2     a       一中        一年级      11     1                
3     b       一中        三年级      12     2                
4     c       二中        一年级      13     3                
5     d       二中        一年级      14     4                
6     e       三中        二年级      14     5                
7     e       三中        二年级      15     5                

======== Frame 2 =======
c1    c2    
三中    10    
二中    7     
一中    4     

======== Join result =======
key1    key2    key3    key4    
一中      4       1               
一中      4       2               
一中      4       3               
二中      7       4               
二中      7       5   

This is similar to the following SQL:

select a.*, b.* from sdf a inner join sdf2 b on a.school = b.c1
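
To show what an inner join with an on-predicate and a combiner amounts to, here is a small nested-loop sketch using only the standard library. It is not how JDFrame implements join. NestedLoopJoin and innerJoin are invented names, and BiPredicate and BiFunction stand in for the JoinOn and Join parameters listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class NestedLoopJoin {
    // Emits combine(a, b) for every pair (a, b) that satisfies the on-condition.
    static <A, B, R> List<R> innerJoin(List<A> left, List<B> right,
                                       BiPredicate<A, B> on, BiFunction<A, B, R> combine) {
        List<R> out = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                if (on.test(a, b)) {
                    out.add(combine.apply(a, b));
                }
            }
        }
        return out;
    }

    // Left rows are {id, school}; right rows are (school, total score) entries.
    static List<String> demo() {
        List<String[]> students = Arrays.asList(
            new String[] {"1", "一中"}, new String[] {"4", "二中"}, new String[] {"9", "四中"});
        Map<String, Integer> totals = new LinkedHashMap<>();
        totals.put("三中", 10);
        totals.put("二中", 7);
        totals.put("一中", 4);
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(totals.entrySet());
        return innerJoin(students, rows,
            (s, t) -> s[1].equals(t.getKey()),
            (s, t) -> t.getKey() + " " + t.getValue() + " " + s[0]);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Rows without a partner on the other side (the 四中 student and the 三中 total in this demo) are dropped, which is what distinguishes an inner join from a left or right join.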

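2.8 Cutting (slicing)

    cutFirst(int n);                            // keep the first n elements
    cutLast(int n);                             // keep the last n elements
    cut(Integer startIndex,Integer endIndex);   // keep the elements in an index range
    cutPage(int page,int pageSize);             // keep the elements of one page
    cutFirstRank(Sorter<T> sorter, int n);      // keep the elements ranked in the top n by the given sorter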
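
Most of these cuts map onto skip and limit in plain streams. The sketch below illustrates that mapping only. CutSketch is a made-up class, and treating page as 1-based is an assumption of this example, not a statement about cutPage.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CutSketch {
    // First n elements (n must not be negative).
    static <T> List<T> first(List<T> items, int n) {
        return items.stream().limit(n).collect(Collectors.toList());
    }

    // Last n elements, or the whole list if it is shorter than n.
    static <T> List<T> last(List<T> items, int n) {
        return items.stream()
            .skip(Math.max(0, items.size() - n))
            .collect(Collectors.toList());
    }

    // Elements of the given page; pages are assumed to start at 1.
    static <T> List<T> page(List<T> items, int page, int pageSize) {
        return items.stream()
            .skip((long) (page - 1) * pageSize)
            .limit(pageSize)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(first(ids, 2));
        System.out.println(last(ids, 2));
        System.out.println(page(ids, 2, 3));
    }
}
```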

2.9 Frame settings

defaultScale(int scale, RoundingMode roundingMode);   // default scale and rounding mode for computed decimals

2.10 Miscellaneous

Percentage conversion

SDFrame<Student> map2 = SDFrame.read(studentList).mapPercent(Student::getScore, Student::setScore,2);

Partitioning

Split the data into sub-lists of 5 elements each, which is useful for breaking a large task into smaller ones.

List<List<Student>> t = SDFrame.read(studentList).partition(5).toLists();
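
For reference, fixed-size partitioning can be sketched in a few lines of plain Java. PartitionSketch is a hypothetical name, and letting the last chunk be shorter than the others is the behaviour chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionSketch {
    // Splits items into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            // Copy the view so each chunk is independent of the source list.
            parts.add(new ArrayList<>(items.subList(from, to)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 5));
    }
}
```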

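Generating a row-number column

Sort by age in descending order, then write each row's position in that order into the rank field (numbering starts at 1).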

SDFrame.read(studentList)
    .sortDesc(Student::getAge)
    .addRowNumberCol(Student::setRank)
    .show(30);

Output:

id    name    school    level    age    score    rank    
7     e       三中        二年级      15     5        1       
5     d       二中        一年级      14     4        2       
6     e       三中        二年级      14     5        3       
4     c       二中        一年级      13     3        4       
3     b       一中        三年级      12     2        5       
1     a       一中        一年级      11     1        6       
2     a       一中        一年级      11     1        7   
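
The same sort-then-number step can be sketched without the library as follows. RowNumberSketch and its mutable rank field are invented for the example. Ties keep their original order because sorted() is stable on an ordered stream.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class RowNumberSketch {
    static class Student {
        final int id;
        final int age;
        Integer rank; // filled in by rankByAgeDesc

        Student(int id, int age) {
            this.id = id;
            this.age = age;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(
            new Student(1, 11), new Student(2, 11), new Student(3, 12),
            new Student(4, 13), new Student(5, 14), new Student(6, 14),
            new Student(7, 15));
    }

    // Sorts by age descending, then numbers the rows 1, 2, 3, ... in that order.
    static List<Student> rankByAgeDesc(List<Student> list) {
        List<Student> sorted = list.stream()
            .sorted(Comparator.comparingInt((Student s) -> s.age).reversed())
            .collect(Collectors.toList());
        for (int i = 0; i < sorted.size(); i++) {
            sorted.get(i).rank = i + 1;
        }
        return sorted;
    }

    public static void main(String[] args) {
        for (Student s : rankByAgeDesc(sample())) {
            System.out.println(s.id + "\t" + s.age + "\t" + s.rank);
        }
    }
}
```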

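Replenishing missing entries

1. Replenish missing school entries

List<String> allDim = Arrays.asList("一中","二中","三中","四中");

SDFrame.read(studentList).replenish(Student::getSchool, allDim, school -> {
    // The Student class shown earlier has no single-argument constructor, so use the setter
    Student s = new Student();
    s.setSchool(school);
    return s;
}).show();

Output:

id    name    school    level    age    score    rank    
1     a       一中        一年级      11     1                
2     a       一中        一年级      11     1                
3     b       一中        三年级      12     2                
4     c       二中        一年级      13     3                
5     d       二中        一年级      14     4                
6     e       三中        二年级      14     5                
7     e       三中        二年级      15     5                
0             四中  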

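2. Replenish missing entries within each group

Group the rows by school and collect every level that appears in the data (allDim). Each group is then compared against allDim, and the levels missing from that group are filled in, with every missing entry generated by the ReplenishFunction.

SDFrame.read(studentList).replenish(Student::getSchool, Student::getLevel, (school, level) -> {
    // The Student class shown earlier has no (school, level) constructor, so use the setters
    Student s = new Student();
    s.setSchool(school);
    s.setLevel(level);
    return s;
}).show(30);

Output: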

id    name    school    level    age    score    rank    
1     a       一中        一年级      11     1                
2     a       一中        一年级      11     1                
3     b       一中        三年级      12     2                
0             一中        二年级                              
4     c       二中        一年级      13     3                
5     d       二中        一年级      14     4                
0             二中        三年级                              
0             二中        二年级                              
6     e       三中        二年级      14     5                
7     e       三中        二年级      15     5                
0             三中        一年级                              
0             三中        三年级 
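
The group-wise replenish step can be sketched with standard collections as below. This is an illustration of the idea only, not JDFrame's algorithm. ReplenishSketch is a made-up class, rows are reduced to {school, level} arrays, and appending the missing levels after each group's existing rows in first-seen order is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ReplenishSketch {
    // Rows are {school, level} pairs taken from the article's sample data.
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"一中", "一年级"}, new String[] {"一中", "一年级"},
            new String[] {"一中", "三年级"}, new String[] {"二中", "一年级"},
            new String[] {"二中", "一年级"}, new String[] {"三中", "二年级"},
            new String[] {"三中", "二年级"});
    }

    static List<String[]> replenish(List<String[]> rows) {
        Set<String> allLevels = new LinkedHashSet<>();               // every level seen anywhere
        Map<String, List<String[]>> bySchool = new LinkedHashMap<>(); // rows grouped by school
        for (String[] row : rows) {
            allLevels.add(row[1]);
            bySchool.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> group : bySchool.entrySet()) {
            out.addAll(group.getValue());
            Set<String> present = group.getValue().stream()
                .map(row -> row[1])
                .collect(Collectors.toSet());
            for (String level : allLevels) {
                if (!present.contains(level)) {
                    out.add(new String[] {group.getKey(), level}); // generated filler row
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] row : replenish(sample())) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```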