Tips for Fast Reads and Writes of Complex Objects with Spring Data JPA

哥们看看码农 2024-08-05 19:13:04
Working with spring-data-jpa and JPA is a great way to boost your productivity as a developer. No matter how complex your objects are, as long as they are modelled correctly they will be persisted, and you can query and manipulate the data conveniently. The downside is that reading and writing these objects can become very slow. In an earlier article we looked at fast writes with JPA, where we covered batch inserts, but we did not pay much attention to what batching requires from the JPA implementation itself.

## The problem with complex objects

In this article we will work with a complex CustomerOrder object that has a child collection, which in turn has a child collection of its own.

```java
@Entity
@Table(name = "customer_order")
public class CustomerOrderEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String customerReference;
    private String customerName;
    private String customerEmail;
    private LocalDateTime orderTimestamp;

    @OneToMany(mappedBy = "customerOrder", cascade = CascadeType.ALL,
               fetch = FetchType.EAGER, orphanRemoval = true)
    private Set<CustomerOrderLineEntity> lines = new HashSet<>();
}

@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @ManyToOne(cascade = CascadeType.DETACH)
    @JoinColumn(name = "customer_order_id")
    private CustomerOrderEntity customerOrder;

    private Integer lineNumber;
    private String articleId;
    private BigDecimal quantity;
    private BigDecimal unitPrice;
    private BigDecimal deliveredQuantity = BigDecimal.ZERO;

    @OneToMany(mappedBy = "customerOrderLine", cascade = CascadeType.ALL,
               fetch = FetchType.LAZY, orphanRemoval = true)
    private Set<COLineAttributeEntity> attributes = new HashSet<>();

    @Entity
    @Table(name = "customer_order_line_attribute")
    @IdClass(COLineAttributeEntity.COLineAttributeEntityId.class)
    public static class COLineAttributeEntity implements Persistable<COLineAttributeEntity.COLineAttributeEntityId> {

        @Id
        @ManyToOne(cascade = CascadeType.DETACH)
        @JoinColumn(name = "customer_order_line_id")
        private CustomerOrderLineEntity customerOrderLine;

        @Id
        @Column(name = "attr_key")
        private String key;

        @Column(name = "attr_value")
        private String value;

        private static class COLineAttributeEntityId implements Serializable {
            private static final long serialVersionUID = -1875048762324910075L;
            private Long customerOrderLine;
            private String key;
        }
    }
}
```
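As a sanity check on the shape of this model, here is a minimal plain-Java sketch of the same object graph. It is a simplified stand-in for the entities above (all JPA annotations and most fields omitted; the `addLine`/`addAttribute` helpers are hypothetical) that counts how many rows a single order produces: one for the order, one per line, and one per attribute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the entity model above; helper methods keep
// both sides of the parent/child association in sync.
class CustomerOrder {
    final List<CustomerOrderLine> lines = new ArrayList<>();

    CustomerOrderLine addLine(int lineNumber, String articleId) {
        CustomerOrderLine line = new CustomerOrderLine(this, lineNumber, articleId);
        lines.add(line);
        return line;
    }

    // One row for the order itself, one per line, one per attribute.
    int rowCount() {
        int rows = 1 + lines.size();
        for (CustomerOrderLine line : lines) {
            rows += line.attributes.size();
        }
        return rows;
    }
}

class CustomerOrderLine {
    final CustomerOrder customerOrder;   // back-reference, as in the mappedBy side
    final int lineNumber;
    final String articleId;
    final Map<String, String> attributes = new LinkedHashMap<>();

    CustomerOrderLine(CustomerOrder order, int lineNumber, String articleId) {
        this.customerOrder = order;
        this.lineNumber = lineNumber;
        this.articleId = articleId;
    }

    void addAttribute(String key, String value) {
        attributes.put(key, value);
    }
}
```

An order with 50 lines of three attributes each yields 1 + 50 + 50 × 3 = 201 rows, which is the workload used in the measurements below.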
CustomerOrder has a child collection of CustomerOrderLine, and each order line can have any number of COLineAttribute key-value pairs. Both CustomerOrder and CustomerOrderLine use IDENTITY as the strategy for generating unique ids.

Suppose we have an order with 50 order lines, each line carrying three attributes; in total 1 + 50 + 50 × 3 = 201 rows need to be written to the database. With batch inserts configured, we get the following result:

```
2024-04-03T15:25:45.111+02:00  INFO 3178207 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
    234768043 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    38397976 nanoseconds spent preparing 101 JDBC statements;
    256144869 nanoseconds spent executing 51 JDBC statements;
    559908072 nanoseconds spent executing 50 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    35958936 nanoseconds spent executing 2 flushes (flushing a total of 402 entities and 104 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
```

The reason we do not see batch inserts is that each CustomerOrderLine uses IDENTITY as its id generation strategy. This forces Hibernate to send every row to the database as a separate statement, which disables batch inserting.

## Using a sequence as the generation strategy

To overcome this we can switch to a sequence as the strategy, because that lets us compute the next value of the sequence in the application's memory instead of going through the database.

```java
@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "customer_order_line_id_seq")
    @SequenceGenerator(name = "customer_order_line_id_seq", sequenceName = "col_id_seq", allocationSize = 40)
    private Long id;
    ...
}
```

With this configuration we get the following output:

```
Hibernate: insert into customer_order (customer_email, customer_name, customer_reference, order_timestamp) values (?, ?, ?, ?)
Hibernate: select next_val as id_val from col_id_seq for update
Hibernate: update col_id_seq set next_val=? where next_val=?
Hibernate: select next_val as id_val from col_id_seq for update
...
Hibernate: insert into customer_order_line (article_id, customer_order_id, delivered_quantity, line_number, quantity, unit_price, id) values (?, ?, ?, ?, ?, ?, ?)
...
2024-04-03T15:32:42.210+02:00  INFO 3200215 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
    259294665 nanoseconds spent acquiring 3 JDBC connections;
    205008 nanoseconds spent releasing 2 JDBC connections;
    4817597 nanoseconds spent preparing 7 JDBC statements;
    22735084 nanoseconds spent executing 5 JDBC statements;
    32578453 nanoseconds spent executing 2 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    68606595 nanoseconds spent executing 2 flushes (flushing a total of 402 entities and 104 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
```

First, since this example runs on MySQL, which has no native sequence support, Hibernate queries next_val from the table col_id_seq that represents the sequence. While doing so it also locks that row, so that no other instance can read it before the value has been updated. I set allocationSize to 40 here, which means that for 50 rows two allocations are needed. If a new order with at most 30 lines (2 × 40 − 50 = 30) is placed afterwards, those rows can be inserted without allocating new values.

Second, Hibernate can now use JDBC batch inserts, which are far more efficient when inserting large amounts of data. The time to insert the same amount of data drops from 1.2 seconds to 0.38 seconds.

## Using UUIDs as the generation strategy

An alternative to sequences is to use random UUIDs, since these can also be generated in the application's memory, and JPA supports them natively.

```java
@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;
    ...
}
```

Because these values are generated randomly, they cause problems for the database on insert. As the database grows, the ids of 50 inserted rows will be spread evenly across the whole value space of the index. This forces the database to read, write, and possibly rearrange many different pages of the index. When that happens you will quickly see heavy I/O on the database, and performance will suffer badly.

By using an ordered sequence instead, we only touch the most recent pages of the index space, which means the database does not have to constantly swap pages in and out.

## Reading a CustomerOrder

If we now want to read a CustomerOrder object from the database, we get the following:

```
Hibernate: select coe1_0.id, coe1_0.customer_email, coe1_0.customer_name, coe1_0.customer_reference, l1_0.customer_order_id, l1_0.id, l1_0.article_id, l1_0.delivered_quantity, l1_0.line_number, l1_0.quantity, l1_0.unit_price, coe1_0.order_timestamp from customer_order coe1_0 left join customer_order_line l1_0 on coe1_0.id=l1_0.customer_order_id where coe1_0.id=?
Hibernate: select a1_0.customer_order_line_id, a1_0.attr_key, a1_0.attr_value from customer_order_line_attribute a1_0 where a1_0.customer_order_line_id=?
...
2024-04-03T15:55:00.969+02:00  INFO 3270276 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
    289787297 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    11803018 nanoseconds spent preparing 51 JDBC statements;
    325803108 nanoseconds spent executing 51 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
```

The initial query nicely fetches all the order lines, but then the attributes of each line are loaded one at a time, forcing a large number of round trips to the database. There are two ways to solve this.

First, we can write a custom JPQL query that performs a LEFT JOIN FETCH, as I did in a previous article.

Alternatively, we can instruct Hibernate to load the child objects in batches instead of with individual SELECT statements.

```java
@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "customer_order_line_id_seq")
    @SequenceGenerator(name = "customer_order_line_id_seq", sequenceName = "col_id_seq", allocationSize = 40)
    private Long id;

    ...

    @BatchSize(size = 50)
    @OneToMany(mappedBy = "customerOrderLine", cascade = CascadeType.ALL,
               fetch = FetchType.LAZY, orphanRemoval = true)
    private Set<COLineAttributeEntity> attributes = new HashSet<>();
}
```

With this configuration we get:

```
Hibernate: select coe1_0.id, coe1_0.customer_email, coe1_0.customer_name, coe1_0.customer_reference, l1_0.customer_order_id, l1_0.id, l1_0.article_id, l1_0.delivered_quantity, l1_0.line_number, l1_0.quantity, l1_0.unit_price, coe1_0.order_timestamp from customer_order coe1_0 left join customer_order_line l1_0 on coe1_0.id=l1_0.customer_order_id where coe1_0.id=?
Hibernate: select a1_0.customer_order_line_id, a1_0.attr_key, a1_0.attr_value from customer_order_line_attribute a1_0 where a1_0.customer_order_line_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2024-04-03T16:06:50.831+02:00  INFO 3307516 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
    239367578 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    4376500 nanoseconds spent preparing 2 JDBC statements;
    194662592 nanoseconds spent executing 2 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
```

Instead of sending 51 queries to the database, we now need only two, which saves both time and system resources.
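The sequence-allocation arithmetic discussed earlier (allocationSize = 40, so 50 ids cost two allocations and leave 30 spare) can be modelled with a small pooled allocator. This is a deliberately simplified sketch of the idea, not Hibernate's actual optimizer implementation:

```java
// Minimal model of a pooled sequence allocator with allocationSize = 40.
// Each "round trip" stands in for one select/update pair against col_id_seq.
class PooledSequence {
    private final int allocationSize;
    private long next = 0;      // next id to hand out
    private long limit = 0;     // exclusive upper bound of the current pool
    private int roundTrips = 0; // simulated database accesses

    PooledSequence(int allocationSize) {
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next >= limit) {            // pool exhausted: "hit the database"
            roundTrips++;
            next = limit;
            limit += allocationSize;    // reserve the next block of ids
        }
        return next++;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

Allocating 50 ids triggers two simulated round trips; the next 30 ids are served from the in-memory pool, and only the 81st id forces a third allocation, matching the 2 × 40 − 50 = 30 headroom described above.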
