
Forcing Logback 1.3 onto Spring Boot 2.7 → not sweet, but it quenches the thirst

Author: wenmo8
Date: 2024-07-30
Category: Other

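A happy moment

First thing in the morning, she messaged me demanding answers.
She: Tell me the truth, where were you fooling around last night?
Me: Nowhere, I just had a few drinks with the guys.
She: Then why didn't you answer any of my video calls?
Me: It wasn't convenient.
She: I don't believe you. What is inconvenient about taking a video call while drinking with your buddies?
She: You must have another woman!
Me: Your husband was sitting right next to me. Would I dare pick up?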

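Recap

The previous post, "SpringBoot 2.7 is stubborn: it just won't support Logback 1.3, and what can you do about it",
covered a lot of ground, but it boils down to two points:

Spring Boot 2.7.x depends on Logback 1.2.x by default and does not support Logback 1.3.x.

If you force Logback up to 1.3.x, startup fails with an exception: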

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
	at org.springframework.boot.logging.logback.LogbackLoggingSystem.getLoggerContext(LogbackLoggingSystem.java:304)
	at org.springframework.boot.logging.logback.LogbackLoggingSystem.beforeInitialize(LogbackLoggingSystem.java:118)
	at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationStartingEvent(LoggingApplicationListener.java:238)
	at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEvent(LoggingApplicationListener.java:220)
	at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:178)
	at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:171)
	at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:145)
	at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:133)
	at org.springframework.boot.context.event.EventPublishingRunListener.starting(EventPublishingRunListener.java:79)
	at org.springframework.boot.SpringApplicationRunListeners.lambda$starting$0(SpringApplicationRunListeners.java:56)
	at java.util.ArrayList.forEach(ArrayList.java:1249)
	at org.springframework.boot.SpringApplicationRunListeners.doWithListeners(SpringApplicationRunListeners.java:120)
	at org.springframework.boot.SpringApplicationRunListeners.starting(SpringApplicationRunListeners.java:56)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:299)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1300)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1289)
	at com.qsl.Application.main(Application.java:15)
Caused by: java.lang.ClassNotFoundException: org.slf4j.impl.StaticLoggerBinder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 17 more

The cause was analyzed as well:

spring-boot-2.7.18 depends on org.slf4j.impl.StaticLoggerBinder, and Logback 1.3.x no longer ships that class.

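Getting Spring Boot 2.7.x to work with Logback 1.3.x is not impossible, but it comes with restrictions and some unknown risks.
The unknown risks should be easy to understand. It is like upgrading from JDK 8 to JDK 11: why does nobody dare? A major-version upgrade usually changes a lot and may even remove things that older versions had. Compile-time errors are at least obvious (we can adjust the code based on the error), but runtime errors are a headache, and once they reach production they count as an incident. Are you willing to take the blame for that?
So a major-version upgrade means not only fixing compile errors but also testing everything, covering as many scenarios as possible to rule out runtime failures. For a simple application that is manageable; for a very large one, full regression testing takes a lot of time, and both developers and testers will be cursing.
The Spring Boot issue "Upgrade to SLF4J 2.0 and Logback 1.4" discusses this, and wilkinsona (currently the top contributor to Spring Boot) listed some of the risk points there.

The thread covers a lot. ceki, the top contributor to Logback, also gives many explanations and answers in it; it is worth reading in full if you are interested.
In short: with some configuration changes Spring Boot 2.7.x can run with Logback 1.3.x, but the risk is ours to bear.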

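Looked at from the other side, the Spring Boot team's position is understandable:

They are not that familiar with Logback and only know what changed from Logback's own release notes (can anyone guarantee those are exhaustive?). If there are many changes, they cannot verify each one.
Spring Boot is huge and integrates so many features that even its top contributor probably cannot remember every detail (can we claim to know every detail of the projects we are responsible for?), so there is no way to assess everything an upgrade to Logback 1.3.x would affect.

So, to play it safe, Spring Boot 2.x.x will not integrate Logback 1.3.x.
But if we dig in our heels and insist on forcing it, Spring Boot cannot stop us either, can it?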

Forcing it

Following the suggestion from that discussion, let's configure it.

Turn off Spring Boot's LoggingSystem:

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        System.setProperty("org.springframework.boot.logging.LoggingSystem", "none");
        SpringApplication.run(Application.class, args);
    }
}

Use logback.xml as the configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property name="LOG_FILE" value="/logs/spring-boot-2_7_18.log"/>
    <property name="FILE_LOG_PATTERN" value="%d{yyyy-MM-dd HH:mm:ss.SSS}|%level|%t|%line|%-40.40logger{39}:%msg%n"/>

    <!-- Roll the log file daily -->
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <pattern>${FILE_LOG_PATTERN}</pattern>
        </encoder>
        <file>${LOG_FILE}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${LOG_FILE}.%d{yyyy-MM-dd}.zip</fileNamePattern>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
    </appender>

    <!-- Console output -->
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>${FILE_LOG_PATTERN}</pattern>
        </encoder>
    </appender>

    <root level="${loglevel:-INFO}">
        <appender-ref ref="STDOUT" />
        <appender-ref ref="FILE" />
    </root>
</configuration>

Startup now succeeds. After adding a bit of simple business logging, the log output looks fine too:

2024-07-26 16:46:48.609|INFO|http-nio-8080-exec-1|525|o.s.web.servlet.DispatcherServlet       :Initializing Servlet 'dispatcherServlet'
2024-07-26 16:46:48.610|INFO|http-nio-8080-exec-1|547|o.s.web.servlet.DispatcherServlet       :Completed initialization in 0 ms
2024-07-26 16:46:48.632|INFO|http-nio-8080-exec-1|23|com.qsl.web.TestWeb                     :hello接口入参：青石路
2024-07-26 16:46:50.033|INFO|http-nio-8080-exec-3|23|com.qsl.web.TestWeb                     :hello接口入参：青石路
2024-07-26 16:46:50.612|INFO|http-nio-8080-exec-4|23|com.qsl.web.TestWeb                     :hello接口入参：青石路
2024-07-26 16:46:51.150|INFO|http-nio-8080-exec-5|23|com.qsl.web.TestWeb                     :hello接口入参：青石路
2024-07-26 16:46:51.698|INFO|http-nio-8080-exec-6|23|com.qsl.web.TestWeb                     :hello接口入参：青石路
2024-07-26 16:46:52.203|INFO|http-nio-8080-exec-7|23|com.qsl.web.TestWeb                     :hello接口入参：青石路

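Writing to the log file works as well.

So this not only quenches the thirst, it is actually sweet!

Don't celebrate too early, though. This is only a demo, spring-boot-2_7_18, with no business code at all; it could not be simpler, and judging sweetness from it would be a big mistake. In a real project the application must not only start but also keep every existing feature working. Planned features can wait; nobody knows what tomorrow brings, so deal with what is in front of you first.
The first attempt works, so go ahead and try it, but back it with thorough testing of all your business scenarios.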

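wilkinsona pointed out that once Spring Boot's LoggingSystem is turned off, Logback's own default configuration handling applies, so the configuration file must be logback.xml and cannot be logback-spring.xml. His word carries weight, but we are being stubborn here, so let's see what happens with logback-spring.xml. Simply renaming logback.xml to logback-spring.xml still lets the application start, but it prints a pile of debug logs, and more importantly:

Nothing is written to the log file.

wilkinsona was telling the truth!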

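How it works

With Spring Boot's LoggingSystem turned off, logging is handed over entirely to Logback. I wrote a detailed post on how Logback loads its configuration file, "Understanding slf4j binding and Logback's configuration loading from the source code". Jumping straight to its summary, there is this passage:

During initialization, slf4j is bound and the Logback configuration file is loaded. slf4j looks on the classpath for org/slf4j/impl/StaticLoggerBinder.class (provided by the concrete logging framework, such as log4j or Logback) and binds to it. Logback likewise searches for a configuration file: first the one named by logback.configurationFile, then logback.groovy, then logback-test.xml, then logback.xml. If none of them exists, no Logback configuration has been provided and Logback falls back to its default configuration (log output goes to the console only).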

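That post was based on Logback 1.1.17 rather than 1.3.14, but configuration loading should not have changed.

Note my wording: "should". That way, if it did change, you cannot blame me, because I only said "should".
To be safe, go and read the 1.3.14 source!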

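This is why logs reach the file when the configuration file is named logback.xml but not when it is named logback-spring.xml: Logback does not recognize logback-spring.xml; only Spring Boot does!
As for Spring Boot's LoggingSystem, I will come back to it once I have a proper grasp of it, so stay tuned.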
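The lookup order quoted above can be modeled in a few lines. This is a simplified sketch in Python, not Logback's code: `resolve_logback_config`, `CANDIDATES`, and the arguments are hypothetical names, and the candidate order is taken from the quoted summary (written against an older Logback), so it should be checked against the 1.3.14 source before being relied on.

```python
# Simplified model of the config lookup order quoted above.
# Assumption: the order comes from the quoted summary, not from Logback 1.3.14 itself.
CANDIDATES = ["logback.groovy", "logback-test.xml", "logback.xml"]


def resolve_logback_config(classpath_files, system_properties=None):
    """Return the config file that would be picked, or None for the console-only default."""
    system_properties = system_properties or {}
    # 1. An explicit logback.configurationFile system property wins.
    explicit = system_properties.get("logback.configurationFile")
    if explicit:
        return explicit
    # 2. Otherwise the first well-known file name found on the classpath is used.
    for name in CANDIDATES:
        if name in classpath_files:
            return name
    # 3. Nothing matched: the built-in default (console only) applies.
    return None


print(resolve_logback_config({"logback.xml", "application.yml"}))
print(resolve_logback_config({"logback-spring.xml", "application.yml"}))
```

In this model the first call resolves to logback.xml, while the second finds nothing it recognizes, which matches the observed behavior: console output still appears but the file appender is never configured.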

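Summary

Spring Boot 2.x.x depends on Logback 1.2.x by default and does not support Logback 1.3.x. However, after setting

System.setProperty("org.springframework.boot.logging.LoggingSystem", "none");

startup no longer fails, and combined with logback.xml, logs are written to the log file as expected. To be safe, though, upgrading to Logback 1.3.x is still not recommended.

If it works, leave it alone: a successful change earns no credit, a broken one earns the blame, and the thing runs fine as it is.

If you must upgrade, do a full regression test that covers every business scenario.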

A browser extension that improves the GitHub experience | GitHub 热点速览

Author: wenmo8
Date: 2024-07-30
Category: Other

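Last week a GitHub "security issue", CFOR (Cross Fork Object Reference), hit the trending lists. It shows up as follows:

Anything committed to a remote repository can be accessed by anyone, even after it has been deleted. With just the commit ID and the URL of the source or fork repository, anyone can view content previously pushed to the remote. The three demonstrations below reproduce the problem.

Demo 1: the fork has been deleted, yet its earlier commits are visible to everyone. Steps to reproduce:

Fork any public open-source project (the source project)
Commit to the fork and push to the remote repository
Note the commit ID, then delete the fork
Visit the source project and append the commit ID to the URL to view the committed content

Demo 2: the source (upstream) project has been deleted, but its commits are still accessible through a fork's URL plus the commit ID.

Demo 3: the source project is private and has a (private) fork; once the source project is made public, the private content in the fork becomes accessible to anyone.

GitHub responded to this long ago: these are not bugs but intentional features. Given that, how do we avoid the risk? For reasons of space, the detailed discussion is in the main text.

Back to this week's trending open-source projects. First up, recommended to GitHub's product managers, is refined-github, a community-built browser extension that improves the GitHub experience. Infisical is an all-in-one secrets management platform that helps prevent tokens and keys from leaking. Flower is a friendly federated learning framework that works out of the box and suits beginners.

Finally, a minimalist GPT-4o client and ai-renamer, a tool that uses AI to batch-rename files, are both LLM applications that can boost your productivity.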

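Contents
- 1. Open-source news
  - 1.1 Advice for guarding against CFOR
- 2. Trending open-source projects
  - 2.1 A browser extension that improves the GitHub experience: refined-github
  - 2.2 An all-in-one secrets management platform: Infisical
  - 2.3 The Linux Kernel Module Programming Guide: lkmpg
  - 2.4 A friendly federated learning framework: Flower
  - 2.5 Batch-rename files with AI: ai-renamer
- 3. HelloGitHub picks
  - 3.1 A free visual web page builder: GrapesJS
  - 3.2 A minimalist GPT-4o client: gpt-computer-assistant
- 4. Closing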

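1. Open-source news

1.1 Advice for guarding against CFOR

The article that broke the story is titled "Anyone can Access Deleted and Private Repository Data on GitHub". I find that somewhat exaggerated, because several conditions must hold:

The changes must have been pushed to a remote repository
The specific commit ID must be known
Secrets should never have been pushed to a remote repository in the first place

Still, deleted or private content being reachable from the public internet is a counter-intuitive design. Unless GitHub changes it, all we can do is tighten how we use GitHub to keep secrets from leaking. My suggestions:

Do not put plaintext keys, tokens, or other secrets in the project; keep them in local environment variables.
Avoid sensitive operations directly in the GitHub web UI, because they are pushed automatically.
Set up a local git hook that checks for leaks automatically, controlling the risk at the source.
Run a sanitization check before open-sourcing a private project. Even when working in a private fork, add a leak-prevention check to the workflow.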
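The git hook suggestion can be illustrated with a small scanner. This is a minimal sketch under loud assumptions: `find_secrets` and the two regular expressions are hypothetical examples written for this illustration, far cruder than the rule sets of dedicated scanning tools.

```python
import re

# Hypothetical patterns for illustration only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hard-coded credential": re.compile(
        r"(?i)\b(password|secret|token|api_key)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


# Stand-in for staged content; the key below is a made-up value.
staged = 'db_host = "localhost"\napi_key = "sk_live_0123456789abcdef"\n'
for lineno, rule in find_secrets(staged):
    print(f"line {lineno}: possible {rule}")
```

In an actual pre-commit hook the text would come from the staged diff rather than a string literal, and the hook would exit with a non-zero status when `find_secrets` returns anything, which blocks the commit.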

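Finally, git dangling commits deserve a mention. They are usually how you recover commits that were deleted by mistake or wiped out by a force push. If you have ever had the "good fortune" of running git fsck --lost-found, you probably felt like a survivor, grateful that this command saved you once again!

The git push command does not push dangling commits.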

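2. Trending open-source projects

2.1 A browser extension that improves the GitHub experience: refined-github

Language: TypeScript, Stars: 23.8k, Weekly growth: 200

This open-source browser extension simplifies the GitHub interface and adds useful features. It removes redundant page elements to make the UI and interactions cleaner, and adds features such as visible whitespace characters, one-click merge-conflict fixes, and discarding all changes to a file in a PR. It supports Chrome and Firefox.

GitHub link → github.com/refined-github/refined-github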

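2.2 An all-in-one secrets management platform: Infisical

Language: TypeScript, Stars: 13.3k, Weekly growth: 150

This project helps teams manage application configuration and secrets centrally, preventing API tokens, passwords, keys, and similar information from leaking. It offers a simple UI, client SDKs, a command-line tool, and an API, so it can be integrated into existing projects and CI/CD pipelines, and it also supports secret scanning to catch leaks at git commit time.

GitHub link → github.com/Infisical/infisical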

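2.3 The Linux Kernel Module Programming Guide: lkmpg

Language: Other, Stars: 7.3k, Weekly growth: 170

This is a guide to writing modules for the Linux kernel, with examples for recent 5.x and 6.x kernels. Kernel modules add functionality to the Linux kernel without modifying the kernel itself or rebooting the system. Writing them requires a working knowledge of C.

GitHub link → github.com/sysprog21/lkmpg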

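2.4 A friendly federated learning framework: Flower

Language: Python, Stars: 4.6k, Weekly growth: 200

Federated learning is a distributed machine-learning approach that trains models without sharing the underlying data. This project is an easy-to-use federated learning framework that works with popular machine-learning frameworks (PyTorch, TensorFlow, JAX, scikit-learn, and others). It supports federated training, analytics, and evaluation, as well as simulated clients, and ships with plenty of examples. It suits privacy-sensitive model development in fields such as healthcare, government and enterprise, and finance.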

import flwr as fl
import tensorflow as tf

# Load model and data (MobileNetV2, CIFAR-10)
model = tf.keras.applications.MobileNetV2((32, 32, 3), classes=10, weights=None)
model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Define Flower client
class CifarClient(fl.client.NumPyClient):
  def get_parameters(self, config):
    return model.get_weights()

  def fit(self, parameters, config):
    model.set_weights(parameters)
    model.fit(x_train, y_train, epochs=1, batch_size=32)
    return model.get_weights(), len(x_train), {}

  def evaluate(self, parameters, config):
    model.set_weights(parameters)
    loss, accuracy = model.evaluate(x_test, y_test)
    return loss, len(x_test), {"accuracy": accuracy}

# Start Flower client
fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=CifarClient())

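GitHub link → github.com/adap/flower

2.5 Batch-rename files with AI: ai-renamer

Language: JavaScript, Stars: 1.1k, Weekly growth: 200

This is a command-line tool written in Node.js that uses LLMs (Llava, Gemma, Llama, and others) to rename local files intelligently, automatically, and in bulk. It is simple to use, needs no manual intervention, and renames files based on their content. It supports videos, images, and other files.

GitHub link → github.com/ozgrozer/ai-renamer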

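3. HelloGitHub picks

This section shares the popular open-source projects on the HelloGitHub site this week. Feel free to tell us about your experience trying them.

3.1 A free visual web page builder: GrapesJS

Language: TypeScript

With an intuitive visual interface, this project lets users design and build HTML templates for websites quickly by drag and drop. It is WYSIWYG and mobile-friendly, and suits official sites, news sites, CMSs, and similar.

Project details → hellogithub.com/repository/572e31f5fc7541efb19c16d331796edf

3.2 A minimalist GPT-4o client: gpt-computer-assistant

Language: Python

This project is a GPT-4o client for Windows, macOS, and Ubuntu. It has a minimalist UI and supports many kinds of tasks, including reading the screen, opening applications, and taking system audio and text input.

Project details → hellogithub.com/repository/4688db1465d5437aab851a70ba39f1e2

4. Closing

That is all for this issue of "GitHub 热点速览". I hope you found an open-source project that interests you. If you have other fun GitHub projects to share, come to HelloGitHub to discuss them with us.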


File systems (11): an introduction to the Linux Squashfs read-only file system

Author: wenmo8
Date: 2024-07-30
Category: Other

liwen01 2024.07.21

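Preface

The squashfs file system is widely used in embedded Linux. Its main characteristics are that it is read-only and achieves a high compression ratio. On systems where flash space is tight, resources that never need modification can be packed into a compressed read-only file system image to save space.

It can also be decompressed block by block, which makes data access more flexible but introduces read amplification.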
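Read amplification can be put into numbers with a rough model. The sketch below assumes every block is full-sized and that a whole block must be read and decompressed to serve any byte inside it; `read_amplification` is a hypothetical helper, and caching as well as the shorter last block or fragment of a file are ignored.

```python
def read_amplification(offset, length, block_size=128 * 1024):
    """Bytes that must be decompressed divided by the bytes actually requested."""
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    bytes_touched = (last_block - first_block + 1) * block_size
    return bytes_touched / length


# A 4 KiB read inside one 128 KiB block.
print(read_amplification(0, 4096))
# The same 4 KiB read straddling a block boundary touches two blocks.
print(read_amplification(128 * 1024 - 2048, 4096))
```

Under this model a smaller block size lowers the amplification for small random reads, at the cost of a worse compression ratio.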

(I) Creating a squashfs file system

mksquashfs packs files and directories into a squashfs image. For example, to pack the squashfs-root directory into a squashfs image:

mksquashfs squashfs-root squashfs-root.sqsh -comp xz

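This uses xz compression.

(1) Compression ratio test

squashfs is a read-only compressed file system, so let's run a simple test of its compression.

Write zero data from /dev/zero into the files under the squashfs_zero directory:

dd if=/dev/zero of=file1 bs=256K count=1

Create the following test directories and files: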

biao@ubuntu:~/test/squashfs/squashfs_zero$ tree
.
├── test1
│   ├── file1
│   ├── file1_1
│   └── file1_2
├── test2
│   ├── file2
│   ├── file2_1
│   └── file2_2
├── test3
│   ├── file3
│   ├── file3_1
│   └── file3_2
└── test4
    ├── file4
    ├── file4_1
    └── file4_2

4 directories, 12 files
biao@ubuntu:~/test/squashfs/squashfs_zero$

File sizes:

biao@ubuntu:~/test/squashfs/squashfs_zero$ du -h
1.5M    ./test3
2.1M    ./test2
2.1M    ./test1
1.7M    ./test4
7.3M    .
biao@ubuntu:~/test/squashfs/squashfs_zero$

Build an image from squashfs_zero with xz compression:

mksquashfs squashfs_zero squashfs_zero.sqsh -comp xz

The resulting file size:

biao@ubuntu:~/test/squashfs$ ll -h squashfs_zero.sqsh 
-rw-r--r-- 1 biao biao 4.0K Jun 26 23:48 squashfs_zero.sqsh
biao@ubuntu:~/test/squashfs$

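The 7.3M squashfs_zero directory has been compressed into a 4K squashfs_zero.sqsh. This test is extreme, of course, because every file contains only zeros; with random data the compression ratio would be very different.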
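The gap between zero-filled and random data is easy to reproduce without building an image. The sketch below uses Python's standard lzma module (which produces xz streams) on two 256 KiB buffers; it only illustrates the compressor's behavior and is not what mksquashfs does internally, since mksquashfs compresses per block and adds its own metadata.

```python
import lzma
import os

size = 256 * 1024  # same size as the dd example: bs=256K count=1

zeros = bytes(size)             # like reading from /dev/zero
random_bytes = os.urandom(size)  # like reading from /dev/urandom

zeros_xz = lzma.compress(zeros)
random_xz = lzma.compress(random_bytes)

print(f"zeros:  {size} -> {len(zeros_xz)} bytes")
print(f"random: {size} -> {len(random_xz)} bytes")
```

The zero buffer shrinks to a tiny fraction of its size, while the random buffer stays at roughly its original size because random bytes carry no redundancy to remove.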

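(II) squashfs data analysis

(1) Data layout

A squashfs image contains at most the following nine parts: Superblock, Compression options, Data blocks & fragments, Inode table, Directory table, Fragment table, Export table, UID/GID lookup table, and Xattr table.

"At most" means some parts are optional, for example Compression options.

Their layout within the image file is shown in the figure below: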

(2) Creating the test image

Write random data from /dev/urandom into the files under the squashfs_urandom directory:

dd if=/dev/urandom of=filex bs=10K count=50

Create the following test directories and files:

biao@ubuntu:~/test/squashfs/squashfs_urandom$ tree
.
├── test1
│   ├── file1
│   ├── file1_1
│   └── file1_2
├── test2
│   ├── file2
│   ├── file2_1
│   └── file2_2
├── test3
│   ├── file3
│   ├── file3_1
│   └── file3_2
└── test4
    ├── file4
    ├── file4_1
    └── file4_2

4 directories, 12 files
biao@ubuntu:~/test/squashfs/squashfs_urandom$

Most parts of a squashfs file system are themselves compressed. To make the later analysis easier, we leave Data blocks & fragments, Inode table, Directory table, and Fragment table uncompressed.

The command is:

mksquashfs squashfs_urandom squashfs_urandom.sqsh -comp xz  -noF -noX -noI -noD

(3) Viewing image information

unsquashfs can print a summary of a squashfs image:

unsquashfs -s squashfs_urandom.sqsh

The output:

biao@ubuntu:~/test/squashfs$ unsquashfs -s squashfs_urandom.sqsh 
Found a valid SQUASHFS 4:0 superblock on squashfs_urandom.sqsh.
Creation or last append time Wed Jun 26 23:28:18 2024
Filesystem size 5032.60 Kbytes (4.91 Mbytes)
Compression xz
Block size 131072
Filesystem is exportable via NFS
Inodes are uncompressed
Data is uncompressed
Fragments are uncompressed
Always-use-fragments option is not specified
Xattrs are uncompressed
Duplicates are removed
Number of fragments 2
Number of inodes 37
Number of ids 1
biao@ubuntu:~/test/squashfs$

As shown, the parts for which we passed -no options are not compressed.

(4) Superblock fields

The superblock sits at the very start of the image and has a fixed size of 96 bytes. Its contents:

biao@ubuntu:~/test/squashfs$ hexdump  -s 0 -n 96 -C squashfs_urandom.sqsh 
00000000  68 73 71 73 11 00 00 00  ec 5c 7a 66 00 00 02 00  |hsqs.....\zf....|
00000010  02 00 00 00 04 00 11 00  cb 01 01 00 04 00 00 00  |................|
00000020  ac 02 00 00 00 00 00 00  16 9d 4e 00 00 00 00 00  |..........N.....|
00000030  0e 9d 4e 00 00 00 00 00  ff ff ff ff ff ff ff ff  |..N.............|
00000040  60 98 4e 00 00 00 00 00  2e 9b 4e 00 00 00 00 00  |`.N.......N.....|
00000050  6e 9c 4e 00 00 00 00 00  00 9d 4e 00 00 00 00 00  |n.N.......N.....|
00000060
biao@ubuntu:~/test/squashfs$

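Parsing the superblock data

A few key values stand out:

The first 4 bytes are the squashfs magic, hsqs
block size is the maximum length of each data block, 128KB here; squashfs supports block sizes from 4KB to 1MB
compressor is the compression type; 4 means xz, and GZIP, LZMA, LZO, LZ4, and ZSTD are also supported
frag count is the number of fragment entries stored in the fragments section
The fields at the end are the start offsets of the individual tables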
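As a cross-check of the values listed above, the 96 bytes from the hexdump can be unpacked with Python's struct module. This is a sketch: the field names and their order are an assumed reading of the squashfs 4.0 superblock layout rather than something stated in this article, and `parse_superblock` is a hypothetical helper.

```python
import struct

# The 96 bytes from the hexdump above, copied row by row.
SUPERBLOCK_HEX = """
68 73 71 73 11 00 00 00 ec 5c 7a 66 00 00 02 00
02 00 00 00 04 00 11 00 cb 01 01 00 04 00 00 00
ac 02 00 00 00 00 00 00 16 9d 4e 00 00 00 00 00
0e 9d 4e 00 00 00 00 00 ff ff ff ff ff ff ff ff
60 98 4e 00 00 00 00 00 2e 9b 4e 00 00 00 00 00
6e 9c 4e 00 00 00 00 00 00 9d 4e 00 00 00 00 00
"""

# Assumed field order: five u32, six u16, eight u64, all little-endian (96 bytes).
FIELDS = (
    "magic", "inode_count", "mod_time", "block_size", "frag_count",
    "compressor", "block_log", "flags", "id_count", "version_major", "version_minor",
    "root_inode", "bytes_used", "id_table", "xattr_table",
    "inode_table", "dir_table", "frag_table", "export_table",
)

COMPRESSORS = {1: "gzip", 2: "lzma", 3: "lzo", 4: "xz", 5: "lz4", 6: "zstd"}


def parse_superblock(raw):
    """Unpack the first 96 bytes of an image into a dict keyed by field name."""
    return dict(zip(FIELDS, struct.unpack("<5I6H8Q", raw[:96])))


sb = parse_superblock(bytes.fromhex(SUPERBLOCK_HEX))
print(sb["magic"].to_bytes(4, "little"))                # b'hsqs'
print(sb["block_size"], COMPRESSORS[sb["compressor"]])  # 131072 xz
print(hex(sb["inode_table"]), hex(sb["dir_table"]))     # 0x4e9860 0x4e9b2e
```

The two table offsets printed at the end are the ones used as hexdump start addresses in the following sections.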

(5) inode table analysis

From the superblock we know the inode table starts at 0x4e9860:

biao@ubuntu:~/test/squashfs$ hexdump  -s 0x4e9860 -n 718 -C squashfs_urandom.sqsh     
004e9860  cc 82 02 00 b4 01 00 00  00 00 9b e9 78 66 02 00  |............xf..|
004e9870  00 00 60 00 00 00 ff ff  ff ff 00 00 00 00 00 20  |..`............ |
004e9880  03 00 00 00 02 01 00 20  01 01 02 00 b4 01 00 00  |....... ........|
004e9890  00 00 c3 e9 78 66 03 00  00 00 60 20 03 00 ff ff  |....xf....` ....|
004e98a0  ff ff 00 00 00 00 00 d0  07 00 00 00 02 01 00 00  |................|
004e98b0  02 01 00 00 02 01 00 d0  01 01 02 00 b4 01 00 00  |................|
004e98c0  00 00 cf e9 78 66 04 00  00 00 60 f0 0a 00 ff ff  |....xf....`.....|
004e98d0  ff ff 00 00 00 00 00 80  0c 00 00 00 02 01 00 00  |................|
004e98e0  02 01 00 00 02 01 00 00  02 01 00 00 02 01 00 00  |................|
004e98f0  02 01 00 80 00 01 01 00  fd 01 00 00 00 00 b1 e9  |................|
004e9900  78 66 01 00 00 00 00 00  00 00 02 00 00 00 3a 00  |xf............:.|
004e9910  00 00 11 00 00 00 02 00  b4 01 00 00 00 00 f9 e9  |................|
004e9920  78 66 06 00 00 00 00 00  00 00 00 00 00 00 00 00  |xf..............|
004e9930  00 00 00 78 00 00 02 00  b4 01 00 00 00 00 01 ea  |...x............|
004e9940  78 66 07 00 00 00 00 00  00 00 00 00 00 00 00 78  |xf.............x|
004e9950  00 00 00 18 01 00 02 00  b4 01 00 00 00 00 08 ea  |................|
004e9960  78 66 08 00 00 00 00 00  00 00 01 00 00 00 00 00  |xf..............|
004e9970  00 00 00 f0 00 00 01 00  fd 01 00 00 00 00 ea e9  |................|
.........
.........
biao@ubuntu:~/test/squashfs$

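Analyzing the data

A few fields need attention:

(a) inode_type

inode_type is the type of the inode. The value 2 means a regular file; the other types are defined as follows:

(b) block_sizes

This describes the size of each block (which may be compressed); the value has to be decoded.

Why do some inodes have several block_sizes entries? The superblock defines a maximum block size, and a file larger than that maximum is split across several blocks, each with its own block_sizes entry.

Every file has a corresponding inode, and the inodes are laid out in order in the inode table.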
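The decoding mentioned for block_sizes can be sketched as follows. The assumption here, not stated in the article, is that bit 24 of each 32-bit word marks an uncompressed block and the lower bits hold the stored size; `decode_block_size` and `full_block_count` are hypothetical helper names. The sample words are the two block_sizes entries of the first inode in the dump above, whose file size field reads 0x32000.

```python
UNCOMPRESSED_BIT = 1 << 24  # assumed flag bit inside each block size word


def decode_block_size(word):
    """Split a block size word into (stored_size_in_bytes, is_compressed)."""
    return word & ~UNCOMPRESSED_BIT, not (word & UNCOMPRESSED_BIT)


def full_block_count(file_size, block_size, has_fragment):
    """Number of block_sizes entries an inode carries for a file of this size."""
    if has_fragment:
        return file_size // block_size   # the tail lives in a fragment block
    return -(-file_size // block_size)   # ceiling division: the tail gets its own block


# First inode in the dump: bytes "00 00 02 01" and "00 20 01 01", little-endian.
words = [0x01020000, 0x01012000]
decoded = [decode_block_size(w) for w in words]
print(decoded)
print(sum(size for size, _ in decoded) == 0x32000)
print(full_block_count(0x32000, 131072, has_fragment=False))
```

Both blocks decode as uncompressed, which is consistent with the -noD option used when the image was built.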

(6) directory table analysis

From the superblock we know the directory table starts at 0x4e9b2e:

biao@ubuntu:~/test/squashfs$ hexdump  -s 0x4e9b2e -n 320 -C squashfs_urandom.sqsh          
004e9b2e  1c 81 02 00 00 00 00 00  00 00 02 00 00 00 00 00  |................|
004e9b3e  00 00 02 00 04 00 66 69  6c 65 31 28 00 01 00 02  |......file1(....|
004e9b4e  00 06 00 66 69 6c 65 31  5f 31 58 00 02 00 02 00  |...file1_1X.....|
004e9b5e  06 00 66 69 6c 65 31 5f  32 02 00 00 00 00 00 00  |..file1_2.......|
004e9b6e  00 06 00 00 00 b4 00 00  00 02 00 04 00 66 69 6c  |.............fil|
004e9b7e  65 32 d4 00 01 00 02 00  06 00 66 69 6c 65 32 5f  |e2........file2_|
004e9b8e  31 f4 00 02 00 02 00 06  00 66 69 6c 65 32 5f 32  |1........file2_2|
004e9b9e  02 00 00 00 00 00 00 00  0a 00 00 00 34 01 00 00  |............4...|
004e9bae  02 00 04 00 66 69 6c 65  33 68 01 01 00 02 00 06  |....file3h......|
004e9bbe  00 66 69 6c 65 33 5f 31  98 01 02 00 02 00 06 00  |.file3_1........|
004e9bce  66 69 6c 65 33 5f 32 02  00 00 00 00 00 00 00 0e  |file3_2.........|
004e9bde  00 00 00 f8 01 00 00 02  00 04 00 66 69 6c 65 34  |...........file4|
004e9bee  28 02 01 00 02 00 06 00  66 69 6c 65 34 5f 31 60  |(.......file4_1`|
004e9bfe  02 02 00 02 00 06 00 66  69 6c 65 34 5f 32 03 00  |.......file4_2..|
004e9c0e  00 00 00 00 00 00 01 00  00 00 94 00 00 00 01 00  |................|
004e9c1e  04 00 74 65 73 74 31 14  01 04 00 01 00 04 00 74  |..test1........t|
004e9c2e  65 73 74 32 d8 01 08 00  01 00 04 00 74 65 73 74  |est2........test|
004e9c3e  33 8c 02 0c 00 01 00 04  00 74 65 73 74 34 20 80  |3........test4 .|
004e9c4e  60 70 17 00 00 00 00 00  00 90 01 01 00 00 00 00  |`p..............|
004e9c5e  60 a8 4d 00 00 00 00 00  00 f0 00 01 00 00 00 00  |`.M.............|
004e9c6e
biao@ubuntu:~/test/squashfs$

Parsing the data:

It starts with a directory header made up of count, start, and inode number, defined as follows:

Each directory header is followed by at least one Directory Entry, defined as follows:

The inode number here corresponds to the inode number in the inode table.
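The header and entry structure can be checked against the first bytes of the dump. The sketch below assumes the following layout, which is a reading of the squashfs 4.0 format and not taken from this article: a header of three u32 values (count stored as number of entries minus one, start, base inode number), then entries of u16 offset, s16 inode delta, u16 type, u16 name size (stored as length minus one), and the name. The leading 2-byte metadata block header (1c 81) is skipped, and `parse_directory_run` is a hypothetical helper.

```python
import struct

# Bytes from the dump at 0x4e9b2e, after the 2-byte metadata block header "1c 81".
DIR_HEX = (
    "02 00 00 00 00 00 00 00 02 00 00 00 "           # header
    "00 00 00 00 02 00 04 00 66 69 6c 65 31 "        # file1
    "28 00 01 00 02 00 06 00 66 69 6c 65 31 5f 31 "  # file1_1
    "58 00 02 00 02 00 06 00 66 69 6c 65 31 5f 32"   # file1_2
)


def parse_directory_run(buf, pos=0):
    """Parse one directory header and its entries; return (entries, next_position)."""
    count, start, inode_base = struct.unpack_from("<III", buf, pos)
    pos += 12
    entries = []
    for _ in range(count + 1):  # count is stored off by one
        offset, inode_delta, inode_type, name_size = struct.unpack_from("<HhHH", buf, pos)
        pos += 8
        name = buf[pos:pos + name_size + 1].decode("ascii")  # size is stored off by one
        pos += name_size + 1
        entries.append({
            "name": name,
            "inode_number": inode_base + inode_delta,
            "type": inode_type,
            "inode_offset": offset,
        })
    return entries, pos


entries, _ = parse_directory_run(bytes.fromhex(DIR_HEX))
for entry in entries:
    print(entry)
```

The decoded inode numbers 2, 3, and 4 line up with the inode numbers seen in the inode table dump, and the offsets 0x00, 0x28, and 0x58 are the positions of those inodes inside the inode metadata block.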

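(7) Data blocks & fragments analysis

(a) Data blocks

In this test image we used xz, which is a regular compression method, so nothing is recorded in Compression options; that part is empty.

The Data blocks therefore follow immediately after the superblock.

From the inode table and directory table we know that the first file stored is file1, with inode number 2.

Because the data here is uncompressed, the bytes starting at offset 0x60 in the image should match the beginning of file1.

(b) fragments

The fragments section is designed to store small files by packing them together into one block. The leftover tail of a file may also be stored in the fragments section.

To see exactly which data went into fragments, consult the fragment table.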

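(III) How squashfs works

(1) Mounting the file system:

When squashfs is mounted, the system first reads the superblock to obtain basic information and the locations of the tables.

(2) Accessing a file or directory:

The system gets the locations of the inode table and directory table from the superblock.
For a directory, it looks in the directory table to get the name and inode number of each file and subdirectory.
Using the inode number, it fetches the inode from the inode table to learn the metadata and data block locations.
For small files or the tail fragment of a large file, it uses the information in the inode to look up the fragment table and find where the fragment data is stored.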

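(IV) Pros and cons of squashfs

(1) Advantages

High compression ratio: SquashFS uses compression algorithms such as gzip, lzma, lz4, and xz and can shrink the file system significantly, saving storage space.

Read-only: suited to environments where data integrity must be protected, such as embedded systems and read-only operating system images.

Efficient random access: SquashFS supports efficient random reads, which suits read-heavy workloads.

Fragment handling: through the Fragment Table, SquashFS handles small files effectively, reducing wasted space and improving storage efficiency.

Storage and performance: files, directories, and inodes can all be compressed, reducing storage usage and I/O and improving performance.

Data integrity: SquashFS can include checksums to ensure data integrity and guard against corruption.

(2) Disadvantages

Read-only: SquashFS cannot be modified in place. Updating or changing anything means regenerating the whole file system image.

Decompression overhead: reads are fast, but decompression still costs CPU. On low-end embedded systems this can affect overall performance.

Memory usage: decompression while reading large files can consume a lot of memory, which can become a bottleneck on resource-constrained embedded systems.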

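Closing

The sections above covered the data layout of a squashfs file system, how the parts work together, and the pros and cons of squashfs.

A question to finish: if the root file system is squashfs and the main executable also lives on it, how do you upgrade the root file system without A/B (dual-partition) updates?

Can the main program simply write the new squashfs image into the mtdblock that holds the root file system? Is there a risk of the root file system update going wrong?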


Segment-anything from learning to fine-tuning, part 3: fine-tuning the SAM decoder

Author: wenmo8
Date: 2024-07-29
Category: Other

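Preface

This series is my set of study notes from using the SAM model at work. It has three parts:

A first look at SAM: a brief introduction to the model architecture, without details or code
SAM in detail: a closer analysis of each module alongside the code
A SAM fine-tuning example: the original code involves private data, so this part uses the public VOC2007 dataset and explains fine-tuning the mask decoder with points and boxes as prompts

This post is part 3: fine-tuning the SAM decoder on the VOC2007 dataset. The code is on github; if it helps you, please give it a star. Thanks.

As covered earlier, the ViT_B-based SAM checkpoint is 375M, of which the prompt encoder is only 32.8k and the mask decoder 16.3M (4.35%); the rest is the image encoder. The image encoder is very large and is usually not fine-tuned, since the pretrained weights are already good enough. The exception is unusual data such as medical imaging, which is absent from the pretraining data and gives poor results; only then is the image encoder fine-tuned too. So here we fine-tune only the decoder.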

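Fine-tuning results

Point prompts

This part fine-tunes with points only as prompts. For comparison I used ISAT_with_segment_anything, a tool that does automatic annotation with SAM. Before fine-tuning, several clicks on several points were needed to get a good segmentation; after fine-tuning, a single click segments the corresponding class.

Before fine-tuning

After fine-tuning

Box prompts

This part adds boxes as prompts for fine-tuning.

Before fine-tuning

After fine-tuning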

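Code

Data loading

The VOC2007 segmentation dataset is used: 632 images in total (422 train_val, 210 test) and 20 classes, or 21 including background. Labels are PNG files whose pixel values are the object classes; the outer contour of every object mask has the value 255 and is ignored during training. The original dataset has the directory layout below (data_example in the github code is only a sample with a few images). Training uses the labels in SegmentationObject: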

## VOCdevkit/VOC2007
├── Annotations
├── ImageSets
│   ├── Layout
│   ├── Main
│   └── Segmentation
├── JPEGImages
├── SegmentationClass
└── SegmentationObject

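CustomDataset reads the data following the layout above. The txt_name file under ImageSets/Segmentation specifies which file names to train on, and the matching images and labels are then loaded. A few points to note:

Segmentation labels are read with PIL, so each pixel value is the class id directly and 255 (the outer contour) is ignored; if you read the label with opencv instead, you have to map RGB values to classes through the palette table
image and gt go into the batch as numpy arrays and are converted to tensors later, when they are handed to SAM
Image sizes in VOC2007 vary, so for now batch=1 is used
gt has a single channel and is later converted to one-hot form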

class CustomDataset(Dataset):
    def __init__(self, VOCdevkit_path, txt_name="train.txt", transform=None):
        self.VOCdevkit_path = VOCdevkit_path
        with open(os.path.join(VOCdevkit_path, f"VOC2007/ImageSets/Segmentation/{txt_name}"), "r") as f:
            file_names = f.readlines()
        self.file_names = [name.strip() for name in file_names]
        self.image_dir = os.path.join(self.VOCdevkit_path, "VOC2007/JPEGImages")
        self.image_files = [f"{self.image_dir}/{name}.jpg" for name in self.file_names]
        self.gt_dir = os.path.join(self.VOCdevkit_path, "VOC2007/SegmentationObject")
        self.gt_files = [f"{self.gt_dir}/{name}.png" for name in self.file_names]

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        image_path = self.image_files[idx]
        image_name = image_path.split("/")[-1]
        gt_path = self.gt_files[idx]

        image = cv2.imread(image_path)
        image = image[..., ::-1] ## BGR to RGB (cv2.imread returns BGR)
        image = np.ascontiguousarray(image)
        gt = Image.open(gt_path)
        gt = np.array(gt, dtype='uint8')
        gt = np.ascontiguousarray(gt)

        return image, gt, image_name

    @staticmethod
    def custom_collate(batch):
        """ collate_fn for the DataLoader.
         Images and gt stay as numpy arrays; they are converted to tensors later.
        """
        images = []
        seg_labels = []
        images_name = []
        for image, gt, image_name in batch:
            images.append(image)
            seg_labels.append(gt)
            images_name.append(image_name)
        images = np.array(images)
        seg_labels = np.array(seg_labels)
        return images, seg_labels, images_name

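Image preprocessing

Once an image is loaded, the preprocessing in SamPredictor is applied directly: the image is resized so that its longest side is 1024 and then padded to 1024x1024, after which image_embedding is computed. This step is slow, so each image is computed only once and the result is cached and reused when needed. "with torch.no_grad()" ensures the image encoder gets no gradient updates, freezing its weights.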

    model_transform = ResizeLongestSide(sam.image_encoder.img_size)
    for epoch in range(num_epochs):
        epoch_loss = 0
        for idx, (images, gts, image_names) in enumerate(tqdm(dataloader)):
            valid_classes = []  ## voc 0,255 are ignored
            for i in range(images.shape[0]):
                image = images[i] # h,w,c np.uint8 rgb
                original_size = image.shape[:2] ## h,w
                input_size = model_transform.get_preprocess_shape(image.shape[0], image.shape[1],
                                                                  sam.image_encoder.img_size)  ##h,w
                gt = gts[i].copy() #h,w labels [0,1,2,..., classes-1]
                gt_classes = np.unique(gt)  ##masks classes: [0, 1, 2, 3, 4, 7]
                image_name = image_names[i]

                predictions = []
                ## freeze image encoder
                with torch.no_grad():
                    # gt_channel = gt[:, :, cls]
                    predictor.set_image(image, "RGB")
                    image_embedding = predictor.get_image_embedding()

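Prompt generation

A number of foreground and background points are sampled at random from the mask. The default is 1 foreground point and 1 background point; with more points, a 2:1 ratio generally works well.

mask_value is the class id: find the coordinates of the pixels in the mask equal to that id, then sample points at random. The bounding rectangle is also computed from the mask (reading the image's xml annotation file would work too) for the later box-prompt fine-tuning.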

def get_random_prompts(mask, mask_value, foreground_nums=1, background_nums=1):
    # Find the indices (coordinates) of the foreground pixels
    foreground_indices = np.argwhere(mask == mask_value)
    ymin, xmin= foreground_indices.min(axis=0)
    ymax, xmax = foreground_indices.max(axis=0)
    bbox = np.array([xmin, ymin, xmax, ymax])
    if foreground_indices.shape[0] < foreground_nums:
        foreground_nums = foreground_indices.shape[0]
        background_nums = int(0.5 * foreground_indices.shape[0])
    background_indices = np.argwhere(mask != mask_value)

    ## random select
    foreground_points = foreground_indices[
        np.random.choice(foreground_indices.shape[0], foreground_nums, replace=False)]
    background_points = background_indices[
        np.random.choice(background_indices.shape[0], background_nums, replace=False)]

    ## Coordinates are (y, x) but the network expects (x, y), so flip the order
    foreground_points = foreground_points[:, ::-1]
    background_points = background_points[:, ::-1]

    return (foreground_points, background_points), bbox

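The prompts are point coordinates whose x, y are relative to the original image, but the image fed to SAM is resized so that its longest side is 1024, so the point coordinates must be rescaled too, as in the following code: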

    all_points = np.concatenate((foreground_points, background_points), axis=0)
    all_points = np.array(all_points)
    point_labels = np.array([1] * foreground_points.shape[0] + [0] * background_points.shape[0], dtype=int)
    ## image resized to 1024, points also
    all_points = model_transform.apply_coords(all_points, original_size)

    all_points = torch.as_tensor(all_points, dtype=torch.float, device=device)
    point_labels = torch.as_tensor(point_labels, dtype=torch.float, device=device)
    all_points, point_labels = all_points[None, :, :], point_labels[None, :]
    points = (all_points, point_labels)

    if not box_prompt:
        box_torch=None
    else:
        ## preprocess bbox
        box = model_transform.apply_boxes(bbox, original_size)
        box_torch = torch.as_tensor(box, dtype=torch.float, device=device)
        box_torch = box_torch[None, :]
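The arithmetic behind the coordinate rescaling can be written out in plain Python. This is a sketch of the longest-side scaling idea under the assumption that both axes are scaled by the ratio of new to old size; the function names mirror those used above, but the code is a standalone illustration and not the segment_anything implementation.

```python
def get_preprocess_shape(old_h, old_w, long_side=1024):
    """Output (h, w) after scaling so that the longest side equals long_side."""
    scale = long_side / max(old_h, old_w)
    return int(old_h * scale + 0.5), int(old_w * scale + 0.5)


def rescale_point(x, y, original_size, long_side=1024):
    """Map an (x, y) point from original image coordinates to the resized image."""
    old_h, old_w = original_size
    new_h, new_w = get_preprocess_shape(old_h, old_w, long_side)
    return x * (new_w / old_w), y * (new_h / old_h)


# A 500x375 (w x h) image, a common VOC size: the long side becomes 1024.
print(get_preprocess_shape(375, 500))  # (768, 1024)
print(rescale_point(250, 100, (375, 500)))
```

Boxes are handled the same way by treating the two corners (xmin, ymin) and (xmax, ymax) as points.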

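The fine-tuning code lets you choose which kind of prompt to fine-tune with. When both point and box are enabled, either the point or the box is dropped with some probability for better generalization (otherwise inference with only a point or only a box as the prompt may perform poorly). Finally the prompt_encoder produces sparse_embeddings and dense_embeddings.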

    ## if both, random drop one for better generalization ability
    if point_box and np.random.random()<0.5:
        if np.random.random()<0.25:
            points = None
        elif np.random.random()>0.75:
            box_torch = None
    ## freeze prompt encoder
    with torch.no_grad():
        sparse_embeddings, dense_embeddings = sam.prompt_encoder(
            points = points,
            boxes = box_torch,
            # masks=mask_predictions,
            masks=None,
        )

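Mask prediction

The mask decoder is not frozen; mask_decoder is simply called directly. Two mask predictions are made. The first predicts masks at 3 levels and the highest-scoring one is selected; that mask is then used as a mask prompt and passed to prompt_encoder together with the point prompt and box prompt to obtain new sparse_embeddings and dense_embeddings. The second prediction produces a single mask. In effect a coarse mask is obtained first and then refined. After post-processing (sam.postprocess_masks), the predicted mask has the same size as the original image, with one mask per object; stacking the masks gives the predictions for the whole image.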

    ## predicted masks, three level
    mask_predictions, scores = sam.mask_decoder(
        image_embeddings=image_embedding.to(device),
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_embeddings,
        dense_prompt_embeddings=dense_embeddings,
        multimask_output=True,
    )
    # Choose the model's best mask
    mask_input = mask_predictions[:, torch.argmax(scores),...].unsqueeze(1)
    with torch.no_grad():
        sparse_embeddings, dense_embeddings = sam.prompt_encoder(
            points=points,
            boxes=box_torch,
            masks=mask_input,
        )
    ## predict a better mask, only one mask
    ## (kept outside torch.no_grad() so the decoder output carries gradients)
    mask_predictions, scores = sam.mask_decoder(
        image_embeddings=image_embedding.to(device),
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_embeddings,
        dense_prompt_embeddings=dense_embeddings,
        multimask_output=False,
    )
    best_mask = sam.postprocess_masks(mask_predictions, input_size, original_size)
    predictions.append(best_mask)

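Loss computation

The loss is BCELoss plus DiceLoss. gt and pred must have the same shape, BxCxHxW, and pred holds values after sigmoid.

gt therefore has to be converted to one-hot form, from (batch_size, 1, h, w) to (batch_size, c, h, w), where c is the number of classes present in gt_classes, that is, how many instance classes the image contains.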

def mask2one_hot(label, gt_classes):
    """
    label: label image, shape (batch_size, 1, h, w)
    gt_classes: the class ids present in the label
    """
    current_label = label.squeeze(1) # （batch_size, 1, h, w) ---> （batch_size, h, w)
    batch_size, h, w = current_label.shape[0], current_label.shape[1], current_label.shape[2]
    one_hots = []
    for cls in gt_classes:
        if isinstance(cls, torch.Tensor):
            cls = cls.item()
        tmplate = torch.zeros(batch_size, h, w)  # （batch_size, h, w)
        tmplate[current_label == cls] = 1
        tmplate = tmplate.view(batch_size, 1, h, w)  # （batch_size, h, w) --> （batch_size, 1, h, w)
        one_hots.append(tmplate)
    onehot = torch.cat(one_hots, dim=1)
    return onehot

BCELoss expects probabilities rather than raw logits, so predictions are passed through sigmoid first. The loss is then computed as follows:

    gts = torch.from_numpy(gts).unsqueeze(1) ## BxHxW ---> Bx1xHxW
    gts_onehot = mask2one_hot(gts, valid_classes)
    gts_onehot = gts_onehot.to(device)

    predictions = torch.sigmoid(predictions)
    # #loss = seg_loss(predictions, gts_onehot)
    loss = BCEseg(predictions, gts_onehot)
    loss_dice = soft_dice_loss(predictions, gts_onehot, smooth = 1e-5, activation='none')
    loss = loss + loss_dice
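The Dice term is not shown in this post, so here is the formula written out in plain Python for reference. It is a minimal sketch over flat sequences, not the repository's `soft_dice_loss` (whose `activation` parameter and tensor handling are not reproduced here); the inputs are assumed to be probabilities after sigmoid and 0/1 targets.

```python
def soft_dice_loss(pred, target, smooth=1e-5):
    """1 - Dice coefficient for flat sequences of probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


perfect = soft_dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
disjoint = soft_dice_loss([1.0, 1.0, 0.0, 0.0], [0, 0, 1, 1])
partial = soft_dice_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
print(perfect, disjoint, partial)
```

The loss is close to 0 for a perfect overlap and close to 1 when prediction and target do not overlap at all, which complements the per-pixel BCE term when foreground pixels are scarce.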

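Saving weights

The optimizer defaults to AdamW and the scheduler to CosineAnnealingLR; both can be changed. Only the weights with the lowest loss so far are saved, and only the decoder weights; adjust as needed.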

if epoch_loss < best_loss:
    best_loss = epoch_loss
    mask_decoder_weighs = sam.mask_decoder.state_dict()
    mask_decoder_weighs = {f"mask_decoder.{k}": v for k,v in mask_decoder_weighs.items() }
    torch.save(mask_decoder_weighs, os.path.join(save_dir, f'sam_decoder_fintune_{str(epoch+1)}_pointbox_monai.pth'))
    print("Saving weights, epoch: ", epoch+1)

That concludes the series. Thanks for reading.

Making sharding-jdbc work with MybatisPlus dynamic data sources

Author: wenmo8
Date: 2024-07-29
Category: Other

背景：
之前的项目做读写分离的时候用的 MybatisPlus的动态数据做的，很多地方使用的@DS直接指定的读库或者写库实现的业务；随着表数据量越来越大，现在打算把比较大的表进行水平拆分，准备使用 ShardingJDBC实现，但是发现两者配合起来并不是那么顺利，网上大部分文章都是直接把整个Sharding的数据源当成MybatisPlus的一个数据源，那么在原本@DS上面指定的数据源就无法直接使用Sharding的分库等逻辑，所以我研究了一下源码，实现了这一逻辑，给后面有需要的朋友提供一个案例，避免浪费不必要的时间

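1. Choosing a version

ShardingJDBC currently comes in two flavors: the early Sharding-JDBC, and ShardingSphere-JDBC from the ShardingSphere project.

Sharding-JDBC: first released in 2016 by the project's original founders. It started as a lightweight JDBC-layer solution for database sharding and read/write splitting.
ShardingSphere: grew out of Sharding-JDBC and was officially released in 2018. Apache ShardingSphere aims to build a more complete distributed database ecosystem and includes several components such as Sharding-JDBC, Sharding-Proxy, and Sharding-Sidecar.

The standalone ShardingJDBC is no longer updated; its most widely used version is 4.1.1.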

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
    <version>4.1.1</version>
</dependency>

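The ShardingSphere project is still under active development. ShardingSphere-JDBC evolved from ShardingJDBC, with optimizations and new features added on top of the original code. For developers, the main difference is that some configuration parameters were adjusted; what the parameters do and how they are configured remain the same as before.

To make it easier to pick up new features later, I use ShardingSphere-JDBC 5.2.1 directly.

Official help documentation:
https://www.bookstack.cn/read/shardingsphere-5.1.0-zh/ecf18b21ab3f559c.md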

<dependency>
	<groupId>org.apache.shardingsphere</groupId>
	<artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
	<version>5.2.1</version>
</dependency>

2. Project dependencies

The complete Maven dependencies for this example are as follows:

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>

        <dependency>
            <groupId>com.zaxxer</groupId>
            <artifactId>HikariCP</artifactId>
            <version>3.4.5</version>
        </dependency>

        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.4.0</version>
        </dependency>

        <!-- Read/write splitting (dynamic data source) -->
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
            <version>3.3.2</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.33</version>
        </dependency>

        <!--Shardingjdbc-->
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
            <version>5.2.1</version>
            <exclusions>
                <exclusion>
                    <groupId>org.yaml</groupId>
                    <artifactId>snakeyaml</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- Add a working SnakeYAML version; the one pulled in by shardingsphere-jdbc is problematic and causes an error -->
        <dependency>
            <groupId>org.yaml</groupId>
            <artifactId>snakeyaml</artifactId>
            <version>1.33</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

3. Configuration parameters

application.yml configuration

server:
  port: 8080

mybatis-plus:
  mapper-locations: classpath*:mybatis/*.xml
  type-aliases-package: com.game.sharding.dto
  configuration:
    map-underscore-to-camel-case: false
    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl

spring:
  application:
    name: sharding-jdbc-test
  sharding-sphere:
    datasource:
      names: master,write,read,read2
      master:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/game_dev?characterEncoding=utf-8&useSSL=false&allowMultiQueries=true&autoReconnect=true&serverTimezone=Asia/Shanghai
        username: root
        password: 123456
      write:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/game_dev?characterEncoding=utf-8&useSSL=false&allowMultiQueries=true&autoReconnect=true&serverTimezone=Asia/Shanghai
        username: root
        password: 123456
      read:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/game_dev_read?characterEncoding=utf-8&useSSL=false&allowMultiQueries=true&autoReconnect=true&serverTimezone=Asia/Shanghai
        username: root
        password: 123456
      read2:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/game_dev_read?characterEncoding=utf-8&useSSL=false&allowMultiQueries=true&autoReconnect=true&serverTimezone=Asia/Shanghai
        username: root
        password: 123456
    rules:
      sharding:
        tables:
          team_msg:
            # customer-ds is the read/write-splitting data source defined below
            actual-data-nodes: customer-ds.team_msg_${0..1}
            table-strategy:
              standard:
                sharding-column: id
                sharding-algorithm-name: msg-id # refers to sharding-algorithms below
        sharding-algorithms:
          # Note: the algorithm name (e.g. msg-id) must not contain underscores; otherwise the props below fail to load and startup fails
          msg-id:
            type: INLINE
            props:
              # route by id modulo 2
              algorithm-expression: team_msg_${id % 2}
      # read/write splitting
      readwrite-splitting:
        data-sources:
          customer-ds:
            load-balancer-name: customer-lb
            static-strategy:
              write-data-source-name: master
              read-data-source-names: read,read2,write
        load-balancers:
            customer-lb:
                # use the custom load-balancing algorithm
                type: CUSTOM
    props:
      # print the actual SQL after rewriting
      sql-show: true
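
To make the table routing concrete, here is a minimal stand-alone sketch of what the INLINE expression team_msg_${id % 2} resolves to for a given id. The class TeamMsgRouting and its methods are hypothetical helpers written for illustration; they are not part of ShardingSphere, and the sketch assumes non-negative ids.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ShardingSphere: mirrors the arithmetic of
// the configured expression team_msg_${id % 2}.
final class TeamMsgRouting {

    static final int TABLE_COUNT = 2; // matches actual-data-nodes team_msg_${0..1}

    // Physical table that a row with this id maps to (assumes id >= 0).
    static String actualTable(long id) {
        return "team_msg_" + (id % TABLE_COUNT);
    }

    // All physical tables covered by the configured data nodes.
    static List<String> allTables() {
        List<String> tables = new ArrayList<>();
        for (int i = 0; i < TABLE_COUNT; i++) {
            tables.add("team_msg_" + i);
        }
        return tables;
    }

    public static void main(String[] args) {
        for (long id : new long[] {1L, 2L, 1001L}) {
            System.out.println(id + " -> " + actualTable(id));
        }
        System.out.println(allTables());
    }
}
```

Odd ids land in team_msg_1 and even ids in team_msg_0. A query that does not filter on id cannot be narrowed down this way and would have to touch every table returned by allTables().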

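4. Code configuration

The key step is to register MybatisPlus's data sources as the shardingsphere-jdbc data source, while keeping the names identical to the original MybatisPlus data source names. Internally, shardingSphereDataSource holds a Map of all the data sources configured in application.yml. To keep @DS-based switching working, the same ShardingSphere data source is registered as four dynamic data sources, so that @DS never fails to find the name it refers to.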

import com.baomidou.dynamic.datasource.DynamicRoutingDataSource;
import com.baomidou.dynamic.datasource.provider.AbstractDataSourceProvider;
import com.baomidou.dynamic.datasource.provider.DynamicDataSourceProvider;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DynamicDataSourceAutoConfiguration;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DynamicDataSourceProperties;
import org.apache.commons.lang3.StringUtils;
import org.apache.shardingsphere.driver.jdbc.adapter.AbstractDataSourceAdapter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringBootConfiguration;
import org.springframework.boot.autoconfigure.AutoConfigureBefore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;
import org.springframework.context.annotation.Primary;

import javax.annotation.Resource;
import javax.sql.DataSource;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

@Configuration
@AutoConfigureBefore({DynamicDataSourceAutoConfiguration.class, SpringBootConfiguration.class})
public class MyDataSourceConfiguration {

    /**
     * MybatisPlus dynamic data source properties
     */
    @Autowired
    private DynamicDataSourceProperties properties;

    /**
     * The shardingjdbc data source
     */
    @Lazy
    @Resource(name = "shardingSphereDataSource")
    private AbstractDataSourceAdapter shardingSphereDataSource;

    @Value("${spring.sharding-sphere.datasource.names}")
    private String shardingDataSourceNames;

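    /**
     * Register the dynamic data sources. This is the crucial part: @DS must still be able to select a data source by name,
     * and whichever name it selects must resolve to the shardingjdbc data source.
     * So every configured name is registered here as the shardingjdbc data source.
     */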
    @Bean
    public DynamicDataSourceProvider dynamicDataSourceProvider() {
        if (StringUtils.isBlank(shardingDataSourceNames)) {
            throw new RuntimeException("Property spring.sharding-sphere.datasource.names must not be empty");
        }
        String[] names = shardingDataSourceNames.split(",");
        return new AbstractDataSourceProvider() {
            @Override
            public Map<String, DataSource> loadDataSources() {
                Map<String, DataSource> dataSourceMap = new HashMap<>();
                Arrays.stream(names).forEach(name -> dataSourceMap.put(name, shardingSphereDataSource));
                return dataSourceMap;
            }
        };
    }

    /**
     * Make the dynamic data source the primary data source
     */
    @Primary
    @Bean
    public DataSource dataSource(DynamicDataSourceProvider dynamicDataSourceProvider) {
        DynamicRoutingDataSource dataSource = new DynamicRoutingDataSource();
        dataSource.setPrimary(properties.getPrimary());
        dataSource.setStrict(properties.getStrict());
        dataSource.setStrategy(properties.getStrategy());
        dataSource.setProvider(dynamicDataSourceProvider);
        dataSource.setP6spy(properties.getP6spy());
        dataSource.setSeata(properties.getSeata());
        return dataSource;
    }
}
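
The name-to-data-source mapping built in loadDataSources can be sketched on its own, without Spring or ShardingSphere. In the sketch below, SharedDataSourceNames and mapNamesTo are hypothetical names and a plain Object stands in for the shared ShardingSphere data source. Trimming each name is an addition of this sketch, so that a value such as "master, write" does not produce a key with a leading space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone sketch: every configured name points at the same shared instance.
// Hypothetical helper, not part of dynamic-datasource or ShardingSphere.
final class SharedDataSourceNames {

    static <T> Map<String, T> mapNamesTo(String commaSeparatedNames, T shared) {
        Map<String, T> result = new LinkedHashMap<>();
        for (String raw : commaSeparatedNames.split(",")) {
            String name = raw.trim(); // tolerate spaces after commas
            if (!name.isEmpty()) {
                result.put(name, shared);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Object sharding = new Object(); // stand-in for shardingSphereDataSource
        Map<String, Object> map = mapNamesTo("master, write,read,read2", sharding);
        System.out.println(map.keySet());
        System.out.println(map.get("read") == map.get("master"));
    }
}
```

Because all keys share one instance, the name chosen by @DS no longer decides which connection pool is used. It only acts as a label that a later stage can read back.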

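5. Customizing the load-balancing algorithm in ShardingSphere

A load balancer in ShardingSphere must implement the ReadQueryLoadBalanceAlgorithm interface and return the custom algorithm name from getType. The built-in implementations include RoundRobinReadQueryLoadBalanceAlgorithm and RandomReadQueryLoadBalanceAlgorithm. Here a custom algorithm is required so that @DS can still switch data sources freely.

ShardingSphere loads these algorithms through the SPI mechanism.

So for a custom ReadQueryLoadBalanceAlgorithm class to take effect, create a file named org.apache.shardingsphere.readwritesplitting.spi.ReadQueryLoadBalanceAlgorithm in the project's META-INF/services folder and put the custom class's name in it.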

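ShardingSphere's own source registers its built-in algorithms the same way, so we simply create the equivalent file in our own project.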
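
As a rough illustration, assuming a standard Maven layout, the registration file could look like the following. The package com.example.sharding is a placeholder; use the fully qualified name of your own class.

```text
# File: src/main/resources/META-INF/services/org.apache.shardingsphere.readwritesplitting.spi.ReadQueryLoadBalanceAlgorithm
# One fully qualified implementation class per line (the package below is hypothetical)
com.example.sharding.CustomLoadBalanceAlgorithm
```

The file name is the interface's fully qualified name, and each line names one implementation. That is the convention Java's ServiceLoader relies on.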

Finally, the custom CustomLoadBalanceAlgorithm implementation:


public class CustomLoadBalanceAlgorithm implements ReadQueryLoadBalanceAlgorithm {
    private Properties props;

    public CustomLoadBalanceAlgorithm() {

    }

    @Override
    public void init(Properties props) {
        this.props = props;
    }

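    /**
     * Select the data source
     *
     * @param name  data source name (the one used by ShardingJDBC)
     * @param writeDataSourceName name of the write data source
     * @param readDataSourceNames names of all read data sources configured for load balancing
     * @param context transaction context; context.isInTransaction() tells whether a transaction is active, which can be used to decide whether to use the write data source
     * @return java.lang.String
     */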
    @Override
    public String getDataSource(String name, String writeDataSourceName, List<String> readDataSourceNames, TransactionConnectionContext context) {
        // Get the data source currently selected by MybatisPlus (@DS)
        String dsKey = DynamicDataSourceContextHolder.peek();
        if (StringUtils.isNotBlank(dsKey)) {
            if (writeDataSourceName.equals(dsKey)) {
                return dsKey;
            }
            if (readDataSourceNames.contains(dsKey)) {
                return dsKey;
            }
            throw new RuntimeException("Invalid @DS configuration: data source [" + dsKey + "] is not in the ShardingJDBC data source list [" + readDataSourceNames + "]");
        }
        return writeDataSourceName;
    }

    @Override
    public String getType() {
        return "CUSTOM";
    }

    @Override
    public boolean isDefault() {
        return true;
    }

    @Override
    @Generated
    public Properties getProps() {
        return this.props;
    }
}
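
The selection rule in getDataSource can be tried out in isolation. The sketch below is a simplified stand-alone version with no ShardingSphere or dynamic-datasource dependency. DsKeyHolder is a hypothetical stand-in for DynamicDataSourceContextHolder and DsAwareSelector is an illustrative name; it is not the class above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Stand-alone sketch of the rule: honour the current @DS key if it is known,
// otherwise fall back to the write data source.
final class DsAwareSelector {

    static String select(String writeName, List<String> readNames) {
        String dsKey = DsKeyHolder.peek();
        if (dsKey == null || dsKey.trim().isEmpty()) {
            return writeName; // no @DS active: use the write data source
        }
        if (writeName.equals(dsKey) || readNames.contains(dsKey)) {
            return dsKey;     // @DS names a known data source
        }
        throw new IllegalStateException("Unknown data source: " + dsKey);
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList("read", "read2", "write");
        System.out.println(select("master", reads)); // no key pushed yet
        DsKeyHolder.push("read2");                   // what entering a @DS("read2") method would do
        try {
            System.out.println(select("master", reads));
        } finally {
            DsKeyHolder.poll();                      // always restore the previous key
        }
    }
}

// Hypothetical, simplified stand-in for DynamicDataSourceContextHolder:
// a per-thread stack of data source keys.
final class DsKeyHolder {
    private static final ThreadLocal<Deque<String>> KEYS = ThreadLocal.withInitial(ArrayDeque::new);

    static void push(String key) { KEYS.get().push(key); }
    static String peek() { return KEYS.get().peek(); } // null when the stack is empty
    static void poll() { KEYS.get().poll(); }
}
```

One consequence of this rule is that a query running without any @DS key always goes to the write data source, so reads are only spread across replicas where @DS asks for one explicitly.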

At this point, ShardingSphere is fully adapted to the existing MybatisPlus dynamic data source setup.

6. Source code

Gitee:
https://gitee.com/luowenjie98/sharing-sphere-mybatisplus-demo

If you found this helpful, please give the repository a star. Thank you very much!