What does Minecraft modding look like?

July 3, 2024

Introduction

Super Factory Manager (SFM) is a Minecraft mod which introduces a programming language for logistical tasks. The mod enables users to move items, fluids, and other resources between inventories with high precision and throughput.

You place cables in the world to connect inventories, followed by a manager block that contains the disk which contains the program.

Caption: A demonstration of the mod moving items between chests
sfm demo.gif

Caption: The in-game code editor

Caption: SFM program

NAME "A simple program"

EVERY 20 TICKS DO
    -- on their own, input statements do nothing
    -- there is no item buffer
    INPUT FROM a

    -- all the magic happens here
    OUTPUT TO b
END

There exists a bug in the mod where the manager suddently 'stops working'.

My leading hypothesis is that my caching logic is at fault. Unfortunately, all attempts at reproducing the bug have failed. The only indicators of its existence are the multitudes of people joining my Discord server to ask why their stuff isn't working. Not good.

Learning programming is frustrating enough without having to consider that you're not the one doing something wrong.

Thus, addressing the bug in the is of the highest priority.

The Update to the Mod

Included in the wave of tiny improvements in the latest latest version of the mod (4.16.0), one feature stands above the others: the logging.

Traditionally, Minecraft has a console that displays the logs from the game, which mods can contribute to. Usually when a mod is being uppity, the logs are the best source of information.

Caption: logs from Minecraft when launched using PrismMC. The game has safely exited.

Things get complicated when playing on a server. Non-admin players cannot see the logs of the server. How am I to get debug information from my users without road-blocks like needing admin assistance?

Thus, each manager block now has its own logging implementation that synchronizes to clients. Players can see the logs regarding their programs, isolated from the concerns of the normal logs of the game.

Caption: class definitions used in my logging

// My thing
public record TranslatableLogEvent(
    Level level, // Log level, e.g. INFO, WARN, ERROR
    Instant instant, // Time of the event
    TranslatableContents contents
)

// From the base game
public class TranslatableContents implements ComponentContents {
   private final String key;
   private final Object[] args;
}

Vanilla Minecraft has helpfully established TranslatableComponent for communicating stuff from the server to the client to be displayed in the user's language of choice. By reusing this class, we easily get the benefits of the game's localization system for user-facing logs.

Caption: an example of using a TranslatableComponent

ConfirmScreen confirmscreen = new ConfirmScreen(
    this::confirmResult,
    Component.translatable("deathScreen.quit.confirm"),
    CommonComponents.EMPTY,
    Component.translatable("deathScreen.titleScreen"),
    Component.translatable("deathScreen.respawn")
);

Game Versions

The process of releasing updates for Super Factory Manager is complicated by the fact that the mod supports multiple versions of Minecraft:

1.19.2
1.19.4
1.20
1.20.1
1.20.2
1.20.3
1.20.4

Changes between versions can be substantial: GUI and capability reworks, Minecraft Forge drama leading to the release of NeoForge, and other mods I interact with not being available on all the versions I support.

To accommodate the slight variations in my code between the versions, I have opted to create a git branch for each version of the game that is supported.

When I work on the mod, I work on the oldest branch (1.19.2) until satisfaction, then I merge the changes to the next branch, going up the version pairs until the latest version has all the changes.

merge
1.19.2 => 1.19.4
1.19.4 => 1.20
1.20 => 1.20.1
etc.

Sometimes, methods I depend on are pulled out from under me, or are made obsolete in these version upgrades.

Caption: my old code

private Button.OnTooltip buildTooltip(LocalizationEntry entry) {
	return (btn, pose, mx, my) -> renderTooltip(
			pose,
			font.split(
					entry.getComponent(),
					Math.max(
							width
							/ 2
							- 43,
							170
					)
			),
			mx,
			my
	);
}

Caption: my new code, leveraging a new base-game method

private Tooltip buildTooltip(LocalizationEntry entry) {
	return Tooltip.create(entry.getComponent());
}

It is interesting to observe how I [fail to] leverage abstractions to minimize the differences between versions. Some things are only visible after jumping between versions, adding another dimension to programming.

Caption: A layered representation of git branches as stacked pages, Aero inspired

Interacting with multiple branches is best accompanied by opening all the versions at once in IntelliJ, requiring you to clone the repo multiple times. This lets you jump around the code on any version without friction, and it helps avoid giving Gradle an aneurysm.

To merge branches from two clones (without needing to push), you can fetch the other repo path, followed by git merge FETCH_HEAD. I made a helper script to automate this. It pauses in the event of merge conflicts, where I switch over to IntelliJ which has great tooling.

TODO: make the script use rebase instead of fast-forward

Release Process

I've created a simple Command Line Interface (CLI) for helping me run my scripts for the release process. I have a folder named “actions” which contains nicely named scripts which can be invoked with no arguments, and I have a entrypoint script that uses fzf to show me the scripts by name to have me choose which to run.

Caption: the PowerShell script I use

# Action loop
while ($true) {
  # Prompt user to select an action
  $action = Get-ChildItem -Path actions `
    | Select-Object -ExpandProperty name `
    | Sort-Object -Descending `
    | fzf --prompt "Action: " --header "Select an action to run"
  if ([string]::IsNullOrWhiteSpace($action)) {
    break
  }

  # Run the selected action
  . ".\actions\$action"
  
  # Leave the action display on the screen for a moment
  # (the action loop clears it with fzf)
  pause
}

Caption: video of the action script in action act.ps1

It takes way too long for the jars folder to open in explorer.exe off-screen here

I present to you the (shortened) instructions I wrote to myself for the release process:

Manual: Bump `mod_version` in gradle.properties
Manual: Commit bump
Action: Propagate changes
Action: Run gameTestServer for all versions
Action: Build
Action: Wipe jars summary dir
Action: Collect jars
Action: Update PrismMC test instances to use latest build output
Action: Update test servers to latest build output
Action: Launch PrismMC
Action: Launch test server

for each version:
    Launch version from PrismMC
    Multiplayer -> join localhost
    Break previous setup
    Build new setup from scratch -- ensure core gameplay loop is always tested
    Validate changelog accuracy
    /stop
    Quit game

Action: Tag
Action: Push all
... upload jars

The Problem

I test mc1.20.3 for problems. No issues found.

Caption: SFM logs working, singleplayer gif

I test mc1.20.4 for problems. The logs are not shoing when playing on a server, but they work in singleplayer.

Caption: SFM logs not working, multiplayer gif

This game update included significant changes to packet handling.

What should happen is that the default text is cleared and some logs should be streamed in.

It works in single player. It does not work when playing on a server.

There is nothing abnormal in the server logs. The client logs, however, reveal the first piece of the puzzle:

Caption: client logs of a stacktrace incriminating my mod

[01:43:56] [Render thread/ERROR] [minecraft/BlockableEventLoop]: Error executing task on Client
java.util.concurrent.CompletionException: io.netty.util.IllegalReferenceCountException: refCnt: 0
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run$$$capture(CompletableFuture.java:1770) 
	...
	at ca.teamdman.sfm.common.logging.TranslatableLogger.decode(TranslatableLogger.java:56)
	at ca.teamdman.sfm.common.net.ClientboundManagerLogsPacket.handleInner(ClientboundManagerLogsPacket.java:69)

Caption: Jujutsu Kaisen screengrab: “We are the exception!”

IntelliJ helpfully recognizes the stack trace and creates links to jump to the offending code. This brings us to our handleInner method as a possible culprit.

Caption: the logs packet handle methods

public record ClientboundManagerLogsPacket(
        int windowId,
        FriendlyByteBuf logsBuf
) implements CustomPacketPayload {
	...
	// called by game code
	public static void handle(
            ClientboundManagerLogsPacket msg, PlayPayloadContext context
    ) {
        context.workHandler().submitAsync(msg::handleInner);
    }

	public void handleInner() {
		// we are on the client, so we can safely use getInstance() to get the current player
		LocalPlayer player = Minecraft.getInstance().player;
        if (player == null
            || !(player.containerMenu instanceof ManagerContainerMenu menu) // pattern match :D
            || menu.containerId != this.windowId()) {
            SFM.LOGGER.error("Invalid logs packet received, ignoring.");
            return;
        }
		var logs = TranslatableLogger.decode(this.logsBuf);
		menu.logs.addAll(logs);
	}

Caption: the method that decodes multiple log entries

public static ArrayDeque<TranslatableLogEvent> decode(FriendlyByteBuf buf) {
	int size = buf.readVarInt(); // this line throws the error
	ArrayDeque<TranslatableLogEvent> contents = new ArrayDeque<>(size);
	for (int i = 0; i < size; i++) {
		contents.add(TranslatableLogEvent.decode(buf));
	}
	return contents;
}

Caption: ManagerBlockEntity sending a log update packet to a player

MutableInstant hasSince = new MutableInstant();
if (!menu.logs.isEmpty()) {
	hasSince.initFrom(menu.logs.getLast().instant());
}
ArrayDeque<TranslatableLogEvent> logsToSend = logger.getLogsAfter(hasSince);
if (!logsToSend.isEmpty()) {
	// Add the latest entry to the server copy
	// since the server copy is only used for checking what the latest log timestamp is
	menu.logs.add(logsToSend.getLast());

	// Send the logs
	while (!logsToSend.isEmpty()) {
		int remaining = logsToSend.size();
		PacketDistributor.PLAYER.with(player).send(
			ClientboundManagerLogsPacket.drainToCreate(
				menu.containerId,
				logsToSend
			)
		);
		if (logsToSend.size() >= remaining) {
			throw new IllegalStateException("Failed to send logs, infinite loop detected");
		}
	}
}

It's dying when we try to read the number of logs to decode. It's not even an IndexOutOfBoundsException, it's something more sinister.

Caption: Goblin Slayer screengrab: “And there are goblins near there.”

This packet is a little odd, compared to most others. It directly stores a byte buffer object instead of a useful type like Collection<TranslatableLogEvent>.

This is a consequence of the way I batch logs together across multiple packets to avoid hitting max packet length problems.

To properly maximize packet size (to minimize the number of packets), we use an algorithm to convert log entries to individual byte buffers. We add those buffers to the current packet's buffer, and we start a new packet if it would have gone over the byte limit.

This means that the byte-encoding of this data happens earlier than usual; earlier than the packet constructor.

Caption: the packet's encoding and decoding methods

public record ClientboundManagerLogsPacket(
        int windowId,
        FriendlyByteBuf logsBuf
) implements CustomPacketPayload {
	...
	// called by game code
	@Override
    public void write(FriendlyByteBuf friendlyByteBuf) {
        encode(this, friendlyByteBuf);
    }
    public static void encode(
		ClientboundManagerLogsPacket msg,
		FriendlyByteBuf friendlyByteBuf
    ) {
        friendlyByteBuf.writeVarInt(msg.windowId());
        friendlyByteBuf.writeBytes(msg.logsBuf); // forward the bytes
    }
	
	// called by game code
	public static ClientboundManagerLogsPacket decode(FriendlyByteBuf friendlyByteBuf) {
        return new ClientboundManagerLogsPacket(
                friendlyByteBuf.readVarInt(),
                friendlyByteBuf
        );
    }

Did you notice?

In the decode method, we saved a reference to the buffer object we received as a parameter, instead of copying the information to a buffer we own.

We are hunting for some use-after-free-ish IllegalReferenceCountException: refCnt: 0 problem, and this object reuse (borrow) is sketchy as hell.

Caption: Frieren screengrab: “That's what my experience as a mage is telling me.”

Here lies a critical difference between 1.20.3 and 1.20.4: the buffer object is released after the decode call in the later version, before the handle method's async work is invoked.

Getting to this point was pretty straightforward (😭)

The fix should be to make our own buffer object instead of storing a reference to the one we passed in, right?

Caption: the decode method now creates a buffer object

public static void encode(
            ClientboundManagerLogsPacket msg, FriendlyByteBuf friendlyByteBuf
) {
	friendlyByteBuf.writeVarInt(msg.windowId());
	friendlyByteBuf.writeBytes(msg.logsBuf);
}

public static ClientboundManagerLogsPacket decode(FriendlyByteBuf friendlyByteBuf) {
	int windowId = friendlyByteBuf.readVarInt();
	FriendlyByteBuf logsBuf = new FriendlyByteBuf(Unpooled.buffer());
	friendlyByteBuf.readBytes(logsBuf);
	return new ClientboundManagerLogsPacket(
			windowId,
			logsBuf
	);
}

Not quite.

Caption: the client gets disconnected when logs are received gif

Perhaps pre-allocating the buffer will fix that?

Caption: giving the buffer a size

public static void encode(
		ClientboundManagerLogsPacket msg, FriendlyByteBuf friendlyByteBuf
) {
	friendlyByteBuf.writeVarInt(msg.windowId());
	friendlyByteBuf.writeBytes(msg.logsBuf);
}

public static ClientboundManagerLogsPacket decode(FriendlyByteBuf friendlyByteBuf) {
	int windowId = friendlyByteBuf.readVarInt();
	FriendlyByteBuf logsBuf = new FriendlyByteBuf(Unpooled.buffer(friendlyByteBuf.readableBytes()));
	friendlyByteBuf.readBytes(logsBuf, friendlyByteBuf.readableBytes());
	return new ClientboundManagerLogsPacket(
			windowId,
			logsBuf
	);
}

Kinda.

Caption: SFM logs still not working gif

There's an IndexOutOfBoundsException in the logs now.

There were a few more iterations before I arrived at the working version

Further investigation (breakpoints) reveals that the encode method is actually being hit twice for the same packet. This is attributable to the introduction of a game-native packet splitting mechanism in the 1.20.4 update.

Captions: different stack traces both calling encode

The encode method I wrote did not anticipate being called multiple times for the same packet.

Caption: javadoc that tells us we are draining the object

/**
 * Transfers the specified source buffer's data to this buffer starting at
 * the current {@code writerIndex} until the source buffer becomes
 * unreadable, and increases the {@code writerIndex} by the number of
 * the transferred bytes.  This method is basically same with
 * {@link #writeBytes(ByteBuf, int, int)}, except that this method
 * increases the {@code readerIndex} of the source buffer by the number of
 * the transferred bytes while {@link #writeBytes(ByteBuf, int, int)}
 * does not.
 * If {@code this.writableBytes} is less than {@code src.readableBytes},
 * {@link #ensureWritable(int)} will be called in an attempt to expand
 * capacity to accommodate.
 */
public abstract ByteBuf writeBytes(ByteBuf src);

The working solution involves calling a different method to avoid the modifying behaviour.

Caption: the encode method no longer drains the object

public static void encode(
		ClientboundManagerLogsPacket msg, FriendlyByteBuf friendlyByteBuf
) {
	friendlyByteBuf.writeVarInt(msg.windowId());
	friendlyByteBuf.writeVarInt(msg.logsBuf.readableBytes());
	friendlyByteBuf.writeBytes(msg.logsBuf, 0, msg.logsBuf.readableBytes()); // !!!IMPORTANT!!!
	// We use this write method specifically to NOT modify the reader index.
	// The encode method may be called multiple times, so we want to ensure it is idempotent.
}

public static ClientboundManagerLogsPacket decode(FriendlyByteBuf friendlyByteBuf) {
	int windowId = friendlyByteBuf.readVarInt();

	int size = friendlyByteBuf.readVarInt(); // don't trust readableBytes
	// https://discord.com/channels/313125603924639766/1154167065519861831/1192251649398419506

	FriendlyByteBuf logsBuf = new FriendlyByteBuf(Unpooled.buffer(size));
	friendlyByteBuf.readBytes(logsBuf, size);
	return new ClientboundManagerLogsPacket(
			windowId,
			logsBuf
	);
}

Caption: SFM logs working, multiplayer gif

The additional code that encodes the length of the byte buffer technically isn't necessary since we can use the readBytes method to just read the rest of the buffer, but it's better to be explicit about our assumptions.

Perhaps a future change will give us a buffer that is shared between packets, expecting us to only read as much as we wrote. It is good to have some warning in place if our assumptions are violated.

At least everything works now.

Closing Remarks

Attempting to reproduce the resolution process of the bug was tricky, even with git and IntelliJ local history at my disposal. There was a behaviour I could not recreate for a gif that I wasted a lot of time trying for. 😥

Documenting the problem solving process is hard.

My life would have been easier writing this article if I had git commit'd at some key moments. Oh well.

The bug still exists in the mod, but at least now I can tell users to send me their logs.