Example of file locking to avoid read/write conflicts in asynchronous code (JavaScript)

From PASTE Wiki

In a single-process application with asynchronous IO, file locking/queueing is necessary to prevent race conditions and corrupted data when reading and writing files.

This is because there is no built-in locking or atomicity in Node's filesystem functions, so the following sequence of events is possible:

1. Event A fires, triggering a file write. writeFile (A) is called and Node starts writing to disk.

2. Event B fires, triggering another write to the same file. writeFile is called again (B) before write A has completed.

3. The file ends up corrupted, with some data from write A and some from write B.
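The sequence above can be reproduced deterministically with a toy model. The sketch below does not touch the real filesystem: fakeWrite and demo are hypothetical names, and the "file" is just an array that each write appends to in several asynchronous steps. It only illustrates how two writes started back to back end up interleaved.

```javascript
// Hypothetical helper (not part of Node's fs): "writes" chunks one at a time,
// yielding to the event loop between chunks like a multi-step disk write.
async function fakeWrite(file, chunks) {
	for (const chunk of chunks) {
		file.push(chunk);
		await new Promise((resolve) => setImmediate(resolve));
	}
}

async function demo() {
	// In-memory stand-in for a file's contents
	const file = [];

	// Write B is started before write A has finished (steps 1 and 2)
	await Promise.all([
		fakeWrite(file, ["A1", "A2", "A3"]),
		fakeWrite(file, ["B1", "B2", "B3"]),
	]);

	// Step 3: the chunks from A and B are mixed together
	return file.join(" ");
}
```

Calling demo() resolves to "A1 B1 A2 B2 A3 B3". With real files the exact interleaving is not predictable, which is what makes the problem hard to spot.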

Corruption can also occur when a write occurs during a read:

1. Some code starts reading a file.

2. Another event fires and starts writing to the file while it's being read.

3. The code that called the read receives a mixture of the file's contents with and without the data written in step 2.

This means that both reads and writes must be queued. Simultaneous reads do not need to be queued, as reads do not affect the file's contents, and in fact in this example multiple read calls can be served with the same underlying promise. Whenever a write is performed, however, it must wait until any in-progress reads or writes have completed before starting, and reads must also wait until any in-progress writes have completed.

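Note that we do not need to implement locking in the multi-threaded or multi-process sense, because JavaScript uses a single-threaded event loop model for processing: a piece of synchronous code is never interrupted by another callback. This means that updates to the queue's state are atomic, avoiding the kind of fine-grained race conditions we would have to account for in a multi-threaded or multi-process environment. The queue only coordinates access within a single process; it offers no protection if another process writes to the same file.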
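As a rough illustration of this point, the sketch below contrasts a state update made in one synchronous step with one that is split across an await. All names (demo, tick, safeIncrement, unsafeIncrement) are made up for the example.

```javascript
// Hypothetical helper: resolves on a later turn of the event loop
function tick() {
	return new Promise((resolve) => setTimeout(resolve, 0));
}

async function demo() {
	const counters = { safe: 0, unsafe: 0 };

	async function safeIncrement() {
		// Synchronous read-modify-write: nothing else can run in between
		counters.safe += 1;
		await tick();
	}

	async function unsafeIncrement() {
		const seen = counters.unsafe;
		// Other tasks run while we are suspended here, and they all
		// read the same stale value
		await tick();
		counters.unsafe = seen + 1;
	}

	const tasks = [];
	for (let i = 0; i < 10; i++) {
		tasks.push(safeIncrement(), unsafeIncrement());
	}
	await Promise.all(tasks);
	return counters;
}
```

Here demo() resolves to { safe: 10, unsafe: 1 }. Updates are only at risk when they span an await, which is why the queue bookkeeping should be done in synchronous code.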

As a simple optimisation, as mentioned above, if there is already a read in the queue and the code calls read() again, it just gets the existing promise.

This queueing pattern is illustrated in the following example (untested):

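// Node's promise-based filesystem API
const fs = require("fs/promises");

// One queue per file path

let queues = {};

let api = {
	// Returns the queue for a path, creating it if it doesn't exist yet
	getQueue(path) {
		if (!queues[path]) {
			queues[path] = [];
		}
		return queues[path];
	},
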
	async read(path) {
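		/*
		Reads can share a task, as there's no point reading the same data twice.
		Only the last task in the queue is reused: sharing an earlier read would
		return data from before any writes that were queued after it.
		*/
		let queue = queues[path];
		let lastTask = queue?.[queue.length - 1];
		if (lastTask?.type === "read") {
			return lastTask.promise;
		}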

		let task = {
			type: "read",
			promise: promiseWithMethods(),
			inProgress: false,
		};

		api.getQueue(path).push(task);
		api.checkQueue(path);
		return task.promise;
	},

	async write(path, data) {
		let task = {
			type: "write",
			data,
			promise: promiseWithMethods(),
			inProgress: false,
		};
		api.getQueue(path).push(task);
		api.checkQueue(path);
		return task.promise;
	},

	async checkQueue(path) {
		let queue = queues[path];
		let task = queue[0];
		if (task.inProgress) {
			// If there is a task in progress, we'll recur once it's done
			// to process the next task
			return;
		}
		// We have a task waiting and it hasn't been started yet; start it
		task.inProgress = true;
		try {
			if (task.type === "read") {
				task.promise.resolve(await api._read(path));
			} else {
				task.promise.resolve(await api._write(path, task.data));
			}

			// The finally clause below is executed before any other promise
			// callbacks are called or awaits resumed
		} catch (e) {
			task.promise.reject(e);
		} finally {
			queue.shift();

			if (queue.length > 0) {
				api.checkQueue(path);
			} else {
				delete queues[path];
			}
		}
	},

	async _read(path) {
		return (await fs.readFile(path)).toString();
	},

	async _write(path, data) {
		return await fs.writeFile(path, data);
	},
};

/* util - Promise that can be resolved/rejected from outside */

function promiseWithMethods() {
	let resolve;
	let reject;

	let promise = new Promise(function (res, rej) {
		resolve = res;
		reject = rej;
	});

	promise.resolve = resolve;
	promise.reject = reject;
	return promise;
}
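On newer runtimes the same "resolve from outside" pattern is available as Promise.withResolvers(), which is part of ES2024. The sketch below is one possible variant, not a drop-in replacement: deferred is a hypothetical name, and unlike promiseWithMethods it returns an object holding the promise and its methods instead of attaching the methods to the promise itself.

```javascript
// Hypothetical alternative to promiseWithMethods: returns
// { promise, resolve, reject } rather than decorating the promise
function deferred() {
	// Use the built-in where the runtime provides it
	if (typeof Promise.withResolvers === "function") {
		return Promise.withResolvers();
	}

	// Fallback for older runtimes: capture the executor's arguments
	let resolve;
	let reject;
	const promise = new Promise(function (res, rej) {
		resolve = res;
		reject = rej;
	});
	return { promise, resolve, reject };
}
```

A task would then store the whole object and call task.deferred.resolve(value) instead of task.promise.resolve(value).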

One subtlety that has to be considered when writing the logic above is the order in which promise callbacks are called, or awaits are resumed, when promises are settled. Fortunately, in JavaScript this works intuitively: settling a promise never runs its callbacks synchronously, and callbacks are called in the order they were added. In checkQueue, the synchronous code that follows the call to resolve() or reject(), including the finally clause, therefore runs to completion before any then() callback or await on the task's promise is resumed. This means that from the perspective of code outside of checkQueue, there is never a state where a task is complete but hasn't been removed from the queue yet.
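This ordering can be checked with a small standalone sketch. The names demo, log and resolveTask are made up for the example, and the log.push after resolveTask() stands in for the queue cleanup done in checkQueue.

```javascript
async function demo() {
	const log = [];

	// A promise we can settle from outside, like a task's promise
	let resolveTask;
	const taskPromise = new Promise((resolve) => {
		resolveTask = resolve;
	});

	// Callbacks are registered in this order
	taskPromise.then(() => log.push("first then()"));
	taskPromise.then(() => log.push("second then()"));

	// Settling the promise only schedules the callbacks...
	resolveTask("done");
	// ...so this synchronous "cleanup" still runs before any of them
	log.push("cleanup after resolve()");

	// This await was added after both then() calls, so it resumes last
	await taskPromise;
	log.push("await resumed");

	return log;
}
```

demo() resolves to the entries "cleanup after resolve()", "first then()", "second then()" and "await resumed", in that order.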