
Node.js Worker Threads – In Depth

1. Why Worker Threads Exist

  • Node.js is single-threaded for JavaScript execution.
  • I/O operations are non-blocking (async), but CPU-heavy operations (loops, cryptography, data processing) block the event loop.
  • Worker Threads allow you to run JavaScript in parallel threads, keeping the main thread responsive.
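To make the blocking problem concrete, here is a minimal sketch (not from the original text): a zero-delay timer is starved by a synchronous loop, which is exactly what happens to incoming HTTP requests during CPU-heavy work.

```javascript
// Minimal sketch: a synchronous CPU loop starves the event loop.
// The timer below is scheduled for 0 ms but cannot fire until the loop ends.
function blockingSum(limit) {
  let sum = 0;
  for (let i = 0; i < limit; i++) sum += i;
  return sum;
}

const start = Date.now();
setTimeout(() => {
  // Runs only AFTER blockingSum returns, despite the 0 ms delay
  console.log(`Timer delayed by ~${Date.now() - start} ms`);
}, 0);

blockingSum(1e8); // While this runs, no I/O callbacks or timers are serviced
```

On a typical machine the timer reports a delay roughly equal to the loop's duration; an HTTP server would be unable to accept new connections for that entire window.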

Use Cases in Industry:

  • Banking/Finance: Risk calculation, policy pricing, fraud detection
  • Image/video processing: Transcoding, filters, compression
  • Data analytics: Large dataset transformations
  • AI/ML preprocessing: Feature extraction before sending to ML models
  • Gaming: Physics or game engine calculations

Interview Tip: Expect questions like “How do you handle CPU-intensive operations in Node.js without blocking the event loop?” — Worker Threads + Worker Pools is the expected answer.


2. Raw Worker Threads Implementation

Node.js has a built-in worker_threads module.

worker.js (Worker Thread Code):

const { parentPort } = require('worker_threads');

function heavyComputation(limit) {
  let sum = 0;
  for (let i = 0; i < limit; i++) sum += i;
  return sum;
}

// Listen to messages from the main thread
parentPort.on('message', (limit) => {
  const result = heavyComputation(limit);
  parentPort.postMessage(result);
});

server.js (Main Thread):

const { Worker } = require('worker_threads');
const express = require('express');
const app = express();

app.get('/compute', (req, res) => {
  const worker = new Worker('./worker.js');

  worker.postMessage(1e8); // Send computation limit
  worker.on('message', (result) => {
    res.send(`Sum: ${result}`);
    worker.terminate(); // The worker's message listener keeps it alive otherwise
  });
  worker.on('error', (err) => {
    res.status(500).send(err.message);
    worker.terminate();
  });
});

app.listen(3000, () => console.log('Server running on port 3000'));

Notes:

  • Each worker is a separate thread with its own memory.
  • Main thread can continue handling other requests.

3. Worker Pool Concept

Creating a new Worker per request is expensive. Instead, maintain a pool of worker threads:

  • Tasks are queued.
  • Workers execute tasks as they become available.
  • Reduces thread creation overhead.

A. Using Raw Worker Pool

A simple raw pool implementation:

1. worker.js – The CPU-heavy task

// worker.js
const { parentPort } = require('worker_threads');

// Heavy computation function
function heavyComputation(limit) {
  let sum = 0;
  for (let i = 0; i < limit; i++) sum += i;
  return sum;
}

// Listen for messages from main thread
parentPort.on('message', (limit) => {
  const result = heavyComputation(limit);
  parentPort.postMessage(result); // Send result back
});

2. workerPool.js – Worker Pool Implementation

// workerPool.js
const { Worker } = require('worker_threads');

class WorkerPool {
  constructor(file, size = 4) {
    this.size = size;
    this.workers = [];
    this.freeWorkers = [];
    this.queue = [];

    // Initialize worker threads
    for (let i = 0; i < size; i++) {
      const worker = new Worker(file);
      worker.on('message', (result) => {
        worker._resolve(result);
        this.freeWorkers.push(worker);
        this._next();
      });
      worker.on('error', (err) => {
        // An uncaught error kills the worker thread, so reject the pending
        // task and drop the dead worker instead of pushing it back to the
        // free list (reusing a dead worker would make future tasks hang).
        // A production pool would also spawn a replacement worker here.
        if (worker._reject) worker._reject(err);
        this.workers = this.workers.filter((w) => w !== worker);
        this._next();
      });

      this.workers.push(worker);
      this.freeWorkers.push(worker);
    }
  }

  runTask(data) {
    return new Promise((resolve, reject) => {
      const worker = this.freeWorkers.pop();
      if (worker) {
        worker._resolve = resolve;
        worker._reject = reject;
        worker.postMessage(data);
      } else {
        // Queue task if all workers are busy
        this.queue.push({ data, resolve, reject });
      }
    });
  }

  _next() {
    if (this.queue.length === 0 || this.freeWorkers.length === 0) return;
    const { data, resolve, reject } = this.queue.shift();
    this.runTask(data).then(resolve).catch(reject);
  }

  // Optional: close all workers
  close() {
    for (const worker of this.workers) {
      worker.terminate();
    }
  }
}

module.exports = WorkerPool;

3. server.js – Express Server Using WorkerPool

// server.js
const express = require('express');
const WorkerPool = require('./workerPool');
const os = require('os');

const app = express();

// Create a worker pool with number of threads = CPU cores
const pool = new WorkerPool('./worker.js', os.cpus().length);

app.get('/compute', async (req, res) => {
  try {
    const limit = 1e8; // Example CPU-intensive task
    const result = await pool.runTask(limit);
    res.send(`Sum: ${result}`);
  } catch (err) {
    res.status(500).send(err.message);
  }
});

// Health check endpoint
app.get('/', (req, res) => res.send('Server is running!'));

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

Key Points:

  • Queue ensures tasks don’t fail if all workers are busy.
  • Free workers are reused.
  • Simple yet effective for interview discussion.

B. Using Piscina

Piscina is a modern worker pool library optimized for performance and TypeScript support.

worker.js:

module.exports = function heavyComputation(limit) {
  let sum = 0;
  for (let i = 0; i < limit; i++) sum += i;
  return sum;
};

server.js:

const Piscina = require('piscina');
const express = require('express');
const path = require('path');

// Piscina expects an absolute path to the worker script
const pool = new Piscina({ filename: path.resolve(__dirname, 'worker.js') });
const app = express();

app.get('/compute', async (req, res) => {
  try {
    const result = await pool.run(1e8); // run() supersedes the deprecated runTask()
    res.send(`Sum: ${result}`);
  } catch (err) {
    res.status(500).send(err.message);
  }
});

app.listen(3000);

Why Piscina?

  • Handles queueing, idle workers, and scaling automatically.
  • Supports transferable ArrayBuffers for high-performance data.
  • Recommended in modern Node.js for interviews or production.

C. Using Workerpool Library

workerpool is another library, simpler than Piscina.

worker.js:

const workerpool = require('workerpool');

function heavyComputation(limit) {
  let sum = 0;
  for (let i = 0; i < limit; i++) sum += i;
  return sum;
}

workerpool.worker({
  heavyComputation
});

server.js:

const workerpool = require('workerpool');
const express = require('express');

const pool = workerpool.pool(__dirname + '/worker.js'); // resolve relative to this file, not the cwd
const app = express();

app.get('/compute', async (req, res) => {
  const result = await pool.exec('heavyComputation', [1e8]);
  res.send(`Sum: ${result}`);
});

app.listen(3000);

Pros of workerpool:

  • Easy API.
  • Supports callbacks and promises.
  • Simple for small to medium projects.

4. Interview Focus Points

  • Explain why Node.js needs Worker Threads for CPU-bound tasks.

  • Difference between the libuv thread pool (used internally for fs, DNS, and some crypto APIs) and Worker Threads (explicitly created threads running your JavaScript).

  • Use of Worker Pools to avoid thread creation overhead.

  • Libraries like Piscina are modern standard, often preferred over hand-written pools.

  • Combination with clustering:

    • Each Node process (cluster) can have its own worker pool.
    • Maximizes CPU usage across cores.
  • Performance considerations:

    • Don’t spawn too many threads (memory overhead).
    • Keep workers stateless.
    • Monitor CPU/memory.

5. Summary Table

| Concept                    | Raw Worker Threads    | Worker Pool (Raw) | Piscina / Workerpool   |
| -------------------------- | --------------------- | ----------------- | ---------------------- |
| Ease of use                | Medium                | Medium            | High                   |
| Handles queueing           | Manual                | Manual            | Automatic              |
| Supports transferable data | Yes                   | Can be manual     | Yes                    |
| Recommended for            | Small tasks, learning | Medium tasks      | Large-scale production |
| Interview-friendliness     | Good                  | Excellent         | Excellent              |

TL;DR:

  • Worker Threads → Parallel JS for CPU-heavy tasks.
  • Raw Worker Pool → Manual reuse of threads.
  • Piscina / Workerpool → Modern, production-ready worker pools.
  • Always combine clustering + worker pool for CPU-intensive Node apps.
  • Understanding these is highly relevant for senior Node.js interviews.
