
Andrew Zonenberg

Is there a standardized data processing term for merging/coalescing variable sized data blocks generated in parallel?

Input: set of N vectors each containing 0...M elements

Output: Single vector containing all elements, 0...M*N in size
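A concrete sketch of the shape being asked about (sequential Python, purely illustrative; the data values are made up):

```python
# N input vectors, each holding 0..M elements (here N=4, M=3)
blocks = [[1, 2], [], [3, 4, 5], [6]]

# Desired output: a single vector containing all elements, 0..M*N in size
merged = [x for block in blocks for x in block]
print(merged)  # [1, 2, 3, 4, 5, 6]
```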

@azonenberg that depends on the order of merging: it could be a *concatenation*, or it could be an *interleaving*

@funkylab Consecutive.

The immediate operation is a GPU-accelerated level-crossing search, so each thread will report a list of all level crossings in a specific sub-region of the input.

But what I really want is a list of all level crossings in the entire buffer so I need a postprocessing step.
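The per-thread work described here can be sketched sequentially (illustrative Python; the names `find_crossings`, `threshold`, and the sample data are mine, not from the thread). Each "thread" scans its sub-region and records every index where the signal crosses the threshold:

```python
def find_crossings(samples, threshold, start, end):
    """Report indices in [start, end) where the signal crosses `threshold`.

    A crossing at index i means samples[i-1] and samples[i] lie on
    opposite sides of the threshold.
    """
    out = []
    for i in range(max(start, 1), end):
        if (samples[i - 1] < threshold) != (samples[i] < threshold):
            out.append(i)
    return out

signal = [0.0, 0.2, 0.8, 0.6, 0.1, 0.9, 0.4]
# Two "threads", each covering half of the buffer; each produces a
# variable-length list, which is exactly the merging problem above.
a = find_crossings(signal, 0.5, 0, 4)  # [2]
b = find_crossings(signal, 0.5, 4, 7)  # [4, 5, 6]
```

Note the look-back at `samples[i - 1]`: a crossing that straddles a block boundary is reported by the block that owns the second sample, so no crossing is counted twice.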

@azonenberg if "consecutive" means "all elements from first vector, then all from second vector and so on", that's a concatenation.

@funkylab Yep. I just want to be clear that it's variable length (it's not splatting the entire buffer next to the subsequent one, only the valid/occupied subset of values)

@azonenberg ah I see. It's a sparse-storage kind of thing.

@funkylab The operation in question is un-sparsing it.

Each GPU thread is allocated enough buffer to store a level crossing at every sample, but the real signal may have anywhere from 0 to N level crossings in each block.

I can't see any way around this: if I want to evaluate the level search in parallel, I can't know a priori how many crossings the threads before me in the signal will have found.

So I have to produce sparse intermediate output, but then I need to produce a non-sparse result.

@funkylab The actual reduction algorithm is pretty simple: make a pass over the results and see how many each thread found, then partition up the final output buffer so I know which range of outputs each range of inputs maps to.

Then I can do a parallel block copy.
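The two-pass reduction described above can be sketched sequentially (illustrative Python; on a GPU the prefix sum and the block copies would themselves run in parallel). This count / exclusive-prefix-sum / copy pattern is the same one usually seen in stream compaction:

```python
import itertools

# Sparse per-thread results: each thread's occupied subset of its buffer
thread_results = [[2], [], [4, 5, 6], [9]]

# Pass 1: see how many results each thread found
counts = [len(r) for r in thread_results]

# An exclusive prefix sum of the counts partitions the final output
# buffer: offsets[i] is where thread i's results start
offsets = [0] + list(itertools.accumulate(counts))[:-1]

# Pass 2: the (parallelizable) block copies into the dense output
output = [None] * sum(counts)
for results, off in zip(thread_results, offsets):
    output[off:off + len(results)] = results

print(offsets)  # [0, 1, 1, 4]
print(output)   # [2, 4, 5, 6, 9]
```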

The hard part is what to call this, since it's a reusable block I expect to use in other similar algorithms that generate variable-sized data (protocol packets etc.) in parallel.

@azonenberg @funkylab This reminds me of a database problem, but in the opposite direction. But maybe you can think in DB terms? When you have a bunch of sparse row data that you want to join, group, and represent in a 2D table (by date or something), that's called pivoting, or a pivot table; you might end up with a bunch of null cells depending on the data.

Maybe there is some language/ideas from graph theory that might apply too?

@azonenberg @funkylab Variable-length concatenation. Two ways to do it: pad, concat, filter; or prefix sum and copy.
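The first of the two approaches mentioned (pad, concat, filter) can be sketched like this (illustrative Python; a real GPU version would pick a sentinel value that can't occur in the data, or carry a separate validity mask, and run the filter as a parallel compaction pass):

```python
SENTINEL = None  # marks unused slots; a real kernel might use NaN or -1
M = 3            # fixed per-thread buffer size

# Each thread writes into its fixed-size buffer, padded with the sentinel
padded = [[2, SENTINEL, SENTINEL],
          [SENTINEL, SENTINEL, SENTINEL],
          [4, 5, 6]]

# Concatenate the fixed-size buffers, then filter out the padding
flat = [x for buf in padded for x in buf]
result = [x for x in flat if x is not SENTINEL]
print(result)  # [2, 4, 5, 6]
```

Compared with the prefix-sum-and-copy approach, this one trades extra memory traffic (the whole padded buffer is concatenated) for not needing a separate counting pass.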

@taral @funkylab Yeah I plan to do a sum pass followed by a copy pass. Just a question of what to call it.

"Variable length concatenation" is a mouthful but the most accurate description I've heard so far.

@azonenberg @funkylab I've seen "gather" used for a similar operation in ML contexts.

@azonenberg They were called scatter-gather operations in HPC circles back in the day (mid-'00s, IBM BlueGene, MD simulations).

@AMS I would consider this a subset of a gather operation. A gather operation in general is just "collect data from many distributed sources" while I'm specifically talking about concatenation.

Also cool, which generation of BG did you work with? I did a bit of stuff on RPI's BG/L but didn't use the new (BG/Q I think?) system as much. By the time it came around GPUs were on the scene, I did do a fair bit of MD simulation with GPULAMMPS circa 2011.

@azonenberg BlueGene/P. I did some summer work with folks doing drug docking for the first SARS, I was doing python and C on the interface and visualization, not the real heavy lifting code.