
Andrew Zonenberg

Is there a standardized data processing term for merging/coalescing variable sized data blocks generated in parallel?

Input: set of N vectors each containing 0...M elements

Output: Single vector containing all elements, 0...M*N in size
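A concrete sketch of the shape being asked about (sequential Python, purely illustrative; the data values are made up):

```python
# N input vectors, each holding 0..M elements (here N=4, M=3)
blocks = [[1, 2], [], [3, 4, 5], [6]]

# Desired output: a single vector containing all elements, 0..M*N in size
merged = [x for block in blocks for x in block]
print(merged)  # [1, 2, 3, 4, 5, 6]
```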

@azonenberg that depends on the order of merging: it could be a *concatenation*, or it could be an *interleaving*

@funkylab Consecutive.

The immediate operation is a GPU-accelerated level-crossing search, so each thread will report a list of all level crossings in a specific sub-region of the input.

But what I really want is a list of all level crossings in the entire buffer so I need a postprocessing step.
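The per-thread work described here can be sketched sequentially (illustrative Python; the names `find_crossings`, `threshold`, and the sample data are mine, not from the thread). Each "thread" scans its sub-region and records every index where the signal crosses the threshold:

```python
def find_crossings(samples, threshold, start, end):
    """Report indices in [start, end) where the signal crosses `threshold`.

    A crossing at index i means samples[i-1] and samples[i] lie on
    opposite sides of the threshold.
    """
    out = []
    for i in range(max(start, 1), end):
        if (samples[i - 1] < threshold) != (samples[i] < threshold):
            out.append(i)
    return out

signal = [0.0, 0.2, 0.8, 0.6, 0.1, 0.9, 0.4]
# Two "threads", each covering half of the buffer; each produces a
# variable-length list, which is exactly the merging problem above.
a = find_crossings(signal, 0.5, 0, 4)  # [2]
b = find_crossings(signal, 0.5, 4, 7)  # [4, 5, 6]
```

Note the look-back at `samples[i - 1]`: a crossing that straddles a block boundary is reported by the block that owns the second sample, so no crossing is counted twice.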

@azonenberg if "consecutive" means "all elements from first vector, then all from second vector and so on", that's a concatenation.

@funkylab Yep. I just want to be clear that it's variable length (it's not splatting the entire buffer next to the subsequent one, only the valid/occupied subset of values)

@azonenberg ah I see. It's a sparse-storage kind of thing.

@funkylab The operation in question is un-sparsing it.

Each GPU thread is allocated enough buffer to store a level crossing at every sample, but the real signal may have anywhere from 0 to N level crossings in each block.

I can't see any way around this: if I want to evaluate the level search in parallel, I can't know a priori how many crossings the threads before me in the signal will have found.

So I have to produce sparse intermediate output, but then I need to produce a non-sparse result.

@funkylab The actual reduction algorithm is pretty simple: make a pass over the results and see how many each thread found, then partition up the final output buffer so I know which range of outputs each range of inputs maps to.

Then I can do a parallel block copy.
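The two-pass reduction described above can be sketched sequentially (illustrative Python; on a GPU the prefix sum and the block copies would themselves run in parallel). This count / exclusive-prefix-sum / copy pattern is the same one usually seen in stream compaction:

```python
import itertools

# Sparse per-thread results: each thread's occupied subset of its buffer
thread_results = [[2], [], [4, 5, 6], [9]]

# Pass 1: see how many results each thread found
counts = [len(r) for r in thread_results]

# An exclusive prefix sum of the counts partitions the final output
# buffer: offsets[i] is where thread i's results start
offsets = [0] + list(itertools.accumulate(counts))[:-1]

# Pass 2: the (parallelizable) block copies into the dense output
output = [None] * sum(counts)
for results, off in zip(thread_results, offsets):
    output[off:off + len(results)] = results

print(offsets)  # [0, 1, 1, 4]
print(output)   # [2, 4, 5, 6, 9]
```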

The hard part is what to call this, since it's a reusable block I expect to use in other similar algorithms that generate variable-sized data (protocol packets etc.) in parallel.

@azonenberg @funkylab This reminds me of a database problem, but in the opposite direction. But maybe you can think in DB terms? When you have a bunch of sparse row data that you want to join, group, and represent in a 2D table (by date or something), that's called pivoting, or a pivot table; you might end up with a bunch of null cells depending on the data.

Maybe there is some language/ideas from graph theory that might apply too?

@azonenberg @funkylab Variable-length concatenation. Two ways to do it: pad, concat, filter; or prefix sum and copy.
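The first of the two approaches mentioned (pad, concat, filter) can be sketched like this (illustrative Python; a real GPU version would pick a sentinel value that can't occur in the data, or carry a separate validity mask, and run the filter as a parallel compaction pass):

```python
SENTINEL = None  # marks unused slots; a real kernel might use NaN or -1
M = 3            # fixed per-thread buffer size

# Each thread writes into its fixed-size buffer, padded with the sentinel
padded = [[2, SENTINEL, SENTINEL],
          [SENTINEL, SENTINEL, SENTINEL],
          [4, 5, 6]]

# Concatenate the fixed-size buffers, then filter out the padding
flat = [x for buf in padded for x in buf]
result = [x for x in flat if x is not SENTINEL]
print(result)  # [2, 4, 5, 6]
```

Compared with the prefix-sum-and-copy approach, this one trades extra memory traffic (the whole padded buffer is concatenated) for not needing a separate counting pass.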

@taral @funkylab Yeah I plan to do a sum pass followed by a copy pass. Just a question of what to call it.

"Variable length concatenation" is a mouthful but the most accurate description I've heard so far.

@azonenberg @funkylab I've seen "gather" used for a similar operation in ML contexts.

@azonenberg They were called scatter-gather operations in HPC circles back in the day (mid-'00s, IBM BlueGene, MD simulations).

@AMS I would consider this a subset of a gather operation. A gather operation in general is just "collect data from many distributed sources" while I'm specifically talking about concatenation.

Also cool, which generation of BG did you work with? I did a bit of stuff on RPI's BG/L but didn't use the new (BG/Q I think?) system as much. By the time it came around GPUs were on the scene, I did do a fair bit of MD simulation with GPULAMMPS circa 2011.

@azonenberg BlueGene/P. I did some summer work with folks doing drug docking for the first SARS, I was doing python and C on the interface and visualization, not the real heavy lifting code.