luajit

DSP, Plugin and Host development discussion.

Post

anybody here having any experience with luajit?

i'm playing around with lua (using luajit) in a vst plugin, and generally, things look really promising.. i'm a complete noob when it comes to lua, but i'm learning.. :-) my next step now, is looking into audio processing with lua..

in the various process() callbacks, we receive pointers to buffers with samples to fill with our own data, and/or read from.. so, how can i make these buffers available to the lua side, directly, using ffi, without copying (and converting) the buffers.. ??

i recently read this quote (don't remember where):

"You should definitely look at LuaJIT FFI, which allows you to cast a (light)userdata (a simple pointer opaque to Lua) to anything you defined using ffi.cdef. I use it to create a memory buffer (userdata) in C, call an external C++ module (some GPU processing using CUDA) to fill it with data, and return it back to Lua. Then in Lua I cast the userdata to a structure pointer and process it further in Lua. .. No copies are made using casting, and I would not have memory for such copies, as I am dealing with hundreds of megabytes of data."

a (confusing/meaningless) screenshot:

Image

Post

That screenshot demonstrates not needing to copy buffers. Read through the code and output carefully to understand what is happening.

Post

did a little more reading and experimentation..
this seems to work quite well:

c/c++:

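Code:

virtual void on_processBlock(float** AInputs, float** AOutputs, uint32 ASize) {
  // look up the script's callback and hand it the raw buffer pointers (no copying)
  lua_getglobal(MState,"on_processBlock");
  lua_pushlightuserdata(MState,AInputs);
  lua_pushlightuserdata(MState,AOutputs);
  lua_pushinteger(MState,ASize);
  // on failure lua_pcall leaves an error message on the stack; pop it so the stack doesn't grow
  if (lua_pcall(MState,3,0,0) != 0) lua_pop(MState,1);
}

lua: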

Code:

local ffi = require("ffi")

-- gain_left and gain_right are globals set elsewhere in the script
function on_processBlock(inputs,outputs,size)
  local in0 = ffi.cast("float**",inputs)[0]
  local in1 = ffi.cast("float**",inputs)[1]
  local out0 = ffi.cast("float**",outputs)[0]
  local out1 = ffi.cast("float**",outputs)[1]
  for i=0,size-1 do
    out0[i] = in0[i] * gain_left
    out1[i] = in1[i] * gain_right
  end
end

so i can actually start making vst plugins in lua now :-)
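
For comparison, this is roughly what the `ffi.cast("float**", ...)` lines amount to on the C side: the lightuserdata is just a `void*`, and casting it back gives the host's own channel pointers, so every write lands directly in the host buffers. A minimal sketch; `apply_gain` and its parameters are hypothetical names, not part of any plugin API.

```c
#include <stddef.h>

/* Hypothetical helper: takes the same opaque pointers the Lua callback
   receives as lightuserdata and casts them back to float**. Nothing is
   copied; in0/out0 etc. alias the caller's buffers. */
void apply_gain(void *inputs, void *outputs, size_t size,
                float gain_left, float gain_right)
{
    float **ins  = (float **)inputs;
    float **outs = (float **)outputs;
    const float *in0 = ins[0], *in1 = ins[1];
    float *out0 = outs[0], *out1 = outs[1];
    for (size_t i = 0; i < size; i++) {
        out0[i] = in0[i] * gain_left;
        out1[i] = in1[i] * gain_right;
    }
}
```

Calling it with two arrays of channel pointers modifies the output arrays in place, which is the same thing the Lua loop does through the ffi pointers.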

the (vst) plugin loads a lua script with the same filename (with the file extension replaced with .lua), and from the same directory as the plugin, .. it then compiles that, and the plugin starts calling lua functions when needed..
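
The filename rule above could be sketched like this in C. This is an assumption about how it might be done, not the actual wrapper code; `script_path_for` is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: given the plugin's own path, write the path of the
   matching script (same directory, same base name, ".lua" extension) into
   out. Returns 0 on success, -1 if out is too small. */
int script_path_for(const char *plugin_path, char *out, size_t out_size)
{
    /* find the last path separator, accepting both '/' and '\\' */
    const char *slash  = strrchr(plugin_path, '/');
    const char *bslash = strrchr(plugin_path, '\\');
    if (bslash && (!slash || bslash > slash)) slash = bslash;
    const char *name = slash ? slash + 1 : plugin_path;

    /* keep everything up to the extension (or the whole path if there is none) */
    const char *dot = strrchr(name, '.');
    size_t stem = (dot && dot != name) ? (size_t)(dot - plugin_path)
                                       : strlen(plugin_path);

    int n = snprintf(out, out_size, "%.*s.lua", (int)stem, plugin_path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

How the plugin finds its own path is platform-specific (for example `dladdr` on linux or `GetModuleFileName` on windows) and is left out here; the resulting path can then be handed to `luaL_loadfile`.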

the plugin itself is made with my own library/framework, and should be fully portable.. so, soon i'll compile both 32 and 64 bit versions, for both windows and linux..

Post

Nice!

That seems a lot like something i worked on, and i definitely wanted to use LuaJIT as well, however i never got around to it. How is the speed of Lua with slightly more complicated rendering examples - usable for dsp?

Post

Mayae wrote: How is the speed of Lua with slightly more complicated rendering examples - usable for dsp?
i'm not sure, since i actually don't know much at all about lua :-D.. i'm learning, though.. but some performance charts and blog posts look really promising:

a post from 2010, that shows results not far off from gcc.. it has probably become a bit better since then (the results are for an early beta).. link

"..On my laptop, the C implementation multiplies two 1000×1000 matrices in 2.0 seconds (BTW, 1.4 sec if I use float; 0.9 if SSE is used; 26.8 sec without matrix transpose), LuaJIT-jit in 2.3 seconds.." link

"..With no optimization or buffering, and loading every pair of samples individually into a table (which seems like it’d be inefficient and dumb), the LuaJIT version is faster than the C version!" link

but for me, the most interesting thing is the rapid-prototyping possibilities, and how easily you can throw together some specific utility plugin if you need it.. the same script would work everywhere, as long as the wrapper plugin is ported to that platform.. you can also call other shared libraries (dll/so) almost directly, and easily integrate lua code with your own c/c++ code.. the possibilities are endless :-)

Post

Okay, nice.
tor.helge.skei wrote: but for me, the most interesting thing is the rapid-prototyping possibilities, and how easily you can throw together some specific utility plugin if you need it.. the same script would work everywhere, as long as the wrapper plugin is ported to that platform.. you can also call other shared libraries (dll/so) almost directly, and easily integrate lua code with your own c/c++ code.. the possibilities are endless :-)
Yes, i agree - we share the same vision :) check out the project in my sig (audio programming environment), it's virtually the same (only its only frontend is currently in C; it can be extended to any language - like Lua)

Post

Mayae wrote: check out the project in my sig (audio programming environment), it's virtually the same (only its only frontend is currently in C; it can be extended to any language - like Lua)
nice!!
too bad i can't try it out (i'm on linux) :-/

you're using tcc? i tried that too, but it seems like tcc has some issues with 64bit linux shared libraries (the -fPIC part, i think).. a standalone binary (exe) worked great, though.. so i quickly changed focus, and looked at luajit instead.. i need to experiment a bit more with it, i think..

Post

i did some performance testing with the lua wrapper, with quite encouraging results!
first i made a lo-fi, mono, simplistic pitch shifter, with a main-loop like this:

Code:

-- buffer, in_pos, out_pos, len_, fade_, inc_, base and fade_base are
-- script-level state set up elsewhere in the script (not shown here)
function on_processBlock(inputs,outputs,size)
  local in0 = ffi.cast("float**",inputs)[0]
  local in1 = ffi.cast("float**",inputs)[1]
  local out0 = ffi.cast("float**",outputs)[0]
  local out1 = ffi.cast("float**",outputs)[1]
  for i=0,size-1 do
    -- write the mono mix into the ring buffer
    local in_ = (in0[i] + in1[i]) * 0.5
    buffer[ math.floor(in_pos) ] = in_
    in_pos = math.floor(in_pos+1) % math.floor(len_)
    -- crossfade from the previous segment into the current one
    local gain = math.min(out_pos/fade_,1)
    local out_ = buffer[ math.floor(base+out_pos)      % math.floor(len_) ] * gain
               + buffer[ math.floor(fade_base+out_pos) % math.floor(len_) ] * (1-gain)
    out_pos = out_pos + inc_
    -- end of segment: restart reading at the current write position
    if out_pos >= (len_-1-fade_) then
      fade_base = base + len_ - 1 - fade_
      out_pos = 0
      base = in_pos
    end
    out0[i] = out_
    out1[i] = out_
  end
end

i compiled the plugin in debug mode.. with lots of debugging stuff, print statements, etc.. then i inserted this plugin on a track with a small audio loop playing, and duplicated the plugin 10 times.. the cpu meter barely moved, so i started duplicating this track.. 10 times.. so, now i had 100 plugins running.. it took a second or so for the jit-compilation to settle down, but after that, the cpu meter hovered around 25%.. cool! then i decided to stress the system a bit more.. selected all ten tracks, and duplicated that.. 200 plugins.. but that froze my desktop :-/ don't know if it was the jit-ing that froze it, or if it became too much for the cpu.. i rebooted almost immediately..

anyway, 100 lua scripts running at 25% is not too bad!!
soon i'll try to optimize the lua code, and compile a release-build of the vst plugin, and do some testing again..
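
To put numbers like that 25% in context, it helps to have a native baseline to measure against. Below is a sketch of a C port of the loop above. The post does not show how the state is set up, so the buffer length, the initial values and all names (`PitchShifter`, `ps_init`, `ps_process`, `PS_LEN`) are assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

#define PS_LEN 4096          /* assumed delay-line length (len_) */

typedef struct {
    float  buffer[PS_LEN];
    int    in_pos;           /* write index */
    int    base;             /* start of the segment being read */
    int    fade_base;        /* start of the segment being faded out */
    int    fade;             /* crossfade length in samples (fade_), must be > 0 */
    double out_pos;          /* fractional read offset */
    double inc;              /* read increment (inc_), e.g. 0.5 reads at half speed */
} PitchShifter;

/* Assumed initial state: empty buffer, all positions at zero. */
void ps_init(PitchShifter *ps, double inc, int fade)
{
    memset(ps, 0, sizeof *ps);
    ps->inc = inc;
    ps->fade = fade;
}

void ps_process(PitchShifter *ps, float **inputs, float **outputs, size_t size)
{
    const float *in0 = inputs[0], *in1 = inputs[1];
    float *out0 = outputs[0], *out1 = outputs[1];
    for (size_t i = 0; i < size; i++) {
        /* write the mono mix into the ring buffer */
        ps->buffer[ps->in_pos] = (in0[i] + in1[i]) * 0.5f;
        ps->in_pos = (ps->in_pos + 1) % PS_LEN;
        /* crossfade from the previous segment into the current one */
        double gain = ps->out_pos / ps->fade;
        if (gain > 1.0) gain = 1.0;
        int pos = (int)ps->out_pos;
        double out = ps->buffer[(ps->base + pos) % PS_LEN] * gain
                   + ps->buffer[(ps->fade_base + pos) % PS_LEN] * (1.0 - gain);
        ps->out_pos += ps->inc;
        /* end of segment: restart reading at the current write position */
        if (ps->out_pos >= PS_LEN - 1 - ps->fade) {
            ps->fade_base = ps->base + PS_LEN - 1 - ps->fade;
            ps->out_pos = 0.0;
            ps->base = ps->in_pos;
        }
        out0[i] = out1[i] = (float)out;
    }
}
```

Running the same audio through this and through the lua version in release builds would give a like-for-like comparison; a debug build with print statements says more about the debugging overhead than about the jit.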

Post

Are they all running on the same core? as I'd imagine they'd all be sharing the same JIT runtime, right? and if so will not be able to fully utilise all cores.

Post

avasopht wrote:Are they all running on the same core? as I'd imagine they'd all be sharing the same JIT runtime, right? and if so will not be able to fully utilise all cores.
not sure.. didn't go very deep with the testing.. i used bitwig studio in linux.. 10 tracks.. don't know how bitwig handles multi-core stuff, if it spreads the tracks out among cores, or something.. but i have a quad core, so perhaps that 25%, and the crashing/freezing when i went over that, indicates that only one core is being used? hmm... the test plugin (64bit linux vst) is statically linked with libluajit.a (v2.0.3, which i compiled myself), and is around 500k (debug build, not stripped).. i'll probably experiment with dynamic linking a little later.. but there's so much else interesting to try first :-D

Post

tor.helge.skei wrote:
Mayae wrote: check out the project in my sig (audio programming environment), it's virtually the same (only its only frontend is currently in C; it can be extended to any language - like Lua)
nice!!
too bad i can't try it out (i'm on linux) :-/

you're using tcc? i tried that too, but it seems like tcc has some issues with 64bit linux shared libraries (the -fPIC part, i think).. a standalone binary (exe) worked great, though.. so i quickly changed focus, and looked at luajit instead.. i need to experiment a bit more with it, i think..
Oh, pity :( Technically, the source code should actually be (nearly) compatible with any unix flavour.. Only i guarded a lot of the unix-specific stuff behind mac #defines - i will fix that in the next release

Yes, there are two compilers included, one of which is tcc (the other is configurable through scripts to invoke system compilers like llvm/gcc etc.). But yeah, x64 tcc is quite funky (and has some weird calling conventions that require dirty hacks) - given your results it wouldn't surprise me if luajit outperformed tcc. When i get time, i'll look at luajit, got high expectations now :)

e: is it still true for lua that it doesn't have native integers?

Post

Isn't x64 calling convention limited to only cdecl?

Post

Mayae wrote: e: is it still true for lua that it doesn't have native integers?
Sort of.. there is no separate integer type in the language, but the native arrays treat small integers (that is integer values of the numeric type) specially, so in some sense integers are meaningful, they just share the same type (in the language) with other numbers.

IIRC LuaJIT will generate integer code (optimization is known as type narrowing) in various situations when it decides it's the right thing to do (induction variables and such should at least get narrowed). Also if I'm not mistaken, you can use the bitops to do integer arithmetic explicitly.
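
The reason sharing one numeric type works is that a double holds every integer up to 2^53 exactly, so integer-valued counters and indices lose nothing. A small C sketch of that, plus a rough picture of what a 32-bit bit operation does with number arguments; both function names are hypothetical and this is not LuaJIT's actual implementation.

```c
#include <stdint.h>

/* Returns 1 if n survives a round trip through double unchanged. */
int exact_as_double(int64_t n)
{
    volatile double d = (double)n;   /* volatile forces real double precision */
    return (int64_t)d == n;
}

/* Roughly what a 32-bit "band" does with number arguments: reduce both to
   32 bits, AND them, and hand the result back as a signed 32-bit number.
   Assumes integer-valued inputs that fit in an int64_t. */
double band32(double a, double b)
{
    uint32_t ua = (uint32_t)(int64_t)a;
    uint32_t ub = (uint32_t)(int64_t)b;
    return (double)(int32_t)(ua & ub);
}
```

With these, `exact_as_double(44100)` holds, while 2^53 + 1 is the first positive integer that does not survive the round trip.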

Post

camsr wrote:Isn't x64 calling convention limited to only cdecl?
x64 calling conventions (different operating systems use slightly different variants) are different from the 32-bit cdecl convention, but yeah, you should normally get the same thing whether you specify cdecl or something else.
