KVR Audio

tony tony chopper · Post by **tony tony chopper** » Sun Apr 24, 2011 8:26 pm

see if you can write noise into, then memcpy 1920x1600x32 at 60fps using the cpu.
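For a sense of scale, here is a rough sketch of the arithmetic behind that challenge. It assumes 4 bytes per pixel and counts only the copy itself; filling the buffer with noise first would add at least one more write pass.

```python
# Rough bandwidth needed to copy one 1920x1600, 32-bit frame 60 times a second.
WIDTH, HEIGHT = 1920, 1600
BYTES_PER_PIXEL = 4   # 32 bits per pixel
FPS = 60

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = frame_bytes * FPS

print(f"one frame : {frame_bytes / 2**20:.1f} MiB")
print(f"per second: {bytes_per_second / 2**20:.0f} MiB/s")
```

That works out to roughly 11.7 MiB per frame and about 703 MiB/s of sustained copying.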

why 60FPS? That's Vista's compositor that's constantly refreshing that way, not XP.

why should different windows be used? random memory accesses. more windows = less efficiency. worst case, 1x1 pixel windows with their memory at random non-linear locations.

I'm pretty sure that there's A LOT more CPU wasted on the handling of windows than on the blitting. Even invalidating a lot of handled controls results in a very complex region full of little rectangles.

if you draw anything complex, precache it. period.

That's a waste of good RAM these days, because Vista & above do this for you. Made sense in XP, possibly makes sense for child windows (but still a waste IMHO), possibly makes sense if the compositor is disabled, but does not make sense otherwise. Move a window over this browser window. No, the browser window isn't getting any paint message, because Vista's DWM has simply moved some window coordinates and used its internal window content cache (as textures for a couple of polys). That's why the DWM eats quite a bit of memory.

Again, if you're in Vista or above, and haven't disabled the (nice) DWM, then NO, a top-level window isn't invalidated whenever you move another top-level window over it. That was in XP, time to move on.

Also: when you have something complex to refresh, check your invalidation rectangle(s), and optimize by only redrawing what has to be.

Really I don't know where you read that paint messages should be handled fast, it's quite the opposite. There's a reason Windows delays those special messages until there's time for them. It's quite handy in a sequencer, since it gives less importance to the GUI. Lower priority for the GUI thread along with delayed invalidation, that's good for performance (sure the GUI will leave trails around as it's not refreshed, but that's still better than a slow GUI forced to repaint too frequently).

we have gigs of memory sitting unused in a majority of machines.

we don't have unlimited RAM in a 32bit process

aciddose · Post by **aciddose** » Mon Apr 25, 2011 3:43 am

no you only have 4gb of that.

so, you're saying using up a third of a percent of your available memory is a huge problem? how much space does the bitmap for your knob strip take up? assuming you have one, 40x40, 100 frames (way too few), half a meg. you should probably save this horrible volume of memory by implementing a renderer and redrawing from scratch any graphics on the screen as needed. i'm sure that's the only bitmap you use also, is that correct? you don't have backgrounds, fonts, buttons, lighting effects, sliders, switches, various types of knobs, leds, different types of displays and so on. how silly of me to think you'd have any of those things with your console-based interfaces.

makes sense in xp? "handles it for you" ? so you're saying nobody uses xp anymore, right?

| 2011 | Win7 | Vista | Win2003 | WinXP | W2000 | Linux | Mac |
|---|---|---|---|---|---|---|---|
| March | 34.1% | 7.9% | 0.9% | 42.9% | 0.2% | 5.1% | 8.0% |

that's a wc3 report, but take a look at the claim this image makes:

http://arstechnica.com/microsoft/news/2 ... -share.ars

(in case you can't work this out on your own, or don't want to think about it: this shows a linear prediction of 24% per year. growth is likely sub-linear by far. growth is likely limited by both vista->win7 migrations and the age of machines running other versions of windows. it's unlikely large numbers of new xp machines will be upgraded with no obvious benefits and a $300+ price tag for the user.)

also, before things are side-tracked in excess - this is based upon your claim that "it's probably software anyway" - where, on vista? again, 10%. at most. even so, if they're using an os that has such a massive fault, they're at fault.

tony tony chopper · Post by **tony tony chopper** » Mon Apr 25, 2011 12:50 pm

so, you're saying using up a third of a percent of your available memory is a huge problem? how much space does the bitmap for your knob strip take up?

Actually I already made my point about that several times, huge knob strips (especially for big knobs, which grow exponentially in size as bigger knobs also need more frames) are a bad practice as well. Mine are mostly algorithmic, which doesn't waste memory and allows them to animate more smoothly/precisely.
I also use 8bit images when the degradation isn't visible. And 24bit when no alpha is needed.
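To put numbers on the bit-depth point, here is a small sketch of what an uncompressed knob strip costs. It uses the 40x40, 100-frame example from the quoted post and assumes no row padding or compression; `strip_bytes` is just an illustrative helper.

```python
def strip_bytes(width, height, frames, bits_per_pixel):
    """Uncompressed size of a knob filmstrip, ignoring row padding."""
    return width * height * frames * bits_per_pixel // 8

for bpp in (32, 24, 8):
    size = strip_bytes(40, 40, 100, bpp)
    print(f"40x40, 100 frames, {bpp:2d}-bit: {size / 1024:.0f} KiB")

# Doubling the knob's width and height and also doubling the frame count:
big = strip_bytes(80, 80, 200, 32)
print(f"80x80, 200 frames, 32-bit: {big / 1024:.0f} KiB")
```

Under these assumptions, doubling both the knob size and the number of frames multiplies the strip size by eight.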

Why do I care? Because I also make a sequencer and I understand that any plugin I work on is not the center of the world, but just 1 plugin in the process, sharing the memory with other plugins & audio data.

fonts

even XP caches them

makes sense in xp? "handles it for you" ? so you're saying nobody uses xp anymore, right?

I'm saying that XP is no longer the standard, no longer the platform to optimize for.
I'm pretty sure that chart would show the same for CPUs; a lot of people must still be using 10-year-old computers. Now imagine the same chart but only showing your potential market, those who will buy something (now if you do freeware that may make sense).
I know I still support XP & non-SSE2 (we ditched non-SSE1 last year as it seemed pretty safe these days), but that doesn't mean everything is optimized for XP & non-SSE2, it just "works on them".
I'm pretty sure that your chart among readers here would already be very different.
Actually: here's one for the US. Pretty different, no? How do you explain that Vista+7 are now the majority?

And remember, we're in the audio world, meaning that those 14% Mac OS is normally a lot bigger here. So your generic charts..

this is based upon your claim that "it's probably software anyway" - where, on vista? again, 10%. at most. even so, if they're using an os that has such a massive fault

no idea what you meant. What "is probably software"? What massive fault? 10% what?

aciddose · Post by **aciddose** » Mon Apr 25, 2011 12:56 pm

the original argument you made against what i'd originally said regarding blits.

... Maybe just the AlphaBlend function would have to be accelerated ...

the basic blit is almost definitely accelerated on all platforms, otherwise applications would have redraw problems. sure, static updates can be cached by the os and thank god they've finally done that in win7 - but applications still need to fill in that cache in some way and any application that wants to do more advanced compositing than what windows provides with its severely limited alphablit function needs to manage its own caches instead. the additional hardware bitmap cache is another layer that speeds things up even more.

if we were to "recycle" data in the cache from the hardware, latency would be so bad things would become worse than using system ram instead the whole time.

i'm not talking about a mere few simple alpha blits - while that's nice, if you do your own rendering and compositing like i do, you should know that redrawing over and over is insanity with anything reasonably advanced.

i use rotoblits and crossfades to get smooth rotation out of reduced frames - but that isn't practical in software on my 3.2ghz processor if you're moving more than a few knobs. sse is great, but still if you don't have to do the redraws in the first place, why bother?

think about this: in order to get smooth rotation of a 3d knob, you must use crossfades, not rotoblits. 2d is fine, but anything with perspective can't be rotated in 2d. in order to maintain sharpness you need to both use gamma correct interpolation and sharpness control while applying the blend.
can you render this in your code without caching it?
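To make "gamma correct interpolation" concrete, here is a minimal single-channel sketch of a crossfade done in linear light. It assumes a plain power-law gamma of 2.2 (the real sRGB curve is piecewise) and leaves out the sharpness control entirely; the function names are illustrative, not taken from either poster's code.

```python
GAMMA = 2.2  # assumed display gamma; the exact sRGB curve is piecewise

def to_linear(v):
    """8-bit encoded channel value -> linear light in 0..1."""
    return (v / 255.0) ** GAMMA

def to_encoded(x):
    """Linear light in 0..1 -> 8-bit encoded channel value."""
    return round(255.0 * x ** (1.0 / GAMMA))

def crossfade(a, b, t):
    """Blend two encoded channel values in linear light; t=0 gives a, t=1 gives b."""
    lin = (1.0 - t) * to_linear(a) + t * to_linear(b)
    return to_encoded(lin)

print(crossfade(0, 255, 0.5))   # noticeably brighter than the naive 128
```

A naive blend of the encoded values would give 128 at the midpoint; blending in linear light gives a brighter result, which is the difference the post is pointing at. Doing this per pixel, per channel, per frame is the cost being argued over.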

if you are, you have my respect because you're some kind of software rendering uber-god.

also, how could you accomplish additive compositing including alpha compositing into a window-manager handled transparent window as in the screenshot? windows doesn't provide functions for this. (then, port the code to mac osx and get pixel-perfect identical results...) remember that windows' alphablit requires pre-multiplied data, so the extra step to premul the update rects needs to be applied in between as well if you want hardware accelerated transparency, shadows and so on to include your own dynamically generated transparent and composite graphics.
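The "premul" step mentioned above can be sketched for one pixel as follows. This assumes straight-alpha 8-bit RGBA input and rounds to nearest; `premultiply` is an illustrative name, and a real implementation would run over the whole update rect.

```python
def premultiply(r, g, b, a):
    """Straight-alpha 8-bit RGBA -> premultiplied RGBA (rounded to nearest)."""
    def scale(c):
        return (c * a + 127) // 255
    return scale(r), scale(g), scale(b), a

print(premultiply(255, 128, 0, 128))
```

Fully opaque pixels pass through unchanged and fully transparent ones become black, so only the partially transparent pixels actually lose colour information.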

my library uses a pixel-shading stack to accomplish more complex operations without hand coding everything. i suppose you can pre-calculate every step one pixel at a time and then use a putpixel() to avoid some kind of caching, but the cpu hit would be ridiculous.

think "photoshop" style layers and blending effects.

tony tony chopper · Post by **tony tony chopper** » Mon Apr 25, 2011 1:24 pm

the basic blit is almost definitely accelerated on all platforms

GDI is not accelerated in Vista (said Microsoft), thus most likely not the basic GDI blit either. That's on paper and I haven't checked, it's not something you could check since it wouldn't be noticeable (even if you believe it would). We're really talking about simple copies here. I'm sure the real problem is more the access to the VRAM, and if you were accelerating the blit, it would still be the same problem (I mean, you still have to put it into VRAM for it to be accelerated, so what's the point? It's only useful if you're heavily manipulating the texture you just uploaded).

but applications still need to fill in that cache in some way and any application that wants to do more advanced compositing than what windows provides with its severely limited alphablit function needs to manage its own caches instead.

an app may indeed decide to cache lengthy stuff if it has light controls overlaying complex stuff (and I would restrict this to stuff like playback cursors over a song playlist maybe), but that still has nothing to do with having to process wm_paint messages as fast as possible.

if we were to "recycle" data in the cache from the hardware

you don't, you just reprocess it
(I agree that reading from VRAM is a bad practice)

you should know that redrawing over and over is insanity with anything reasonably advanced

my controls don't overlap and I don't see many cases where stuff can end up being refreshed "over & over", unless the content changes, and thus it obviously has to be reprocessed.

Why would your controls be refreshed over & over?

i use rotoblits and crossfades to get smooth rotation out of reduced frames - but that isn't practical in software on my 3.2ghz processor if you're moving more than a few knobs.

image rotation looks like shit anyway, it's good for games that are constantly animated, but your knob will stay frozen in a rotation and it's pretty ugly, blurry at best with the best antialiasing. I prefer the filled mask solution, less freedom but it's precise.

think about this: in order to get smooth rotation of a 3d knob, you must use crossfades, not rotoblits.

I use crossfades too (actually my knobs are 1 mask for the outer circle [which also gives full freedom on the color of the ring, nice], maybe 8 frames for the inner rotating part as I chose to make it repeat from a dent to the next, and then alphablending to blend in-between 2 frames. It's fast enough, but even if it was slow: I still don't see what could happen for them to be constantly refreshed, unless they're being moved, and thus have to be repainted anyway).
I can think of 2 reasons for your stuff to be refreshed while their content didn't change:
-the user moved the window out of the screen and back in, and it's not a top-level window but a child of another window. Does that happen so often & why?
-the user hid the window & showed it again. This happens often. However, caching hidden windows as well would be a pure waste of memory, because how many hidden windows can you get?

aciddose · Post by **aciddose** » Mon Apr 25, 2011 1:40 pm

i smooth movement out, there is about 200ms quick movement, and 800ms more "creeping" movement when you adjust a control. for example using the mouse wheel the knobs don't jump into position, they smoothly rotate, and including the blending, gamma correction and sharpness filter this looks absolutely amazing compared to most GUIs.

elements overlap - my LEDs have a glow which can overlap other controls. this glow is actually a lens-flare that occurs on the surface of your eye when you look at a real LED. i assume your monitor has an LED for the power indicator - it's best if it's on a black background. look directly at the LED, then cover the portion of the "glow" with your finger. you'll notice it appears _identical_ because the glow is actually on the surface of your eye.

using a few frames between knob movements is ok _if_ it repeats so often. unfortunately for the "moog" style knobs with seven-spline like in my screenshot, or "chicken head" style knobs, or others with complex pointers for example (mine are actually 3d, you can see the notch as it rotates, including soft-shadow, ambient occlusion, etc) then it isn't practical anymore.

for the center of the knob it might work, but the pointer goes all the way along the edge and also along the outside rim of the knob. also, due to the perspective transform the circular knob actually becomes oval instead which means you can't use two layers of images even if it would work ok with the pointer.

graphics are becoming more photo-realistic and demands are increasing. even i've started to prefer more advanced graphics than a simple circle with a dot in it to the point where if it isn't "as advanced" then it isn't "worth it" to me. it might as well just be a circle with a dot.

i use 25fps for the knob smoothing, not 60fps due to concerns about processing cost. i'd really love to implement the pixel shaders in opengl and get full hardware acceleration for my entire compositing process.

aciddose · Post by **aciddose** » Mon Apr 25, 2011 2:05 pm

i forgot to answer about why i say blits are accelerated. that's because the write to video memory doesn't need to occur immediately. the video driver can cache the data and set it up for dma transfer. the card can then cache it again, and apply the basic compositing as needed.

i know that windows xp doesn't cache normal windows - but it does cache transparent ones set to "WS_EX_COMPOSITED" and a couple other modes seem to trigger various effects - in some cases if you don't write correctly you get what appears to be video memory interference effects where it looks like portions of other windows were stored temporarily for some reason.

in vista - do we really care? this was a major microsoft f**k-up. i wouldn't argue with you that maybe it isn't accelerated there, i'll just take your word for it since it seems to be entirely possible from what i've seen or heard of vista. i didn't notice major issues when using vista for a few minutes, but i've only really spent much time with xp or win7 and never applied real tests to vista at all.

if you wait a long time in wm_paint, you'll be delaying the redraw of your window. i'm not sure the specifics of how it works, but by observing it, the appearance is that it uses the background brush to clear your window before it asks you to paint, and otherwise leaves corrupt data there. i assume that it must drop frames if your window is taking too long to respond to the paint message. so on xp, and according to you also on vista this is going to influence how your window is redrawn when another window is moved over-top or the display cache otherwise becomes invalidated.

in win7 this is finally cured by proper caching of window bitmap data and elimination of excessive wm_paint messages. that's awesome, and i can't wait until winxp, vista and such are entirely dead. they're not though, you know that, i know that, and we also both know they won't be below 10% for quite some time, perhaps several years.

so it's throwing out the baby with the bathwater to ignore these issues just because they're solved in win7. again, it's absolutely great if people have it, but not everyone does. having your window sit around for half a second with weird patterns of parts of the other window which dragged over top inside it because it's busy doing a redraw in wm_paint while taking up lots of cpu time isn't really very cool.

tony tony chopper · Post by **tony tony chopper** » Mon Apr 25, 2011 3:01 pm

i smooth movement out, there is about 200ms quick movement, and 800ms more "creeping" movement when you adjust a control. for example using the mouse wheel the knobs don't jump into position, they smoothly rotate, and including the blending, gamma correction and sharpness filter this looks absolutely amazing compared to most GUIs.

I never much liked this. Ohmforce plugins have always done that too, and yes it looks amazing, but it feels like you're not fully controlling the knob. Plus it has to match the internal parameter's own smoothing or it's a little dumb (if the param isn't smoothed for the same length, or at all).
But smoothing when a knob makes a quick jump for other reasons (preset switching) than direct control, yes that looks nice.

elements overlap - my LEDs have a glow which can overlap other controls

sure, glowing is a good reason for elements to overlap a lot. However, what kind of caching can you do here? Surely not a full cache or the glowing control would be in it, so it's the glowing control (or the one behind it) that should have its own smaller local cache, no? (& thus your repaint will still have to repaint a lot of little caches, not a big one)

using a few frames between knob movements is ok _if_ it repeats so often. unfortunately for the "moog" style knobs with seven-spline like in my screenshot, or "chicken head" style knobs, or others with complex pointers for example (mine are actually 3d, you can see the notch as it rotates, including soft-shadow, ambient occlusion, etc) then it isn't practical anymore.

sure, but no one forces you to use those effects. I chose to limit myself to stuff that looks good enough in the range of what's still efficient. It can still look good while avoiding complex shadowing.

graphics are becoming more photo-realistic and demands are increasing

I think people are more demanding but not for photo-realistic stuff. In fact, quite the opposite these days it seems. It might be fading out along with the demand for hardware emulation.

i know that windows xp doesn't cache normal windows - but it does cache transparent ones set to "WS_EX_COMPOSITED"

also "layered" windows (which is the best thing ever added to WinXP, they're pretty much sprites on the desktop)

in vista - do we really care? this was a major microsoft f**k-up.

It's much better than XP IMHO, and for the first time I didn't revert to the old silver GUI as I did in XP (because it was so ugly), it's perfect 3D desktop composition with a nice blur (that sadly you can't control). And you start to understand why it's called "aero" when you use it, having translucent titlebars (+rounded corners) makes you feel like there's more room, even though you can't read something that's blurred behind a glass, you still know it's there.
And when I say Vista I say Win7, which is most likely the same as Vista, only with another name since people never liked Vista for some reason. Only diff is that the GDI is partially accelerated again in Win7 (says Microsoft), but that's a detail (since the Win7 desktop compositor certainly doesn't use GDI, only apps may/will).

if you wait a long time in wm_paint, you'll be delaying the redraw of your window.

which is a -good- thing

i'm not sure the specifics of how it works, but by observing it, the appearance is that it uses the background brush to clear your window before it asks you to paint,

yes, in WM_ERASEBKGND (which I often implement to do nothing at all)

It will leave "ugly" trails in all Windows (even Vista since this has nothing to do with the DWM), but IMHO that's a good thing in the audio world, it's better than forcing the CPU to repaint stuff.

in win7 this is finally cured by proper caching of window bitmap data and elimination of excessive wm_paint messages.

wait, what's different in Vista?

so it's throwing out the baby with the bathwater to ignore these issues just because they're solved in win7.

in *vista* (& above)
I don't even believe that Win7 is anything else than rebranding.

Actually, the slight difference between Vista & Win7's DWM is that in Win7 there is *less* caching (& they bothered because people were complaining about the memory it was eating), thanks to the GDI that's partially accelerated again:
DWM works in different ways based on if it is the Windows 7 DWM or the Windows Vista DWM and if the graphics drivers it is using are WDDM 1.0 or 1.1. Under Windows 7 and with WDDM 1.1 drivers, DWM only writes the program's buffer to the video RAM, even if it is a GDI program, this is because Windows 7 supports (limited) hardware acceleration for GDI[2] and in doing so does not need to keep a copy of the buffer in system RAM so that the cpu can write to it.
(wikipedia)

This means: in Vista, you paint using GDI (to system RAM, so no acceleration, and believe it hardly makes a difference for basic blits), Windows caches this in system RAM, and then sends to VRAM, where it stays there & is reused when moving windows around & stuff. In Win7, the GDI will be able to fill the texture directly.
Caching in VRAM only is perfect since that's memory that was totally unused in XP.

aciddose · Post by **aciddose** » Mon Apr 25, 2011 7:37 pm

caching in vram is far more critical for space concerns than system ram though, so there must be some trade-offs depending on the particular case. i think your concern about spending a meg or two on cache isn't really as critical as you think it is though, once you consider that your audio buffers are also using similar amounts of memory in some cases.

obviously in a host at full-screen you can't cache the full-screen result - that wouldn't make sense. basic backgrounds need to be kept basic to minimize redraw cost because they might take up huge spans of the display.

for smaller elements though as you said you can keep local cache for things that actually need it.

in the case of a plugin window, you need to consider its size. there isn't much point to keep local cache if the difference between cached vs. non-cached rects is a matter of a few pixels. you might as well in that case cache the full window since you'd only save 10% or whatever it happens to be. so in some cases the whole window _is_ a smaller, local cache in the grand scheme of things.

in my gui layouts, this tends to be the case as i don't waste much space on large blank sections or logos and that sort of thing.

in my code, i've even implemented rect collation and the result of one bitblt call vs. ten or twenty when updating many sections during animation of a meter, for example, was significant. at least here on my 3.2ghz machine (and i assume most people have slower processors) it took a small amount of flicker and reduced it to zero flicker. i'm guessing some blits were happening on one frame, and others on the following frame. i only collate rects that overlap or touch since so far anything related has been at least touching on screen. if more distant elements were part of the same animation however i'd need to probably implement a system to ensure they're all updated at the same blit - otherwise for example leds could light up out of order in a meter, which would look really strange.
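A rough sketch of what rect collation along those lines might look like: rects that overlap or touch are merged into their bounding box, so a run of adjacent updates becomes a single blit. Rects here are `(left, top, right, bottom)` tuples with exclusive right/bottom edges; the names and the bounding-box strategy are assumptions, not the poster's actual code.

```python
def touches(a, b):
    """True if two rects overlap or share an edge or corner."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(a, b):
    """Bounding box of two rects."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def collate(rects):
    """Merge every group of overlapping or touching rects into one rect."""
    merged = []
    for rect in rects:
        # Keep absorbing existing rects until nothing touches the grown rect.
        absorbed = True
        while absorbed:
            absorbed = False
            for other in merged:
                if touches(rect, other):
                    merged.remove(other)
                    rect = union(rect, other)
                    absorbed = True
                    break
        merged.append(rect)
    return merged

# Three meter segments side by side plus one distant LED:
print(collate([(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10), (50, 50, 60, 60)]))
```

The trade-off is that a bounding box can cover pixels that did not change, so this only pays off when the merged rects are close together, as in the meter example.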

when i say "win7" i tend to discount vista because so few people seem to be using it. a large number of new machines in the united states may have been sold with vista for a couple years because another upgrade cycle seems to have occurred at just the wrong (or right, if you wanted to sell vista) time.

yes, if you're doing raw blits the difference between vista and win7 should be zero.

i don't agree when you say "that's good though" about delaying the window's redraw. if you're already executing your draw in response to the wm_paint message and windows is forced to drop your frame and move on because you're taking too long, how is that ultimately useful? it just means you're using way too much cpu time in the wm_paint message - which is the reason i said in the first place it's a good idea to minimize time spent there. (to avoid such cpu issues and dropped frames leading to flicker or garbage in your window)

tony tony chopper · Post by **tony tony chopper** » Mon Apr 25, 2011 9:00 pm

in the case of a plugin window, you need to consider its size. there isn't much point to keep local cache if the difference between cached vs. non-cached rects is a matter of a few pixels. you might as well in that case cache the full window since you'd only save 10% or whatever it happens to be.

But how will your full cache help? Take the case where you have a glowing control overlapping another (the only case where I'd agree it could be useful). You wanna repaint your glowing control, but here you can't take the other control in the back from a cache, since your full cache will be holding both your control & the glowing one, already overlapping. You still have to repaint both, fully.

mystran · Post by **mystran** » Tue Apr 26, 2011 10:23 am

Not doing application level composition into a temporary buffer (DIB etc) is kinda dumb, since the cost is practically none (extra BitBLT, scary!) and you can get rid of plenty of annoying tearing issues (the one that personally has annoyed me for ages is Edison's playback position leaving a "lag trail" when zoomed in scrolling playback especially in spectrum mode if the window is moderately large). If drawing is costly, I see little problem with keeping the DIB (to serve subsequent WM_PAINTs with a BitBLT) until window contents change.

Doing painting outside WM_PAINT is even more retarded though, because this time you can end up doing tons of useless work that you then end up discarding before you ever get a chance to display it (unless you do that outside WM_PAINT too, but it's even more retarded then, for reasons which I won't go into here because I don't feel like writing a novel).

So personally I think the solution that is "only slightly dumb" is to check for logical content changes in the beginning of WM_PAINT, do an internal repaint if necessary (into a DIB etc), and then service the paint request with a BitBLT. While you could drop the back-buffer afterwards (to save memory, as if that really was a problem considering how much is wasted on pretty bitmaps etc), I think it's sensible to keep 'em since it means if you get tons of small update regions (eg dragging window) you only do real drawing once, the rest being just BitBLTs.
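The scheme described here can be modelled without any Win32 calls. In this sketch the back-buffer is a list of pixel rows, `render` stands in for the expensive drawing, and the returned slice stands in for the BitBlt of the update rect. `CachedView`, `on_paint` and the version counter are hypothetical names, not an existing API.

```python
class CachedView:
    """Keeps a back-buffer and re-renders it only when the content changed."""

    def __init__(self, render):
        self.render = render          # expensive function: model -> rows of pixels
        self.backbuffer = None
        self.painted_version = None
        self.renders = 0              # counts real redraws, for illustration

    def on_paint(self, model_version, model, update_rect):
        # 1. Re-render only if the logical content changed since the last paint.
        if self.backbuffer is None or model_version != self.painted_version:
            self.backbuffer = self.render(model)
            self.painted_version = model_version
            self.renders += 1
        # 2. Serve the request by copying just the invalid area.
        left, top, right, bottom = update_rect
        return [row[left:right] for row in self.backbuffer[top:bottom]]

view = CachedView(lambda value: [[value] * 4 for _ in range(3)])
view.on_paint(1, 7, (0, 0, 2, 2))   # first paint: renders once
view.on_paint(1, 7, (1, 1, 4, 3))   # same content: served from the back-buffer
view.on_paint(2, 9, (0, 0, 1, 1))   # content changed: renders again
print(view.renders)
```

Many small paint requests for unchanged content cost one render plus cheap copies, which is the case made above for keeping the buffer around.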

The parts where it's still slightly dumb is drawing whole window (internally) when only ever getting small update regions for that logical "frame" (depending on painting done this is a "hard" problem to solve) and the part where the DWM actually does the back-buffering as well, but refuses to let you synchronize the updates properly (so you avoid some but not all tearing in GDI apps; in say Direct3D it works perfectly with updates done with Present() even with D3DPRESENT_INTERVAL_IMMEDIATE).

tony tony chopper · Post by **tony tony chopper** » Tue Apr 26, 2011 11:48 am

Not doing application level composition into a temporary buffer (DIB etc) is kinda dumb, since the cost is practically none (extra BitBLT, scary!) and you can get rid of plenty of annoying tearing issues (the one that

It's not about using an offscreen buffer, it's about -keeping- a cache running.
Of course we all use offscreen buffers when drawing things that can flicker.

personally has annoyed me for ages is Edison's playback position leaving a "lag trail" when zoomed in scrolling playback especially in spectrum mode if the window is moderately large).

The lag trail is not related. It's still repainted using an offscreen buffer, but (invalidated) twice. This is where I could have used a (non-temporary) cache; as I wrote above, playback indicators are a good use for this, but I chose not to. This also allows me to paint the playback indicator -behind- things (like clips in FL's playlist, which is why there too I don't use caching)

You normally get tearing because of the way the scrolling works. I scroll windows by pixels, the good old way, because it's much faster (less to repaint, Windows just moves the block & invalidates the side), but of course the playback cursor moves along with it, and is invalidated for later, while the scrolling happened immediately.

Doing painting outside WM_PAINT is even more retarded though, because this time you can end up doing tons of useless work that you then end up discarding before you ever get a chance to display it

indeed. Even caching would better be done straight inside wm_paint.

aciddose · Post by **aciddose** » Tue Apr 26, 2011 1:52 pm

you can't animate from wm_paint, it should be called wm_window_invalid_forced_update or something like that.

...because the blits are dma to hardware, and they're cached, it's most efficient to draw and blit from a separate thread so you don't end up blocking window messages while you're fooling around drawing something. that's assuming you're drawing a lot though and for smaller rects probably isn't a big deal.

just that the idea it's "bad" not to draw everything after getting a wm_paint message is pretty insane. i don't know where you come up with this idea.

you'd have to be saying that using gl, directx or any other interface is "bad" also.

regarding how you'd cache overlapped elements - you don't - this isn't the concern here. in vista (although it's not hardware) and win7 this isn't an issue, since "tearing" or corruption of the screen buffer won't happen. in xp though you'll get a lot of corruption and lag in your window if you draw in wm_paint. you'll cause all the other windows to slow down too because you'll be blocking their dispatch call.

back in the days of win3, the system wasn't multi-threaded. if you drew something in wm_paint you'd be locking up the entire system while it was attempting to do a redraw - almost always because someone was moving an icon or window around. although, there was no "display window content while dragging" option until win98 as far as i recall.

aciddose · Post by **aciddose** » Tue Apr 26, 2011 2:13 pm

your comments made me want to double check - and yes, you can not animate from wm_paint. you can call redrawwindow() - but this just directly dispatches a wm_paint message to yourself, which is the same as calling the blit from wherever you already are instead of calling redrawwindow or invalidaterect.

don't believe me? set a breakpoint, call the function, watch the message pass through the dispatcher in your own thread.

i know this stuff because i've written games and software rendering and so on where there is real time, fullscreen graphics. directdraw and direct3d blits were in the past much faster, but that is no longer the case. the only remaining advantage was vsync, but again this is now applied if you set your flags the right way by gdi.

if you go ahead and write a real-time application using gdi, for example a realtime software 3d renderer you'll find that the limiting factor is drawing (access to the pixel buffer) or blits, and until you get to have thousands of polygons it isn't anything to do with the engine internally. even shaders like basic textures or gouraud are faster than memory access to the main pixel buffer, or the blit to the screen.

my point anyway is that if you play with this, you'll quickly realize what is slow vs. fast and all the issues involved.

also; again regarding where you question "but you have to redraw overlapped elements anyway" - obviously you do, but you _can not_ redraw those using gdi because it doesn't have source-alpha-sum blit mode which is required to get the right effect.

for example, your background is a bitmap with alpha information which forms a transparent window. you need to blit text on top in 1/2 transparency. if you blit only color information (as with gdi) or including alpha information (as with gdi) the text is either too transparent, or too opaque. the proper operation is to use the source alpha value to calculate color transparency, then sum the source and destination alphas with saturation to get the alpha value. gdi doesn't provide that operation so it must be done in software.
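One possible reading of that operation, sketched for a single pixel: the source alpha weights the colour mix, and the output alpha is the saturating sum of both alphas. This assumes straight (non-premultiplied) 8-bit RGBA tuples; `blend_pixel` is an illustrative name and the poster's library may well differ in detail.

```python
def blend_pixel(src, dst):
    """src and dst are straight-alpha (r, g, b, a) tuples, 0..255 each."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst

    def mix(s, d):
        # Source alpha decides how much of the source colour shows through.
        return (s * sa + d * (255 - sa) + 127) // 255

    out_a = min(255, sa + da)   # saturating sum of the two alphas
    return mix(sr, dr), mix(sg, dg), mix(sb, db), out_a

# Half-transparent white text over a mostly transparent black background:
print(blend_pixel((255, 255, 255, 128), (0, 0, 0, 64)))
```

The result is then still straight alpha, so it would need a premultiply pass before being handed to a blit that expects premultiplied data.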

it's possible you could do multiple blits - but remember the alpha information for the summation would need to be masked away, and the color information would need to be pre-multiplied as well. so the number of additional operations would be so much that you might as well have mixed the buffer yourself, then presented a single blit to gdi.

remember my comment about if you take too long between blits of the same frame, the result will end up on the screen spread across multiple frames. that is one reason you must use as few blits as possible during animation. (not to mention all the overhead a blit adds for gdi)

tony tony chopper · Post by **tony tony chopper** » Tue Apr 26, 2011 2:35 pm

...because the blits are dma to hardware, and they're cached, it's most efficient to draw and blit from a separate thread so you don't end up blocking window messages while you're fooling around drawing something. that's assuming you're drawing a lot though and for smaller rects probably isn't a big deal.

painting from a separate thread? That sounds dumb, especially since the GUI thread is dedicated to that, and since you'll probably be sync-locking all the time. What's wrong in delaying Windows messages anyway? It's not like you were using it for critical audio stuff.

just that the idea it's "bad" not to draw everything after getting a wm_paint message is pretty insane. i don't know where you come up with this idea.

if you get a wm_paint with a tiny invalidation rectangle, why would you prefer to repaint everything? If your source is a composite of lots of items, you can easily skip the items not hitting the invalidation rectangle.
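Skipping items that miss the invalidation rectangle comes down to a rect intersection test. A minimal sketch, assuming each item carries a bounding rect as `(left, top, right, bottom)` with exclusive right/bottom edges; the item names are made up for the example.

```python
def intersects(a, b):
    """True if rects (left, top, right, bottom) share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def items_to_repaint(items, invalid_rect):
    """items is a list of (name, rect) pairs; keep those hitting the invalid rect."""
    return [name for name, rect in items if intersects(rect, invalid_rect)]

controls = [
    ("cutoff knob", (10, 10, 50, 50)),
    ("resonance knob", (60, 10, 100, 50)),
    ("level meter", (10, 60, 100, 70)),
]
print(items_to_repaint(controls, (40, 40, 65, 45)))
```

With the small invalid rect in the example, only the two knobs are repainted and the meter is skipped.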

your comments made me want to double check - and yes, you can not animate from wm_paint.

animate?

but this just directly dispatches a wm_paint message to yourself, which is the same as calling the blit from wherever you already are instead of calling redrawwindow or invalidaterect.

redrawing windows is pretty bad, better invalidate it (or part of it) and let Windows deal with the messaging, delaying repaint messages when the message buffer is too crowded.
I never ever grab a DC to paint in it directly.

don't believe me? set a breakpoint, call the function, watch the message pass through the dispatcher in your own thread.

the message.. is the whole point. As well as the region merging, when you invalidate rectangles that intersect.

i know this stuff because i've written games

me too, and you don't refresh a game like you refresh a GUI.. You don't even process a game (cycle-based) like you process a normal app (msg-based, even though the msg loop is kind of a cycle).
