Optimizer bug or threading bug?

DSP, Plug-in and Host development discussion.
stratum
KVRAF
1839 posts since 29 May, 2012

Post Thu May 17, 2018 2:05 pm

When a local variable value inside a loop is printed the code works correctly, if it is not printed, it does not. The value (or anything related) is not being accessed by any other thread, only its copy (a tbb::atomic<double> variable set by the same thread) is being read by another thread. What would you think? File a bug report to Microsoft? Only happens with /O2 optimization (/O1 and /Od, and gcc/linux versions are all OK) :?
~stratum~

mystran
KVRAF
4981 posts since 12 Feb, 2006 from Helsinki, Finland

Re: Optimizer bug or threading bug?

Post Thu May 17, 2018 6:43 pm

stratum wrote:When a local variable value inside a loop is printed the code works correctly, if it is not printed, it does not. The value (or anything related) is not being accessed by any other thread, only its copy (a tbb::atomic<double> variable set by the same thread) is being read by another thread. What would you think? File a bug report to Microsoft? Only happens with /O2 optimization (/O1 and /Od, and gcc/linux versions are all OK) :?
Would be interesting to see the actual code, as there as many subtle pitfalls when it comes to multi-threading and it's easy to blame the compiler, only to realise later than it's actually just a simple user mistake. If printing the value (or rather more likely just calling another non-inlined function) fixes the problem, then it's probably something to do with the compiler keeping the variable in registers and the function call happens to make it flush it into memory.

Try putting a _ReadWriteBarrier() call (ie. a full compiler memory fence) in place of the print-call and see if that fixes the problem. If it does, then your code probably expects stricter memory ordering guarantees that what you are currently getting out of your use of tbb::atomic; that's supposed to guarantee some memory ordering, but maybe not enough?
If you'd like Signaldust to return, please ask Katinka Tuisku to resign.

camsr
KVRAF
6857 posts since 17 Feb, 2005

Re: Optimizer bug or threading bug?

Post Thu May 17, 2018 6:57 pm

Go through this list of O2 flags and eliminate them with the -fno- prepend (ex. -fno-gcse).
https://gcc.gnu.org/onlinedocs/gcc/Opti ... tions.html
Image

Miles1981
KVRian
1355 posts since 26 Apr, 2004 from UK

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 6:06 am

Run address sanitizer on your code!

camsr
KVRAF
6857 posts since 17 Feb, 2005

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 7:55 am

What compiler are you using?
Image

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 8:27 am

Hi,

Thanks you your replies. I'm not able to reproduce the problem with a shorter code extracted from the actual one, so for the time being this looks like a threading or memory corruption problem. _ReadWriteBarrier has no effect.
The compiler is Visual studio 2017, version 15.7.1 and the problem appears only with optimization option /O2.
~stratum~

camsr
KVRAF
6857 posts since 17 Feb, 2005

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 8:55 am

This MSVC specific pragma may help you to locate the block of code causing the issue (by switching optimization form or disabling them)
https://msdn.microsoft.com/en-us/library/chh3fb0k.aspx
Image

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 9:17 am

Interestingly disabling optimizations only for the function that works incorrectly solves the problem as in

Code: Select all

#pragma optimize( "", off )  
void buggy_function()
{
}
#pragma optimize( "", on)
~stratum~

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 9:55 am

The same process had a deadlock bug in a related thread pool implementation that appeared only with /O2 optimizations and disabling optimizations with the pragmas above for the implementation of the whole thread pool class code solved the deadlock bug too.
All this looks a bit suspicious because the Visual studio team has been making very frequent updates recently but I'm not able to reproduce the bugs with shorter code fragments.
~stratum~

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 11:11 am

Here is something similar

https://developercommunity.visualstudio ... gives.html

My own code also had a variable whose calculation was optimized out (or perhaps written over due to a memory bug as I cannot prove this to be a compiler bug)
~stratum~

camsr
KVRAF
6857 posts since 17 Feb, 2005

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 1:08 pm

The way to avoid having the optimizer eliminate unused results is to just store the result, like in an allocated address. Don't expect stack variables to not be eliminated if they are unused.
Image

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 1:39 pm

camsr wrote:The way to avoid having the optimizer eliminate unused results is to just store the result, like in an allocated address. Don't expect stack variables to not be eliminated if they are unused.
This is exactly what I do, and what happens is much weirder than that.

p.s. I don't really believe that this code makes sure that there is no memory corruption around last_percent as the compiler might reorder these variables and assign some of them to registers, but still I think a stack corruption or something like that would probably destroy other variables also.

p.s.2 The code below is simplified version of the actual version it's not the same but this doesn't matter because the problem is not reproducible anywhere other than its actual execution context anyway.

p.s.3 None of the related values below are actually initialized to zero. Zero is a very strange result value for this code.

Code: Select all

// problem: some other thread reads m_Percentage and when optimizations are enabled m_Percentage value stays fixed at zero because last_percent value does not change
// and stays fixed at zero in spite of the fact that prog_info.percentComplete does change from 0 to 100 and there is no evidence of memory corruption around last_percent.
// note: the value of m_Percentage set by the constructor is 50.
#pragma optimize( "", off )  
void Worker::WaitTask(int taskIdent)
{
	TimerClass timer;
	ProgressInfo prog_info;
	prog_info.percentComplete = 70;// that's intentionally initialized to a nonzero value to expose a bug
	std::string electric_fence1("K");// we need this to understand whether there is memory corruption around last_percent 
	double last_percent = 40; // that's intentionally initialized to a nonzero value to expose a bug
	std::string electric_fence2("L");// we need this to understand whether there is memory corruption around last_percent 
	TaskStatus last_status(TASKSTATCODE_NOTSTARTED);
	try
	{
		do
		{
			m_LastError = m_SoapClient->GetTaskStatus(taskIdent, prog_info);
			if (FailCodes(m_LastError) != FAILCODE_SUCCESS)
			{
				m_LastStatus = STATUS_FAILED;
				break;
			}
			else
			{
				// BEGIN I think optimizer messes something here
				std::cerr << electric_fence1;
				last_status = prog_info.status;
				if (last_status == TASKSTATCODE_FINISHED ||
					last_status == TASKSTATCODE_RUNNING)
				{
					last_percent = prog_info.percentComplete;
					if (last_percent < 0)
						last_percent = 0;
					else if (last_percent > 100)
						last_percent = 100;
					m_Percentage = last_percent;
				}
				std::cerr << electric_fence2;
				// END I think optimizer messes something here
			}
			// if we don't print last_percent here vs2017 produces incorrect code with /O2 
			//std::cerr << "m_Percentage is now " << (double)m_Percentage << " vs last_percent = " <<last_percent << std::endl;
			//_ReadWriteBarrier();
			if (last_status == TASKSTATCODE_NOTSTARTED)
			{
				if (timer.Elapse() > s_StartDeadlineInseconds)
				{
					m_LastStatus = STATUS_FAILED;
					break;
				}
			}
			else if (last_status == TASKSTATCODE_FINISHED)
			{
				m_LastStatus = STATUS_COMPLETED;
				m_LastError = FAILCODE_SUCCESS;
				break;
			}
			else if (last_status == TASKSTATCODE_ERROR)
			{
				m_LastStatus = STATUS_FAILED;
				break;
			}
			else
			{
				// invalid status code
				m_LastStatus = STATUS_FAILED;
				break;
			}

			Sleep(1);

			if (m_QuitRequest)
			{
				m_LastStatus = STATUS_CANCEL;
				break;
			}
		} while (true);
	}
	catch (...)
	{
		m_LastStatus = STATUS_FAILED;
	}
}
#pragma optimize( "", on )
~stratum~

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 2:10 pm

Using a Nullable class template for "double last_percent" as "Nullable<double> last_percent" causes the result to be 100 instead of 0, which makes me think that these if statement comparisons are evaluated incorrectly:

Code: Select all

last_percent = prog_info.percentComplete;
if (last_percent.value() < 0)
                  last_percent = 0;
else if (last_percent.value() > 100)
                  last_percent = 100;
m_Percentage = last_percent.value();
Because the result is either 0 or 100, but never anything in between with /O2.
~stratum~

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 2:24 pm

And if we change the declaration to
Nullable<double> *last_percent = new Nullable<double>();
, causing the actual value of last_percent to be on heap, then the behavior is the same. (If there was a stack corruption around last_percent this would cause the process to crash but it does not. (I guess I know what you are thinking about this hacking style but unfortunately this kind of trial and error is the best I can do on an executable which lacks debug symbols - I don't know any address sanitizer or debugger that works on an optimized executable without debug symbols).)
~stratum~

stratum
KVRAF
1839 posts since 29 May, 2012

Re: Optimizer bug or threading bug?

Post Fri May 18, 2018 2:43 pm

Correction: The idea that Nullable<double> results 100 instead of 0 was wrong, as temporarily modified code incorrectly was:

Code: Select all

last_percent = prog_info.percentComplete;
if (last_percent.value() < 0)
                  last_percent = 100; // *** wrong ***
else if (last_percent.value() > 100)
                  last_percent = 100;
m_Percentage = last_percent.value();
Anyway. I'm not sure this has any significance as there is no stack corruption here and the value of last_percent is wrong unless it is printed or optimizations are disabled.
~stratum~

Return to “DSP and Plug-in Development”