
NEWinogradConvolutionLayer - accuracy issue with FP16 #1154

Open
alvoron opened this issue Jan 9, 2025 · 1 comment

alvoron commented Jan 9, 2025

NEWinogradConvolutionLayer returns incorrect output tensors, varying from run to run, that may contain nan and inf values when FP16 is used. The issue does not reproduce with FP32.

The issue reproduces with ACL v24.12 on Apple M2 and M3.

It was initially observed in oneDNN with ACL integration. If Winograd is disabled in oneDNN and the ACL GEMM convolution is selected instead, there is no accuracy loss.

How ACL was built

scons neon=1 opencl=0 openmp=0 cppthreads=1 arch=armv8.6-a Werror=false validation_tests=1 --jobs=8 os=macos build=native --silent fixed_format_kernels=1 asserts=1 debug=1

Reproducer

#include "arm_compute/core/Error.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"
#include <iostream>
#include <vector>

using namespace arm_compute;
using namespace arm_compute::test;

int main(int argc, char *argv[]) {
  // NHWC layout: ACL TensorShape dimension order is (C, W, H[, N])
  TensorInfo srcTensorInfo = TensorInfo(TensorShape(8, 4, 4), 1, DataType::F16, DataLayout::NHWC);     // C=8, 4x4 spatial
  TensorInfo weiTensorInfo = TensorInfo(TensorShape(8, 3, 3, 16), 1, DataType::F16, DataLayout::NHWC); // 16 filters of 8x3x3
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(16, 4, 4), 1, DataType::F16, DataLayout::NHWC);    // C=16, 4x4 spatial

  PadStrideInfo strideInfo = PadStrideInfo(1, 1, 1, 1, 1, 1, DimensionRoundingType::FLOOR);
  ActivationLayerInfo activationInfo = ActivationLayerInfo();

  auto status = NEWinogradConvolutionLayer::validate(&srcTensorInfo, &weiTensorInfo, nullptr, &dstTensorInfo, strideInfo, activationInfo, true);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }
  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor weiTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  weiTensor.allocator()->init(weiTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEWinogradConvolutionLayer wino;
  wino.configure(&srcTensor, &weiTensor, nullptr, &dstTensor, strideInfo, activationInfo, true);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  weiTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  AssetsLibrary library(".", std::random_device()());
  std::uniform_real_distribution<> srcDist{ 0.0f, 2000.0f };
  library.fill(Accessor(srcTensor), srcDist, 0);
  std::uniform_real_distribution<> weiDist{ -0.1f, 0.1f };
  library.fill(Accessor(weiTensor), weiDist, 0);
  std::cout << "SRC TENSOR" << std::endl;
  srcTensor.print(std::cout);
  std::cout << "WEI TENSOR" << std::endl;
  weiTensor.print(std::cout);

  wino.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  weiTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}

Output tensor examples:

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan

alvoron commented Jan 15, 2025

If I use NEGEMM directly, the F16 NHWC case works fine:

#include "arm_compute/core/Error.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/utils/misc/MMappedFile.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"
#include <iostream>
#include <vector>

using namespace arm_compute;
using namespace arm_compute::test;

int main(int argc, char *argv[]) {
  GEMMInfo gemmInfo;
  TensorInfo srcTensorInfo = TensorInfo(TensorShape(8, 1, 1, 4), 1, DataType::F16, DataLayout::NHWC);
  TensorInfo weiTensorInfo = TensorInfo(TensorShape(2, 8, 4), 1, DataType::F16, DataLayout::NHWC);
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(2, 1, 1, 4), 1, DataType::F16, DataLayout::NHWC);

  auto status = NEGEMM::validate(&srcTensorInfo, &weiTensorInfo, nullptr, &dstTensorInfo, 1.0f, 0.0f, gemmInfo);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }
  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor weiTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  weiTensor.allocator()->init(weiTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEGEMM gemm;
  gemm.configure(&srcTensor, &weiTensor, nullptr, &dstTensor, 1.0f, 0.0f, gemmInfo);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  weiTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  AssetsLibrary library(".", std::random_device()());
  std::uniform_real_distribution<> srcDist{ 0.0f, 2000.0f };
  library.fill(Accessor(srcTensor), srcDist, 0);
  std::uniform_real_distribution<> weiDist{ -0.1f, 0.1f };
  library.fill(Accessor(weiTensor), weiDist, 0);
  std::cout << "SRC TENSOR" << std::endl;
  srcTensor.print(std::cout);
  std::cout << "WEI TENSOR" << std::endl;
  weiTensor.print(std::cout);

  gemm.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  weiTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}
