
NEWinogradConvolutionLayer - accuracy issue with FP16 #1154

Open
alvoron opened this issue Jan 9, 2025 · 1 comment

alvoron commented Jan 9, 2025

NEWinogradConvolutionLayer returns incorrect output tensors, varying from run to run, that may contain nan and inf values when FP16 is used. The issue does not reproduce with FP32.

The issue reproduces with ACL v24.12 on Apple M2 and M3.

It was initially observed in oneDNN with ACL integration. If Winograd is disabled in oneDNN and the ACL GEMM convolution is selected instead, there is no accuracy loss.

How ACL was built

scons neon=1 opencl=0 openmp=0 cppthreads=1 arch=armv8.6-a Werror=false validation_tests=1 --jobs=8 os=macos build=native --silent fixed_format_kernels=1 asserts=1 debug=1

Reproducer

#include "arm_compute/core/Error.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"
#include <iostream>
#include <vector>

using namespace arm_compute;
using namespace arm_compute::test;

int main(int argc, char *argv[]) {
  // NHWC layout: ACL TensorShape dimension order is (C, W, H[, N])
  TensorInfo srcTensorInfo = TensorInfo(TensorShape(8, 4, 4), 1, DataType::F16, DataLayout::NHWC);     // C=8, 4x4 spatial
  TensorInfo weiTensorInfo = TensorInfo(TensorShape(8, 3, 3, 16), 1, DataType::F16, DataLayout::NHWC); // 16 filters of 8x3x3
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(16, 4, 4), 1, DataType::F16, DataLayout::NHWC);    // C=16, 4x4 spatial

  PadStrideInfo strideInfo = PadStrideInfo(1, 1, 1, 1, 1, 1, DimensionRoundingType::FLOOR);
  ActivationLayerInfo activationInfo = ActivationLayerInfo();

  auto status = NEWinogradConvolutionLayer::validate(&srcTensorInfo, &weiTensorInfo, nullptr, &dstTensorInfo, strideInfo, activationInfo, true);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }
  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor weiTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  weiTensor.allocator()->init(weiTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEWinogradConvolutionLayer wino;
  wino.configure(&srcTensor, &weiTensor, nullptr, &dstTensor, strideInfo, activationInfo, true);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  weiTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  AssetsLibrary library(".", std::random_device()());
  std::uniform_real_distribution<> srcDist{ 0.0f, 2000.0f };
  library.fill(Accessor(srcTensor), srcDist, 0);
  std::uniform_real_distribution<> weiDist{ -0.1f, 0.1f };
  library.fill(Accessor(weiTensor), weiDist, 0);
  std::cout << "SRC TENSOR" << std::endl;
  srcTensor.print(std::cout);
  std::cout << "WEI TENSOR" << std::endl;
  weiTensor.print(std::cout);

  wino.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  weiTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}

Output tensor examples:

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 

-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf 
-inf -inf  inf -inf  inf -inf -inf -inf  inf  inf -inf -inf  inf  inf  inf -inf

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 

nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan

alvoron commented Jan 15, 2025

If I use NEGEMM directly, the F16 NHWC case works fine:

#include "arm_compute/core/Error.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/utils/misc/MMappedFile.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"
#include <iostream>
#include <vector>

using namespace arm_compute;
using namespace arm_compute::test;

int main(int argc, char *argv[]) {
  GEMMInfo gemmInfo;
  TensorInfo srcTensorInfo = TensorInfo(TensorShape(8, 1, 1, 4), 1, DataType::F16, DataLayout::NHWC);
  TensorInfo weiTensorInfo = TensorInfo(TensorShape(2, 8, 4), 1, DataType::F16, DataLayout::NHWC);
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(2, 1, 1, 4), 1, DataType::F16, DataLayout::NHWC);

  auto status = NEGEMM::validate(&srcTensorInfo, &weiTensorInfo, nullptr, &dstTensorInfo, 1.0f, 0.0f, gemmInfo);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }
  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor weiTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  weiTensor.allocator()->init(weiTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEGEMM gemm;
  gemm.configure(&srcTensor, &weiTensor, nullptr, &dstTensor, 1.0f, 0.0f, gemmInfo);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  weiTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  AssetsLibrary library(".", std::random_device()());
  std::uniform_real_distribution<> srcDist{ 0.0f, 2000.0f };
  library.fill(Accessor(srcTensor), srcDist, 0);
  std::uniform_real_distribution<> weiDist{ -0.1f, 0.1f };
  library.fill(Accessor(weiTensor), weiDist, 0);
  std::cout << "SRC TENSOR" << std::endl;
  srcTensor.print(std::cout);
  std::cout << "WEI TENSOR" << std::endl;
  weiTensor.print(std::cout);

  gemm.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  weiTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}
