test_3d_laser_acceleration_single_precision_comms test fails on A100 GPU #5528

Closed
liuyuchuncn opened this issue Dec 23, 2024 · 3 comments

liuyuchuncn commented Dec 23, 2024

The laser_acceleration test failed on an A100 GPU: the checksum comparison reports absolute and relative errors beyond the required precision.
With double precision, the required relative tolerance is about 1e-9.
LOG:
root:/opt/new/WarpX-development/build/Examples/Physics_applications/laser_acceleration# ctest -V
UpdateCTestConfiguration from :/opt/new/WarpX-development/build/Examples/Physics_applications/laser_acceleration/DartConfiguration.tcl
UpdateCTestConfiguration from :/opt/new/WarpX-development/build/Examples/Physics_applications/laser_acceleration/DartConfiguration.tcl
Test project /opt/new/WarpX-development/build/Examples/Physics_applications/laser_acceleration
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
Start 1: test_3d_laser_acceleration_single_precision_comms.analysis

1: Test command: /opt/new/WarpX-development/Examples/Physics_applications/laser_acceleration/analysis_default_openpmd_regression.py "diags/diag1/"
1: Working Directory: /opt/new/WarpX-development/build/bin/test_3d_laser_acceleration_single_precision_comms
1: Environment variables:
1: PYTHONPATH=:/opt/new/WarpX-development/build/lib/site-packages:/opt/new/WarpX-development/Regression/Checksum:/opt/new/WarpX-development/Regression/PostProcessingUtils:/opt/new/WarpX-development/Tools/Parser:/opt/new/WarpX-development/Tools/PostProcessing
1: Test timeout computed to be: 10000000
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,Bx]
1: Benchmark: [lev=0,Bx] 5.863879051452597e+06
1: Test file: [lev=0,Bx] 5.863879090030457e+06
1: Absolute error: 3.86e-02
1: Relative error: 6.58e-09
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,By]
1: Benchmark: [lev=0,By] 2.411512400469908e+03
1: Test file: [lev=0,By] 2.411493485037452e+03
1: Absolute error: 1.89e-02
1: Relative error: 7.84e-06
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,Bz]
1: Benchmark: [lev=0,Bz] 1.160254017872695e+05
1: Test file: [lev=0,Bz] 1.160254578211360e+05
1: Absolute error: 5.60e-02
1: Relative error: 4.83e-07
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,Ex]
1: Benchmark: [lev=0,Ex] 6.267725953308365e+12
1: Test file: [lev=0,Ex] 6.267738740870245e+12
1: Absolute error: 1.28e+07
1: Relative error: 2.04e-06
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,Ey]
1: Benchmark: [lev=0,Ey] 1.670763222566622e+15
1: Test file: [lev=0,Ey] 1.670763248552234e+15
1: Absolute error: 2.60e+07
1: Relative error: 1.56e-08
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,Ez]
1: Benchmark: [lev=0,Ez] 1.043459777626998e+14
1: Test file: [lev=0,Ez] 1.043459804997007e+14
1: Absolute error: 2.74e+06
1: Relative error: 2.62e-08
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,jx]
1: Benchmark: [lev=0,jx] 5.556881543180274e+14
1: Test file: [lev=0,jx] 5.556873977401774e+14
1: Absolute error: 7.57e+08
1: Relative error: 1.36e-06
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,jy]
1: Benchmark: [lev=0,jy] 1.595897074125253e+15
1: Test file: [lev=0,jy] 1.595894506396687e+15
1: Absolute error: 2.57e+09
1: Relative error: 1.61e-06
1: ERROR: Benchmark and output file checksum have different value for key [lev=0,jz]
1: Benchmark: [lev=0,jz] 1.045267363178543e+15
1: Test file: [lev=0,jz] 1.045265313586676e+15
1: Absolute error: 2.05e+09
1: Relative error: 1.96e-06
1: ERROR: Benchmark and output file checksum have different value for key [electrons,particle_position_y]
1: Benchmark: [electrons,particle_position_y] 7.150340902640349e-01
1: Test file: [electrons,particle_position_y] 7.150340884827551e-01
1: Absolute error: 1.78e-09
1: Relative error: 2.49e-09
1: ERROR: Benchmark and output file checksum have different value for key [electrons,particle_momentum_x]
1: Benchmark: [electrons,particle_momentum_x] 1.792123238561466e-20
1: Test file: [electrons,particle_momentum_x] 1.792123000377400e-20
1: Absolute error: 2.38e-27
1: Relative error: 1.33e-07
1: ERROR: Benchmark and output file checksum have different value for key [electrons,particle_momentum_y]
1: Benchmark: [electrons,particle_momentum_y] 7.225826072773540e-20
1: Test file: [electrons,particle_momentum_y] 7.225818816883145e-20
1: Absolute error: 7.26e-26
1: Relative error: 1.00e-06
1: ERROR: Benchmark and output file checksum have different value for key [electrons,particle_momentum_z]
1: Benchmark: [electrons,particle_momentum_z] 4.231730764749506e-20
1: Test file: [electrons,particle_momentum_z] 4.231723740822546e-20
1: Absolute error: 7.02e-26
1: Relative error: 1.66e-06
1:
1: New checksums file test_3d_laser_acceleration_single_precision_comms.json:
1: {
1: "lev=0": {
1: "Bx": 5863879.090030457,
1: "By": 2411.4934850374516,
1: "Bz": 116025.45782113599,
1: "Ex": 6267738740870.245,
1: "Ey": 1670763248552233.5,
1: "Ez": 104345980499700.66,
1: "jx": 555687397740177.4,
1: "jy": 1595894506396687.2,
1: "jz": 1045265313586676.1,
1: "rho": 2205684400.5803313
1: },
1: "electrons": {
1: "particle_initialenergy": 0.0,
1: "particle_regionofinterest": 1936.0,
1: "particle_position_x": 0.7139122621056099,
1: "particle_position_y": 0.7150340884827551,
1: "particle_position_z": 1.3175770598493888,
1: "particle_momentum_x": 1.7921230003773995e-20,
1: "particle_momentum_y": 7.225818816883145e-20,
1: "particle_momentum_z": 4.2317237408225457e-20,
1: "particle_weight": 12926557617.187498
1: }
1: }
1/1 Test #1: test_3d_laser_acceleration_single_precision_comms.analysis ...***Failed 1.25 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 1.25 sec

The following tests FAILED:
1 - test_3d_laser_acceleration_single_precision_comms.analysis (Failed)
Errors while running CTest

@liuyuchuncn
Author

WarpX:24.10 amrex:amrex-24.10 openpmd:openPMD-api-0.16.0

@liuyuchuncn
Author

WarpX_PARTICLE_PRECISION DOUBLE

@EZoni
Member

EZoni commented Jan 8, 2025

Hi @liuyuchuncn, thanks for opening the issue.

We use checksums to verify that code changes do not introduce regression errors in WarpX.

In the output above I see fairly small relative errors in the checksum values. I think this is nothing to worry about: changes of this magnitude can occur when running a test on different computer architectures.
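
To put the magnitudes in context, here is a minimal Python sketch that recomputes the relative errors for a few keys from the log above and compares them with single-precision machine epsilon. It is an illustration only, not WarpX's checksum code: the helper names `relative_error` and `failing_keys` are hypothetical, normalizing by the benchmark value is an assumption, and the two tolerances are example values (1e-9 echoes the reporter's note; 1e-5 is arbitrary).

```python
# Benchmark and test-file checksums copied from the ctest log above
# (a subset of keys): key -> (benchmark, test file).
CHECKSUMS = {
    "Bx": (5.863879051452597e+06, 5.863879090030457e+06),
    "By": (2.411512400469908e+03, 2.411493485037452e+03),
    "Ex": (6.267725953308365e+12, 6.267738740870245e+12),
    "particle_momentum_z": (4.231730764749506e-20, 4.231723740822546e-20),
}

# Machine epsilon of IEEE 754 single precision (about 1.19e-7).
FLOAT32_EPS = 2.0 ** -23


def relative_error(benchmark: float, value: float) -> float:
    """Relative difference, normalized by the benchmark (an assumption)."""
    return abs(value - benchmark) / abs(benchmark)


def failing_keys(checksums: dict, rtol: float) -> dict:
    """Return {key: relative error} for every key whose error exceeds rtol."""
    errors = {key: relative_error(b, v) for key, (b, v) in checksums.items()}
    return {key: err for key, err in errors.items() if err > rtol}


if __name__ == "__main__":
    for key, (bench, value) in CHECKSUMS.items():
        err = relative_error(bench, value)
        ratio = err / FLOAT32_EPS
        print(f"{key:>20s}  rel. error = {err:.2e}  ({ratio:.2f} x float32 eps)")
    # Example tolerances only; the project's actual tolerance is not stated here.
    print("fail at rtol=1e-9:", sorted(failing_keys(CHECKSUMS, 1e-9)))
    print("fail at rtol=1e-5:", sorted(failing_keys(CHECKSUMS, 1e-5)))
```

Under these assumptions, every listed key exceeds a 1e-9 tolerance but none exceeds 1e-5, which is consistent with differences accumulated from single-precision communication on a different architecture rather than with a physics regression.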

EZoni changed the title from "test_3d_laser_acceleration_single_precision_comms test fialed" to "test_3d_laser_acceleration_single_precision_comms test fails on A100 GPU" on Jan 8, 2025
EZoni closed this as completed on Jan 13, 2025