CCLM simulations fail on Mistral - floating point exception
Dear colleagues,
I have been trying to run a 1-day test simulation with CCLM cosmo4.8_clm19 on Mistral for the first time since Blizzard was retired.
I am running the CCLM with an almost standard configuration for the 0.0625° horizontal resolution that I have successfully employed in several experiments on Blizzard.
I have modified the batch script as suggested here http://redc.clm-community.eu/projects/cclmdkrz/wiki/Run-scripts.
After a series of minor problems, which I solved thanks to the error messages in the .out and .err files, I have come to a dead end.
Now, when I submit my job with sbatch, the simulation runs for a few seconds, produces the lffd1996100400c.nc file and exits without leaving any error message in the .out file. However, in the .err file I get multiple errors of this form:
56: [m10393:41443:0] Caught signal 8 (Floating point exception)
56: backtrace
56: 2 0x00000000000548cc mxm_handle_error() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u4-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-268-gcc-OFED-3.12-redhat6.4/mxm-master/src/mxm/util/debug/debug.c:641
56: 3 0x0000000000054a3c mxm_error_signal_handler() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u4-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-268-gcc-OFED-3.12-redhat6.4/mxm-master/src/mxm/util/debug/debug.c:616
56: 4 0x00000000000326a0 killpg() ??:0
56: 5 0x00000000002dabd5 pow.L() ??:0
56: 6 0x000000000001ed5d __libc_start_main() ??:0
srun: error: m10393: tasks 40,45-50,52-59: Floating point exception
srun: Terminating job step 2108157.0
00: slurmstepd: *** STEP 2108157.0 ON m10314 CANCELLED AT 2016-03-15T20:09:55 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
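When the .err file is long, it can help to list exactly which MPI tasks raised the exception. The following is only a minimal sketch, assuming the srun error lines look like the one quoted above; the function names `expand_task_list` and `failed_tasks` are made up for this illustration and are not part of Slurm or CCLM.

```python
import re

def expand_task_list(spec):
    """Expand a task list such as '40,45-50,52-59' into a list of integers."""
    tasks = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            tasks.extend(range(int(first), int(last) + 1))
        else:
            tasks.append(int(part))
    return tasks

# Assumed line shape, taken from the log quoted in this post:
# "srun: error: m10393: tasks 40,45-50,52-59: Floating point exception"
SRUN_ERROR = re.compile(r"srun: error: (\S+): tasks? ([\d,\-]+): (.+)")

def failed_tasks(err_text):
    """Return (node, task_ids, reason) tuples for every matching srun error line."""
    found = []
    for line in err_text.splitlines():
        match = SRUN_ERROR.search(line)
        if match:
            node, spec, reason = match.groups()
            found.append((node, expand_task_list(spec), reason.strip()))
    return found

if __name__ == "__main__":
    sample = "srun: error: m10393: tasks 40,45-50,52-59: Floating point exception"
    for node, tasks, reason in failed_tasks(sample):
        print(node, len(tasks), "tasks:", tasks, "-", reason)
```

This only summarises what srun reported; it does not say why those tasks failed.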
The model sources have been compiled correctly and have been used successfully by another member of our DKRZ account. It seems that there is a floating point exception whose cause I cannot figure out.
Has any of you ever encountered such problems, or do you have a clue what might be causing all this?
I have attached my batch script, my .err and .out files along with the YUSPECIF, the YUDEBUG and the YUCHKDAT.
Your help would be incredibly appreciated.
Best,
Edoardo Mazza
Dear Edoardo,
After a first quick look at your YUCHKDAT, I would say that something is wrong with your forcing data.
Look at your T_SO values in the deeper layers.
They become very small and even negative! The unit for T_SO is Kelvin!
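A quick plausibility test along these lines could be sketched as follows. The helper name `implausible_kelvin` and the bounds of 150 K and 350 K are assumptions chosen for illustration, and the sample numbers are invented, not taken from the attached YUCHKDAT.

```python
def implausible_kelvin(values, lower=150.0, upper=350.0):
    """Return (index, value) pairs that fall outside [lower, upper] Kelvin.

    The default bounds are illustrative assumptions, not CCLM limits.
    NaN values are reported as well, because every comparison with NaN is False.
    """
    return [(i, v) for i, v in enumerate(values) if not (lower <= v <= upper)]

if __name__ == "__main__":
    # Made-up per-level minima for illustration only.
    t_so_minima = [284.1, 283.7, 279.9, 12.4, -3.0]
    for level, value in implausible_kelvin(t_so_minima):
        print(f"level {level}: T_SO = {value} K looks unphysical")
```

Any negative value is certainly wrong for a temperature in Kelvin; the wider bounds just catch values that are positive but still far too small.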
Furthermore I saw in your YUSPECIF that you run the model in NWP mode, not in climate mode (lbdclim=.FALSE.).
Is this what you want to do?
Hans-Juergen
Dear Hans-Juergen,
Thank you very much for your support and sorry for the late reply but it took me a few days to go back to the roots of the problem.
I agree that there’s something wrong with those temperatures, so I went back to the previous downscaling step to see where these weird values came from.
I wanted to repeat the simulation driven by ERA-Interim data obtained from the DKRZ directory /pool/data/CCLM/reanalyses/ERAInterim. I adapted the run_int2lm script for the gcm2cclm case. Again, I wanted to test that it was working fine for 24 hours.
Unfortunately the situation does not seem to have changed at all. The “floating point exception” error is still causing the program to quit. So it seems that the problem goes beyond the T_SO values. I am really losing the focus on what the problem is right now. I have checked and double-checked but clearly there’s something wrong that I can’t find.
Please find attached the run_int2lm, the .out, YUCHKDAT, INPUT, OUTPUT and YUDEBUG files.
Best wishes,
Edoardo
Please try
Dear Edoardo,
did you realize that the ERA-Interim data (caf-files) in /pool/data/CCLM/reanalyses are NetCDF4-compressed?
There is a README in the ERAINT directory that says so.
Perhaps that is your problem.
Alternatively, you can try the uncompressed caf-files that are available from my workspace:
/work/bb0849/b364034/ERAINT/CCLM_Forcing_Data/
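To see which kind of file you actually have, you can look at the first bytes of a caf-file. This is only a rough sketch: the function name `netcdf_kind` is made up here, and the check assumes the HDF5 signature sits at the very start of the file, as is usual for netCDF-4 files. It tells you the container format, not whether compression is switched on.

```python
import sys

# netCDF-4 files are HDF5 files and normally start with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Classic netCDF files start with b"CDF" followed by a one-byte version number.
CLASSIC_VERSIONS = {1: "classic", 2: "64-bit offset", 5: "CDF-5 (64-bit data)"}

def netcdf_kind(path):
    """Guess the netCDF container format from the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "netCDF-4/HDF5"
    if len(head) >= 4 and head[:3] == b"CDF":
        return "netCDF-3 " + CLASSIC_VERSIONS.get(head[3], "unknown variant")
    return "not recognised as netCDF"

if __name__ == "__main__":
    # Usage: python this_script.py file1.nc file2.nc ...
    for name in sys.argv[1:]:
        print(name, "->", netcdf_kind(name))
```

If a file is reported as netCDF-4/HDF5, `ncdump -hs` on that file lists the `_DeflateLevel` attribute for variables that are compressed.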
Furthermore, the ERAINT data have T_SKIN, thus set “luse_t_skin=.TRUE.”
Of course, consider also Burkhardt’s suggestion “lprog_qi=.TRUE.”, since QI is also available.
Hans-Juergen
Dear all,
I am currently encountering a very similar problem. I get the following messages:
I already tried to decompress the ERA -Interim data and I also considered the hints you gave before, but it still doesn’t work.
I would be very grateful for help.
Thank you very much and best regards,
Eva
Dear all,
By changing the INT2LM I was able to solve the problem I posted before, but now a new error appears that is similar in kind.
Could anyone please help me with this problem? I would be very grateful for help.
Thank you very much and best regards,
Eva Nowatzki