Strange problem in CCLM
Dear colleagues,
We have a problem with CCLM: the program terminates suddenly for unclear reasons.
The int2lm script works correctly and all output files are produced.
We tried installing both the starter package and the normal version of int2lm and CCLM, but the problem is the same.
The tests from the starter package work fine.
We also tried changing the parameters in the GRIBIN section, but it didn’t help.
I’ve attached all the files that we think could be useful for understanding the problem.
Could you please help us?
Kind regards,
Iya Belova
P.S. Due to the old version of the Fortran compiler on our cluster, we had to change the lines READ (nuin, inictl, IOSTAT=iz_err, IOMSG=iomsg_str) to READ (nuin, inictl, IOSTAT=iz_err) in some of the CCLM source files.
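If many files need this change, it can be scripted. The following is a minimal sketch, not part of CCLM: `strip_iomsg` is a hypothetical helper name, and it assumes GNU sed and that the `IOMSG=iomsg_str` specifier sits on the same line as the rest of the READ statement.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CCLM): remove the Fortran 2003
# "IOMSG=iomsg_str" specifier from READ statements so that an older
# compiler accepts them. Assumes GNU sed and that the specifier is on
# the same line as the rest of the statement, e.g.
#   READ (nuin, inictl, IOSTAT=iz_err, IOMSG=iomsg_str)
strip_iomsg() {
  local f
  for f in "$@"; do
    # -i.orig edits in place and keeps the original as "$f.orig";
    # the trailing I makes the match case-insensitive, like Fortran.
    sed -E -i.orig 's/,[[:space:]]*IOMSG[[:space:]]*=[[:space:]]*iomsg_str//I' "$f"
  done
}

# Example (src/ is a placeholder for your source directory):
#   strip_iomsg $(grep -ril 'iomsg_str' src/)
```

After the change, `iomsg_str` no longer receives the error message, so any later statement that prints it is worth checking by hand.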
Dear Iya,
Actually, I have no explanation for the error.
At first glance, it seems to be a system (MPI?) error.
But who knows.
Nevertheless, here are a few comments on your setup:
1. The time step dt: you are using dt=120 (s) together with a horizontal resolution of about 12 km.
In my opinion, dt=120 is much too high; there is a considerable risk of violating the CFL criterion.
I would use dt=75.
2. If I understand your setup correctly, you want to perform a 30-day simulation, starting at 2009120100 and ending at 2009123100.
That is a duration of 720 hours (30 days * 24 hours/day), which should be the value of the namelist parameter “hstop”.
But you are using “hstop=30*720” (see your cclm-setup).
This can be corrected in your setup file by defining
NHOURS=24
instead of
NHOURS=720
3. The triple of values for the namelist parameter “nhour_restart” should be
nhour_restart=120,$HSTOP,120
and not
nhour_restart=0,$HSTOP,120
where the values are given in hours.
However, CCLM corrects this mistake in the first value of the triple itself (see cclm.exe.out).
4. Can someone else comment on Iya’s choice of tuning parameters (see cclm.exe.out and YUSPECIF)? They seem rather “extreme”.
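The arithmetic behind items 1 to 3 can be checked with a short shell sketch. Assumptions: the setup computes HSTOP as number of days times NHOURS (inferred from the value hstop=30*720), 80 m/s is a hypothetical strong upper-level wind, and the plain advective Courant number u*dt/dx serves only as a rough indicator. The actual stability limit depends on CCLM’s time-integration and advection schemes, which this sketch does not model; it only shows that C grows linearly with dt.

```shell
#!/usr/bin/env bash
# Arithmetic behind items 1-3, with the values from this thread.

# Item 2: hstop is the run length in hours. Assumption: the setup
# computes it as (number of days) * NHOURS, which is how
# NHOURS=720 led to hstop=30*720=21600.
NDAYS=30     # 2009120100 -> 2009123100
NHOURS=24    # hours per day
HSTOP=$(( NDAYS * NHOURS ))

# Item 3: nhour_restart = first restart hour, last restart hour,
# interval; the first value equals the interval, not 0.
restart_triple() {  # $1 = hstop, $2 = restart interval in hours
  echo "$2,$1,$2"
}

# Item 1: advective Courant number C = u * dt / dx.
courant() {  # $1 = wind speed (m/s), $2 = dt (s), $3 = dx (m)
  awk -v u="$1" -v dt="$2" -v dx="$3" 'BEGIN { printf "%.2f\n", u * dt / dx }'
}

echo "hstop=$HSTOP"
echo "nhour_restart=$(restart_triple "$HSTOP" 120)"
for dt in 120 75; do
  # 80 m/s is an assumed strong upper-level wind; dx = 12 km.
  echo "dt=$dt  C=$(courant 80 "$dt" 12000)"
done
```

This prints hstop=720, nhour_restart=120,720,120, and Courant numbers of 0.80 for dt=120 and 0.50 for dt=75.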
Best regards
Hans-Juergen
Regarding Hans-Jürgen’s item 4:
Iya, can you use the tuning parameters from the starter package script and test your job?
Thank you for your answers.
This problem really does look like an MPI error. We had almost the same problem some months ago (you can find that discussion in the Starter Package Support forum thread).
We made all the changes you suggested, but it didn’t help. I’ve attached the new .out file just in case, but there seem to be no real changes in it.
P.S. In the starter package CCLM script we had the following tuning parameters:
wichfakt=0., tur_len=500., v0snow=20., tkhmin=0.35, tkmmin=1., rlam_heat=0.5249, mu_rain=0.5, entr_sc=0.0002, uc1=0.0626, fac_rootdp2=0.9000, soilhyd=1.6200
We also tried to start without any tuning parameters, but it gave the same result. Is there perhaps some less demanding mode we could try?
P.P.S. The .out file contains many lines “src_input: check completeness of input data”. Could this tell us something about the source of the problem?
Since you wrote that the starter package tests work, you could try moving step by step from the starter package settings to your requested settings.
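The many lines “src_input: check completeness of input data” in the .out file appear because every processor writes this message. This can be suppressed by changing the line
PRINT *, ' src_input: check completeness of input data'
to
IF (my_cart_id == 0) PRINT *, ' src_input: check completeness of input data'
Then only processor 0 writes the output.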
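To check this explanation against your own .out file, the lines can be counted and compared with the number of processes. A minimal sketch: `count_msg` and `per_process_msg` are hypothetical helper names, and it assumes the total number of processes is nprocx*nprocy (no separate I/O processes). A count that is a multiple of the process count, rather than exactly equal to it, is accepted in case the check runs more than once.

```shell
#!/usr/bin/env bash
# Count how often the message occurs in a CCLM .out file.
count_msg() {  # $1 = path to the .out file
  grep -c 'src_input: check completeness of input data' "$1"
}

# Succeed if the count is a non-zero multiple of nprocx*nprocy,
# i.e. consistent with "every process prints the message".
per_process_msg() {  # $1 = .out file, $2 = nprocx, $3 = nprocy
  local n nproc
  n=$(count_msg "$1")
  nproc=$(( $2 * $3 ))
  (( n > 0 && n % nproc == 0 ))
}
```

For example, `per_process_msg cclm.exe.out 4 4 && echo "one message per process"` would confirm that the repeated lines are harmless output rather than a sign of the error.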
A further suggestion to find out whether there really is an MPI problem on your system:
since the error already occurs at the very beginning of your simulation,
try running it with only one process: nprocx=1, nprocy=1.
If the error no longer occurs, then I would say it is an MPI/system problem.
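For this test, the number of processes given to the MPI launcher has to match the decomposition in the namelist. A sketch under stated assumptions: it uses an `mpirun -np N` launcher (your cluster may use a different one), assumes the total process count is nprocx*nprocy with no extra I/O processes, and takes `./cclm.exe` as the executable name, inferred from the `cclm.exe.out` file mentioned in this thread. `launch_cmd` is a hypothetical helper that only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the launch command for a given domain
# decomposition, so that the process count given to the launcher
# matches nprocx*nprocy in the namelist.
launch_cmd() {  # $1 = nprocx, $2 = nprocy
  echo "mpirun -np $(( $1 * $2 )) ./cclm.exe"
}

launch_cmd 1 1   # the single-process test: nprocx=1, nprocy=1
launch_cmd 4 4   # an example parallel decomposition for comparison
```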
Hans-Juergen
Dear Hans-Juergen,
As I mentioned before, we had a problem with the starter package: we were not able to run the program in uniprocessor mode. When we try, the script fails earlier, while reading the netCDF files (both int2lm and CCLM).
Dear colleagues,
Thank you for your advice.
The problem was that our cluster is too weak for the chosen LM grid.
We made a new grid with the following options, and now everything works:
startlat_tot = -7.6, startlon_tot = -7.6,
pollat = 34.3, pollon = -142.5,
dlon=0.152, dlat=0.152,
ie_tot=100, je_tot=100, ke_tot=40,
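For comparison with the original resolution of about 12 km, the new grid can be summarized with a little arithmetic. A sketch: `grid_info` is a hypothetical helper, and 111.2 km per degree is an approximation that holds near the rotated equator (the longitudinal spacing shrinks slightly away from it).

```shell
#!/usr/bin/env bash
# Summarize a rotated-pole grid from its namelist values.
grid_info() {  # $1=startlat_tot $2=startlon_tot $3=dlat $4=dlon $5=ie_tot $6=je_tot $7=ke_tot
  awk -v sla="$1" -v slo="$2" -v dla="$3" -v dlo="$4" \
      -v ie="$5" -v je="$6" -v ke="$7" 'BEGIN {
    # The last grid point lies (n-1) spacings after the first one.
    printf "lon: %.3f .. %.3f\n", slo, slo + (ie - 1) * dlo
    printf "lat: %.3f .. %.3f\n", sla, sla + (je - 1) * dla
    # Roughly 111.2 km per degree on the rotated grid.
    printf "dx ~ %.1f km\n", dlo * 111.2
    printf "3D points: %d\n", ie * je * ke
  }'
}

grid_info -7.6 -7.6 0.152 0.152 100 100 40
```

With these values the domain spans about 15° in each rotated direction, the grid spacing is roughly 17 km, and there are 400000 three-dimensional grid points.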
Kind regards,
Iya Belova