__libm_error_support on the nested domain – in #9: CCLM

  @vladimirplatonov in #17c0fe1

Dear colleagues, I am facing a problem with a nested COSMO-CLM run (ver. 5.clm14), namely:
“/mnt/scratch/users/juliayarinich_1021/binaries/cosmoclm_v5_clm14new: symbol lookup error: /mnt/scratch/users/juliayarinich_1021/binaries/cosmoclm_v5_clm14new: undefined symbol: __libm_error_support”
The error occurred in a nested COSMO-CLM run over the Kara Sea region on a 0.03° grid. I suspected that the error could be caused by the linking of the libm libraries, so I checked all files associated with this libm library and recompiled the binary, but got the same result. At the same time, a similar run with the same binary, but for the base domain covering the western part of the Russian Arctic on a 0.108° grid, had no problems. It confuses me that the same binary responds differently to the library links depending on the type of experiment (nested or not). There is also a “Conflicting CPU frequencies detected” message in the nested-domain case (but not in the base-domain case). Therefore, I would like to ask for any hints or experience with such cases. Have you encountered such problems before? Thanks a lot for any suggestions!
I have attached the output slurm-. file, the INPUT file, and the YU* files.
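[Editorial note: a symbol lookup error at run time means that none of the shared libraries loaded by the dynamic linker exports __libm_error_support; that symbol is typically provided by the Intel math library (libimf) rather than the GNU libm. The snippet below is only a diagnostic sketch for checking which library on a given compute node actually exports the symbol; the library names libimf.so and libm.so.6 are assumptions and may differ on your system.]

```python
import ctypes

# Sketch: probe which math library (if any) on this node exports
# __libm_error_support. Library names are assumptions -- libimf.so is the
# Intel math library that usually provides this symbol, libm.so.6 is GNU libm.
for name in ("libimf.so", "libm.so.6"):
    try:
        lib = ctypes.CDLL(name)          # dlopen via the normal search path
    except OSError:
        print(f"{name}: not found in the loader search path")
        continue
    found = hasattr(lib, "__libm_error_support")   # dlsym probe
    print(f"{name}: __libm_error_support {'exported' if found else 'missing'}")
```

Running this inside both the base-domain and the nested-domain job scripts could show whether the two jobs end up with a different library environment, which would fit the observation that the same binary behaves differently in the two experiments.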

  @hans-jürgenpanitz in #49baf90

Dear Vladimir,

I am not sure whether I can really help, since I have never seen a message like “Conflicting CPU frequencies detected”.

However, one thing is obvious, I believe: the numerical time step (dt = 40 s) you have chosen for your simulation at 0.025 deg horizontal resolution is much too large. It should be on the order of 20 s, not larger, and perhaps even lower, depending on whether you get CFL violations or not.
My personal experience: for CPM runs at 0.0275 deg resolution (about 3 km), also using CCLM5-0-14, dt = 25 s was the “upper” limit; dt > 25 s led to a rather large number of CFL violations.

Hans-Juergen
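[Editorial note: a rough back-of-the-envelope estimate of why the time step must shrink at this resolution; the maximum wind speed of about 100 m/s is an assumed upper bound, not a value taken from the run.]

\[
\Delta x \approx 0.025^{\circ} \times 111\ \mathrm{km\,deg^{-1}} \approx 2.8\ \mathrm{km},
\qquad
\Delta t \lesssim \frac{\Delta x}{u_{\max}} \approx \frac{2800\ \mathrm{m}}{100\ \mathrm{m\,s^{-1}}} = 28\ \mathrm{s},
\]

which is consistent with an upper limit of about 20–25 s once some safety margin is included.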

  @vladimirplatonov in #4e5ff32

Dear Hans-Juergen, thanks a lot for your hint anyway! I have changed dt to 20 seconds according to your recommendation, but there is no change in the results; the model crashed with the same errors. However, the “Conflicting CPU frequencies detected” message is not the only cause of the crash, because I have run another job for the base domain that also got the “Conflicting CPU frequencies detected” message but did not crash. The “__libm_error_support” error appears only in the nested run.