Emre Salkım in #9: CCLM on April 17 (#aab3067)

Running CCLM on multiple nodes: an ORTE daemon unexpectedly quits

I am trying to run CCLM on multiple nodes (to use more processors); however, I receive this message in the CCLM logs:

An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).

In cclm.job.sh, around line 515, my script looks like this (should I add syntax or options after the mpirun command?):

echo ----- start CCLM
mpirun ${CCLM_EXE}
echo ----- CCLM finished

Also, no matter whether I update CCLM's number of processors in the subchain or change npx and npy in job_settings accordingly, I still get the ORTE message.
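
To make the question concrete, here is a minimal sketch of how the decomposition and the mpirun call could be tied together. It is not taken from cclm.job.sh. The values of NPX and NPY, the executable path, and the host file name hosts.txt are hypothetical placeholders. It also assumes that the total process count is simply npx times npy (no extra I/O processes) and that the MPI in use is Open MPI, whose mpirun accepts -np and --hostfile. Under a batch scheduler the host list may instead come from the scheduler itself.

```shell
#!/bin/sh
# Hypothetical example values; in a real run they come from job_settings.
NPX=4                # processes in x direction (assumed value)
NPY=2                # processes in y direction (assumed value)
CCLM_EXE=./cclm      # placeholder path to the CCLM executable
HOSTFILE=hosts.txt   # hypothetical file listing the allocated nodes

# Assumption: total MPI processes = npx * npy (no extra I/O processes).
NP=$((NPX * NPY))

# Build the command as a string so it can be inspected before it is run.
MPI_CMD="mpirun -np ${NP} --hostfile ${HOSTFILE} ${CCLM_EXE}"

echo "----- would start CCLM with: ${MPI_CMD}"
```

The sketch only prints the command, so the process count and host file can be checked against the node allocation before anything is launched.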

Could anyone help?

Yours
