Monday, October 09, 2006

Tutorial completed successfully on ene-411-lmarr with pgf

Yep, I managed to get through the entire tutorial without surgery on any code/scripts. Here's the complete log of how it's done:

TUTORIAL_PROCEDURE.txt:
Step 1)
README.txt:
Steps 1-6 complete.
Step 7)
IOAPI.txt:
Steps 1-5 complete (export BIN=Linux2_x86pg).
Step 6) complete.
Step 8)
Skipped bldit.se.pgf
bldit.se_noop.pgf: complete.
Step 9)
Skipped
Step 10)
bldit.m3bld: complete.
Step 11)
bldit.jproc.pgf:
/usr/bin/ld: cannot find -lioapi
Forgot to link src/Linux2_x86pg. Fixed.
/usr/bin/ld: cannot find -lnetcdf
Forgot to build netCDF. Go back and do so...
CVS_NETCDF.txt:
(Building netCDF-3.6.0-p1 for Linux)
Steps 1-8 complete.
Fixed.
/opt/pgi/linux86/6.2/bin/pgf90 -Bstatic *.o -L/home/jlinford/models/jlinford/cmaq/lib/ioapi_3/Linux2_x86pg -lioapi -L/home/jlinford/models/jlinford/cmaq/lib/netCDF/Linux -lnetcdf -o JPROC_d1a
/home/jlinford/models/jlinford/cmaq/lib/ioapi_3/Linux2_x86pg/libioapi.a(rdatt3.o)(.text+0xa8a): In function `rdattc_':
: undefined reference to `nf_inq_att__'
/home/jlinford/models/jlinford/cmaq/lib/ioapi_3/Linux2_x86pg/libioapi.a(rdatt3.o)(.text+0xd71): In function `rdattc_':
: undefined reference to `nf_get_att_text__'
/home/jlinford/models/jlinford/cmaq/lib/ioapi_3/Linux2_x86pg/libioapi.a(rdatt3.o)(.text+0xe90): In function `rdattc_':
: undefined reference to `nf_get_att_int__'
..................... and many more ...........................

Try rebuilding ioapi_3 now that netCDF is in place...
IOAPI.txt:
Steps 1-5 complete (export BIN=Linux2_x86pg).
** NEW: Patched Makeinclude.Linux2_x86pg to use gcc as described.
Modified MXVARS3 to be 300 instead of 120 (just in case).
Step 6) complete.
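
A note on the `nf_inq_att__` errors above: the missing symbols end in two underscores, which usually points to a Fortran name-mangling mismatch between libioapi.a and the netCDF library (one side built with a second trailing underscore, the other with only one). I have not confirmed that this was the cause here, so treat the following as a diagnostic sketch only. `underscore_style` is a hypothetical helper, not part of IOAPI or netCDF.

```shell
# Hypothetical helper: classify a Fortran-style symbol name by its trailing underscores.
underscore_style() {
    case "$1" in
        *__) echo "double" ;;   # e.g. nf_inq_att__ (second underscore appended)
        *_)  echo "single" ;;   # e.g. nf_inq_att_
        *)   echo "none" ;;     # plain C-style name
    esac
}

# Possible use (assumes the netCDF library is a static libnetcdf.a in the
# directory passed to -L in the link line above):
#   nm /home/jlinford/models/jlinford/cmaq/lib/netCDF/Linux/libnetcdf.a | grep -i nf_inq_att
# and compare the style of what is defined there with what the linker asked for.
```

If the library defines `nf_inq_att_` while the linker is looking for `nf_inq_att__`, both libraries need to be rebuilt with matching compiler settings.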

bldit.jproc.pgf:
Completed successfully!
run.jproc 2>&1 | tee jproc.log:
Lots of errors about file permissions in /mnt/data/shenair/tutorial
chmod -R g+w /mnt/data/shenair/tutorial/*
run.jproc 2>&1 | tee jproc.log:
Complete. No errors in log file.
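
The permission errors above could have been spotted before the first run. A minimal sketch, assuming the run scripts need group write access to everything under the tutorial data directory; `list_not_group_writable` is a hypothetical helper name.

```shell
# Hypothetical helper: list every file and directory under $1 that is
# missing the group write bit (octal 020).
list_not_group_writable() {
    find "$1" ! -perm -020
}

# Possible use with the directory from the log:
#   list_not_group_writable /mnt/data/shenair/tutorial
# An empty listing means chmod -R g+w has nothing left to fix.
```
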
Step ICON)
bldit.icon.pgf: Complete. No errors.
run.icon 2>&1 | tee icon.log: Failed.
Errors from icon.log:
....
setenv INIT_CONC_1 /mnt/data/shenair/tutorial/icon/ICON_cb4_M_32_99TUT02_profile -v
....
*** ERROR ABORT in subroutine OPN_IC_FILE
Could not open nor create INIT_CONC_1 file
Make sure group is correct on tutorial directory:
sudo chgrp -R models /mnt/data/shenair/tutorial
run.icon 2>&1 | tee icon.log: Failed.
Errors from icon.log: Exactly the same as before
Remove old file ICON_cb4_M_32_99TUT02_profile:
rm /mnt/data/shenair/tutorial/icon/ICON_cb4_M_32_99TUT02_profile
run.icon 2>&1 | tee icon.log: Complete. No errors.
Step BCON)
bldit.bcon.pgf: Complete. No errors.
run.bcon 2>&1 | tee bcon.log: Failed.
Errors from bcon.log:
....
setenv BNDY_CONC_1 /mnt/data/shenair/tutorial/bcon/BCON_cb4_M_32_99TUT02_profile -v
....
*** ERROR ABORT in subroutine OPN_BC_FILE
Could not open nor create BNDY_CONC_1 file
Remove old file BCON_cb4_M_32_99TUT02_profile:
rm /mnt/data/shenair/tutorial/bcon/BCON_cb4_M_32_99TUT02_profile
run.bcon 2>&1 | tee bcon.log: Complete. No errors.
Step CCTM, Day 1)
Modified bldit.cctm.pgf and run.cctm for serial execution.
bldit.cctm.pgf: Complete. No errors.
run.cctm 2>&1 | tee cctm.log: Failed.
Errors from cctm.log:
/mnt/data/shenair/tutorial/cctm/CCTM_e1aCONC.e1a already exists
Remove old file
cd /mnt/data/shenair/tutorial/cctm/
mkdir jclbak
mv * jclbak
run.cctm 2>&1 | tee cctm.log: Complete. No errors.
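
ICON, BCON and CCTM all failed for the same reason: an output file left over from an earlier run was in the way. A minimal sketch of a helper that moves such a file aside instead of deleting it; `stash_output` and the default `backup` directory name are my own choices, not part of the tutorial scripts.

```shell
# Hypothetical helper: move an existing output file into a backup directory
# so the next run can create a fresh one. Does nothing if the file is absent.
# Note: a file of the same name already in the backup directory is overwritten.
stash_output() {
    file=$1
    backup_dir=${2:-$(dirname "$file")/backup}
    [ -e "$file" ] || return 0
    mkdir -p "$backup_dir" && mv "$file" "$backup_dir"/
}

# Possible use before rerunning CCTM:
#   stash_output /mnt/data/shenair/tutorial/cctm/CCTM_e1aCONC.e1a
```
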
Step CCTM, Day 2)
Modified run.cctm as described.
./run.cctm 2>&1 | tee cctm_e1b.log: Failed
Errors from cctm.log:
/mnt/data/shenair/tutorial/cctm/CCTM_e2aCONC.e2a not found
Need to replace all occurrences of e2a with e1a and e2b with e1b because we are not running in parallel.
./run.cctm 2>&1 | tee cctm_e1b.log: Complete. No errors.
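
The e2a/e2b renaming above can be done with sed rather than by hand. A minimal sketch, assuming the IDs appear in the run script only as run identifiers; `to_serial_ids` is a hypothetical filter name, and a blind substitution like this will also hit any unrelated text containing "e2a" or "e2b", so check the result with diff first.

```shell
# Hypothetical filter: rewrite the parallel run IDs (e2a, e2b) on stdin
# to the serial ones (e1a, e1b) on stdout.
to_serial_ids() {
    sed -e 's/e2a/e1a/g' -e 's/e2b/e1b/g'
}

# Possible use:
#   to_serial_ids < run.cctm > run.cctm.serial
#   diff run.cctm run.cctm.serial
```
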
Step ICON, rebuild)
Modified bldit.icon and run.icon as described, but used CCTM_e1aCONC.e1a instead of CCTM_e2aCONC.e2a because of serial execution.
bldit.icon.pgf: Complete. No errors.
run.icon 2>&1 | tee icon.log: Complete. No errors.
Step BCON, rebuild)
Modified bldit.bcon as described.
bldit.bcon.pgf: Complete. No errors.
Modified run.bcon for Day 1 as described, but used CCTM_e1aCONC.e1a instead of CCTM_e2aCONC.e2a because of serial execution.
run.bcon 2>&1 | tee bcon.log: Complete. No errors.
Modified run.bcon for Day 2 as described, but used CCTM_e1aCONC.e1b instead of CCTM_e2aCONC.e2b because of serial execution.
run.bcon 2>&1 | tee bcon.log: Complete. No errors.
Step CCTM, rerun)
Modified run.cctm as described for day 1.
run.cctm 2>&1 | tee cctm.log: Complete. No errors.
Modified run.cctm as described for day 2.
run.cctm 2>&1 | tee cctm.log: Complete. No errors.

I'd just like to say....

THERE IS ABSOLUTELY NO REASON FOR THIS CONVOLUTED, COMPLICATED, UNSTABLE PROCESS! What's wrong with the computer science community? This is not the first time I've had to drill holes in my head to get an air quality model to compile. And THIS ISN'T EVEN WITH MPI BUILT IN! Can't we design a simple build process for HPC applications?!?!? I'm going to have to answer that one, because it doesn't need to be this hard.

1 Comment:

At 11:02 AM, Blogger Spacefleet said...

John, your posting was helpful. I got stuck at bcon, with no error message about an existing file. I renamed the existing one and ran again - all okay. I agree with you that an AQ model shouldn't be as complex and painful as this.

 
