UP, DOWN, CPU WU ERRORS

robertmiles
Joined: 13 Oct 09
Posts: 105
Credit: 21,462
RAC: 0
Message 2104 - Posted: 12 Jan 2010, 2:50:02 UTC

A few autodock v1.67 workunits with quick errors:

rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_37012_ChemDiv_K953-0344_ts_1262371232831294000_6

rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_26876_ChemDiv_5522-0332_ts_1262370979097200000_7

rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_29380_ChemDiv_3389-1418_ts_1262371053844828000_7

All with this type of error message:

wrapper: starting
11:38:19 (2864): wrapper: running unzip (-qq -o "*.zip" -d ".")
can't run app: -148
11:38:19 (2864): called boinc_finish

Some of what I've seen previously suggests that a quick fix for these problems would be to modify the workunits so that, when they run under Vista (and probably other Windows versions too), the wrapper runs unzip.exe instead of unzip.
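The suggestion above amounts to choosing the executable name per platform. A minimal sketch in Python, assuming a hypothetical helper `unzip_executable` (it is not part of the BOINC wrapper, and the project's real job files may handle this differently):

```python
import sys


def unzip_executable(platform_name=None):
    """Return the unzip program name to launch on the given platform.

    Hypothetical helper for illustration only. `platform_name` defaults
    to sys.platform, which starts with "win" on Windows.
    """
    if platform_name is None:
        platform_name = sys.platform
    if platform_name.startswith("win"):
        return "unzip.exe"
    return "unzip"


print(unzip_executable("win32"))  # name that would be used on Vista
print(unzip_executable("linux"))  # name that would be used on Linux
```

Whether the missing extension is really what triggers the -148 error is not confirmed in this thread.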

Also another workunit with a different problem:

rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_31198_ChemDiv_5824-0051_ts_1263195644863222000_1

- exit code 195 (0xc3)

Looks like a normal run until this:

07:37:17 (3944): wrapper: running ./Python25/python.exe ("./top_summary.py" summary_1.txt summary_2.txt summary_3.txt summary_4.txt summary_5.txt summary_6.txt summary_7.txt)
app exit status: 0x1
07:37:31 (3944): called boinc_finish

Already failed for three wingmates.

Nikolay A. Saharov
Joined: 20 Apr 09
Posts: 7
Credit: 21,762
RAC: 36
Message 2105 - Posted: 12 Jan 2010, 11:14:39 UTC

Hi

In the job.xml file for result 1659448 I found this subtask:
<task>
    <application>autogrid</application>
    <stdout_filename>stdout</stdout_filename>
    <stderr_filename>stderr</stderr_filename>
    <command_line> -p receptor_4.gpf -l out_4.glg </command_line>
    <weight>1</weight>
</task>


The command-line parameter "-l out_4.glg" looks wrong. In all other autogrid subtasks this parameter is "-l out.glg" (without _4).

Is this correct?
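A check like the one described above can be automated. This is only a sketch: the `<job_desc>` root element and the two neighbouring tasks are assumptions made for illustration, and only the `receptor_4` task is taken from the post.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical job.xml content; only the second <task> is quoted from the post.
JOB_XML = """
<job_desc>
    <task>
        <application>autogrid</application>
        <command_line> -p receptor_3.gpf -l out.glg </command_line>
    </task>
    <task>
        <application>autogrid</application>
        <command_line> -p receptor_4.gpf -l out_4.glg </command_line>
    </task>
    <task>
        <application>autogrid</application>
        <command_line> -p receptor_5.gpf -l out.glg </command_line>
    </task>
</job_desc>
"""


def log_argument(command_line):
    """Return the value following -l in a command line, or None."""
    parts = command_line.split()
    if "-l" in parts and parts.index("-l") + 1 < len(parts):
        return parts[parts.index("-l") + 1]
    return None


def odd_log_arguments(xml_text, application="autogrid"):
    """List -l values that differ from the most common one for an application."""
    root = ET.fromstring(xml_text)
    values = [
        log_argument(task.findtext("command_line", default=""))
        for task in root.iter("task")
        if task.findtext("application") == application
    ]
    if not values:
        return []
    most_common = Counter(values).most_common(1)[0][0]
    return [value for value in values if value != most_common]


print(odd_log_arguments(JOB_XML))  # ['out_4.glg']
```

A differing value is only a hint. Whether "-l out_4.glg" actually breaks the workunit depends on what the later subtasks expect.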

nenym
Joined: 23 Apr 09
Posts: 99
Credit: 511,306
RAC: 2,135
Message 2118 - Posted: 14 Jan 2010, 3:56:53 UTC

All 5.12 rcs_ga_run_10... tasks failed to download.
rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_36514_ChemDiv_C250-0568_ts_1262756508980918000_6
rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_35829_ChemDiv_C619-0166_ts_1262752145865422000_7
rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_29852_ChemDiv_4456-3468_ts_1262371064619665000_5
<core_client_version>6.10.24</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>Fzd2-MD7-MD8-7.zip_1262756508980918000.zip</file_name>
  <error_code>-224</error_code>
  <error_message>file not found</error_message>
</file_xfer_error>

</message>
]]>
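When many tasks fail this way, the missing files can be pulled out of the error text programmatically. A minimal sketch using a regular expression; the pattern is an assumption based on the excerpt above, not a documented BOINC format:

```python
import re

# Error text as quoted in the post above.
ERROR_TEXT = """
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>Fzd2-MD7-MD8-7.zip_1262756508980918000.zip</file_name>
  <error_code>-224</error_code>
  <error_message>file not found</error_message>
</file_xfer_error>
"""

PATTERN = re.compile(
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"<error_message>(?P<message>[^<]*)</error_message>"
)


def failed_transfers(text):
    """Return (file name, error code, message) for each file_xfer_error."""
    return [
        (m.group("name"), int(m.group("code")), m.group("message"))
        for m in PATTERN.finditer(text)
    ]


for name, code, message in failed_transfers(ERROR_TEXT):
    print(f"{name}: {code} ({message})")
```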


Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2121 - Posted: 14 Jan 2010, 12:10:06 UTC - in response to Message 2118.  

It must be isolated to those specific workunits. I was trying to clear out old files, but obviously some of the ones I deleted were part of a pending workunit.

nenym
Joined: 23 Apr 09
Posts: 99
Credit: 511,306
RAC: 2,135
Message 2122 - Posted: 14 Jan 2010, 13:29:17 UTC
Last modified: 14 Jan 2010, 13:30:21 UTC

The current 5.12 autodock_ga_run_10... batch runs excellently, with errors only from time to time.
Nice.
The same error
<core_client_version>6.10.24</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
wrapper: starting
10:55:09 (19121): wrapper: running unzip (-qq -o "./MGLTools*.zip" -d ".")
app exit status: 0x6c00
10:55:10 (19121): called boinc_finish
</stderr_txt>
]]>

occurred:
autodock_ga_run_10_bt_1ijy_w_md1_Autodock.pdb_lig_24737_ChemDiv_0090-0_ts_1263354562163008000_5
autodock_ga_run_10_bt_1ijy_w_md1_Autodock.pdb_lig_24585_ChemDiv_0083-0_ts_1263349878432569000_6
on Ubuntu 9.04 64bit ID 1800 C2D 3.0GHz 2GB RAM;
autodock_ga_run_10_bt_1ijy_w_md1_Autodock.pdb_lig_24743_ChemDiv_3448-9_ts_1263354742340998000_5
autodock_ga_run_10_bt_1ijy_w_md1_Autodock.pdb_lig_23883_ChemDiv_000A-0_ts_1263330122952662000_5
on Ubuntu 9.10 64bit ID 1029 AMD Turion RM-70 1.7GB RAM;
autodock_ga_run_10_bt_1ijy_w_md1_Autodock.pdb_lig_24135_ChemDiv_000A-0_ts_1263337388616959000_7
autodock_ga_run_10_bt_1ijy_w_md1_Autodock.pdb_lig_24719_ChemDiv_0090-0_ts_1263354141818637000_4
on Ubuntu 9.04 64bit ID 1065 C2D 2.33 GHz 2GB RAM.
I have crunched about 220 tasks so far, with only 6 compute errors.
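The numbers in the error report above can be decoded by hand. The sketch below shows why 195 is also printed as 0xc3 and -61, and what 0x6c00 would mean if it is a raw POSIX wait status (exit code in the high byte). That last reading is an assumption; the thread does not confirm how the wrapper reports the status.

```python
def signed_byte(code):
    """Interpret an exit code in 0..255 as a signed 8-bit value."""
    return code - 256 if code >= 128 else code


def exit_code_from_wait_status(status):
    """Extract the exit code from a POSIX-style wait status (high byte).

    Assumes the process exited normally, i.e. the low byte is zero.
    """
    return (status >> 8) & 0xFF


print(hex(195), signed_byte(195))          # 0xc3 -61
print(exit_code_from_wait_status(0x6C00))  # 108
```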

Saenger
Joined: 21 Apr 09
Posts: 53
Credit: 48,577
RAC: 78
Message 2123 - Posted: 14 Jan 2010, 18:22:23 UTC

Do 14 Jan 2010 19:20:21 CET	DrugDiscovery	Sending scheduler request: Requested by user.
Do 14 Jan 2010 19:20:21 CET	DrugDiscovery	Reporting 11 completed tasks, not requesting new tasks
Do 14 Jan 2010 19:20:26 CET	DrugDiscovery	Scheduler request completed
Do 14 Jan 2010 19:20:26 CET	DrugDiscovery	Message from server: Project is temporarily shut down for maintenance

And a wee peek at the server_status shows this:
data-driven web pages	vps	Running
upload/download server	vps	Running
scheduler	vps	Not Running
feeder	vps	Not Running
transitioner	vps	Not Running
file_deleter	vps	Not Running


What's wrong? Not that any deadline is looming, but...
Greetings from Saenger

For questions about Boinc look in the BOINC-Wiki

[AF>Libristes] Dudumomo
Joined: 1 May 09
Posts: 63
Credit: 39,940
RAC: 0
Message 2124 - Posted: 15 Jan 2010, 8:11:31 UTC - in response to Message 2123.  

Really nice to have the Linux app.
Thanks !

I'm getting a lot of "Error while downloading" and don't know why.
I also have the same error as nenym and the others (it seems to affect both Windows and Linux).

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2125 - Posted: 15 Jan 2010, 20:24:43 UTC - in response to Message 2124.  

Some of the download errors happened with an older version of mdrun: some of its files were not present in the download directory. I updated that version, so they should download now. Regarding the server daemons being offline: I have to take them down for a few minutes because I want to move results out of a directory. I want to run a script on a small group of results to test that the workflow is running properly.

al@ON
Joined: 1 May 09
Posts: 3
Credit: 454,034
RAC: 785
Message 2127 - Posted: 16 Jan 2010, 13:19:24 UTC - in response to Message 2125.  

Hi Jack,

I've stopped the project because since yesterday evening all my WUs are in "Error while computing" or "Error while downloading", under GNU/Linux 64 bits.

Regards.
"Free to think... think Free" =8?()>

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2128 - Posted: 16 Jan 2010, 14:43:26 UTC - in response to Message 2127.  

Were they the rcs_ type? I think those are the issue.

al@ON
Joined: 1 May 09
Posts: 3
Credit: 454,034
RAC: 785
Message 2132 - Posted: 16 Jan 2010, 16:44:26 UTC - in response to Message 2128.  

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2133 - Posted: 16 Jan 2010, 17:22:42 UTC - in response to Message 2132.  

I noticed that many of the input files with .zip extensions were deleted; I have no idea why. Some of those rcs_ tasks are going through without error. At first I thought it was the Python script. I sent a batch of asgn_rcs_ tasks through to testers. These should run much quicker because I am reducing the number of evaluations per GA run.

This link has a job file
http://boinc.drugdiscoveryathome.com/DrugDiscovery/download/22a/asgn_job_10_1263661752123190000.txt

ga_num_evals=100


This is usually set to 1 million.
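To put the reduction in numbers, here is a small sketch that reads the setting and compares it with the usual value. It assumes the job file uses plain `key=value` lines as in the excerpt above; the real file may contain other content.

```python
def parse_settings(text):
    """Parse key=value lines into a dict, ignoring lines without '='."""
    settings = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


USUAL_GA_NUM_EVALS = 1000000  # "usually set to 1 million" per the post

settings = parse_settings("ga_num_evals=100\n")
evals = int(settings["ga_num_evals"])
print(USUAL_GA_NUM_EVALS // evals)  # reduction factor: 10000
```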

vaughan
Joined: 21 Apr 09
Posts: 7
Credit: 1,174,667
RAC: 2,612
Message 2134 - Posted: 16 Jan 2010, 22:22:31 UTC
Last modified: 16 Jan 2010, 22:24:10 UTC

All download errors. Win7 64-bit. All tasks of type rcs_ga_run_10...

17/01/2010 9:21:34 AM DrugDiscovery Giving up on download of Fzd2-MD7-MD8-7.zip_1262773058464601000.zip: file not found

The autodock tasks are good.

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2137 - Posted: 17 Jan 2010, 3:21:43 UTC - in response to Message 2134.  

Download problem for rcs_: I'm not sure what is causing the input files to disappear.
However, I have debugged the top_summary.py issue and found the problem.

It was reading an empty value for positive energies. I had to add a conditional to catch that and, for positive energies, take the field one position to the right, changing
split(" ")[4]
to
split(" ")[5]

import linecache
import sys

# Weights applied to the energies read from summary_1.txt ... summary_7.txt
P = [0.106, 0.317, 0.091, 0.183, 0.094, 0.171, 0.039]

score = 0
for i, path in enumerate(sys.argv[1:]):
    # Line 2 of each summary file holds the energy
    fields = linecache.getline(path, 2).split(" ")
    # For positive energies field 4 is empty and the value sits in field 5
    value = fields[4].split(",")[0]
    if value == "":
        value = fields[5].split(",")[0]
    energy = float(value)
    print("%s + (%s * %s) = %s" % (score, energy, P[i], score + energy * P[i]))
    score = score + energy * P[i]

# Write the weighted sum and close the file explicitly
f = open('score.txt', 'w')
f.write(str(score))
f.close()
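The empty field described above can be reproduced in isolation. The two summary lines below are hypothetical (the real summary_N.txt layout may differ); they only assume that a positive value is padded with an extra space where a minus sign would otherwise be. `energy_by_comma` is an alternative that splits on commas instead, shown for comparison and not the fix used in the script.

```python
# Hypothetical summary lines; the real summary_N.txt layout may differ.
NEGATIVE = "out_1, 10, 4, 10, -7.4500, 0.0000"
POSITIVE = "out_1, 10, 4, 10,  1.2000, 0.0000"  # extra pad space before 1.2000


def energy_by_index(line):
    """Mimic the script: try field 4, fall back to field 5 if it is empty."""
    fields = line.split(" ")
    value = fields[4].split(",")[0]
    if value == "":
        value = fields[5].split(",")[0]
    return float(value)


def energy_by_comma(line):
    """Alternative (not the author's method): split on commas and strip."""
    return float(line.split(",")[4].strip())


print(POSITIVE.split(" ")[4] == "")  # True: the pad space yields an empty field
print(energy_by_index(NEGATIVE), energy_by_index(POSITIVE))
print(energy_by_comma(NEGATIVE), energy_by_comma(POSITIVE))
```

Splitting on the comma separator does not depend on how many spaces pad each value, so it would not need the special case.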

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2138 - Posted: 17 Jan 2010, 3:35:15 UTC - in response to Message 2137.  

I think one of the scripts was clearing old asgn_ input files. I just deleted that line of code.

darkpella
Joined: 21 Apr 09
Posts: 8
Credit: 6,937
RAC: 0
Message 2148 - Posted: 18 Jan 2010, 11:26:45 UTC
Last modified: 18 Jan 2010, 11:27:01 UTC

Getting only download failures since Jan 14th.

See here

bye

darkpella

darkpella
Joined: 21 Apr 09
Posts: 8
Credit: 6,937
RAC: 0
Message 2149 - Posted: 18 Jan 2010, 15:28:35 UTC - in response to Message 2148.  

Looks like mdrun is not affected, see here again.

darkpella

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2150 - Posted: 18 Jan 2010, 16:49:13 UTC - in response to Message 2149.  

I cleared out all the pending rcs_autodock tasks and just started a new batch. Let's see how it works out.

nenym
Joined: 23 Apr 09
Posts: 99
Credit: 511,306
RAC: 2,135
Message 2152 - Posted: 19 Jan 2010, 0:38:54 UTC
Last modified: 19 Jan 2010, 1:17:45 UTC

autodock 2.23 (Linux 64bit on C2D 2.33GHz 2GB RAM, ID 1065):
- tasks autodock_ga_run_10... download and run OK
- tasks rcs_ga_run10_bt_Fzd2.zip_lig.... download OK, compute error (1 task finished so far)
<core_client_version>6.10.24</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
wrapper: starting
01:19:35 (5089): wrapper: running ../../projects/boinc.drugdiscoveryathome.com/unzip_1.119_i686-pc-linux-gnu (-qq -o "./*.zip" -d ".")
.
.
.
01:50:29 (5089): wrapper: running ../../projects/boinc.drugdiscoveryathome.com/autodock_1.114_i686-pc-linux-gnu (-p ligand_receptor_7.dpf -l out_7.dlg)
01:55:17 (5089): wrapper: running ./Python25/python.exe ("./MGLToolsPckgs/AutoDockTools/Utilities24/summarize_docking.py" -l out_7.dlg -r receptor_7.pdbqt -o summary_7.txt)
01:55:19 (5089): wrapper: running ./Python25/python.exe ("./top_summary.py" summary_1.txt summary_2.txt summary_3.txt summary_4.txt summary_5.txt summary_6.txt summary_7.txt)
app exit status: 0x100
01:55:19 (5089): called boinc_finish
</stderr_txt>
]]>

rcs_ga_run_10_bt_Fzd2.zip_lig_24227_ChemDiv_5326-0692_ts_1263588081078840000_6

- tasks rcs_ga_run10_bt_Fzd2-MD7-MD8-7.zip_lig.... all tasks download error
<core_client_version>6.10.24</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>Fzd2-MD7-MD8-7.zip_1262952621389340000.zip</file_name>
  <error_code>-224</error_code>
  <error_message>file not found</error_message>
</file_xfer_error>

</message>
]]>

example
rcs_ga_run_10_bt_Fzd2-MD7-MD8-7.zip_lig_24783_ChemDiv_K788-2896_ts_1262952621389340000_5

nenym
Joined: 23 Apr 09
Posts: 99
Credit: 511,306
RAC: 2,135
Message 2153 - Posted: 19 Jan 2010, 9:16:45 UTC
Last modified: 19 Jan 2010, 9:57:00 UTC

A little question.
Task rcs_ga_run_10_bt_Fzd2.zip_lig_24227_ChemDiv_5326-0692_ts_1263588081078840000_6, which errored out, has been granted its claimed credit. If this is a new rule for autodock rcs_... tasks, I will stop aborting them as I have been doing per Jack's message 2069.