UP, DOWN, CPU WU ERRORS

zombie67 [MM]
Volunteer tester
Joined: 25 Apr 09
Posts: 58
Credit: 1,785,257
RAC: 1,658
Message 2044 - Posted: 31 Dec 2009, 3:47:59 UTC

When will the server be turned back on so that we can report completed work?
@zombie_67

ID: 2044 · Rating: 0
Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2045 - Posted: 31 Dec 2009, 4:21:52 UTC - in response to Message 2044.  

It should be reporting now. Sorry about that.
ID: 2045 · Rating: 0
zombie67 [MM]
Volunteer tester
Joined: 25 Apr 09
Posts: 58
Credit: 1,785,257
RAC: 1,658
Message 2046 - Posted: 31 Dec 2009, 4:54:45 UTC

Thanks Jack.
@zombie_67

ID: 2046 · Rating: 0
robertmiles
Joined: 13 Oct 09
Posts: 105
Credit: 21,462
RAC: 0
Message 2050 - Posted: 31 Dec 2009, 13:05:27 UTC - in response to Message 2043.  



Also, another idea you might want to consider eventually: A separate thread for reporting problems for each application software version. This allows anything about problems with obsolete versions of the software to eventually age off what new participants see first when reading the threads with the most recent posts added.


too messy...


So you want any new participants to read through a long list of obsolete problems, and therefore be likely to lose their desire to participate?
ID: 2050 · Rating: 0
Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2051 - Posted: 31 Dec 2009, 18:53:14 UTC - in response to Message 2050.  

I'll clean it up
ID: 2051 · Rating: 0
Tim Turner
Joined: 1 May 09
Posts: 570
Credit: 184,322
RAC: 0
Message 2054 - Posted: 31 Dec 2009, 21:25:42 UTC - in response to Message 2051.  

I'll clean it up


There's a setting in community preferences that lets each user choose how many posts to show per thread...

Tim Turner
Public Relations Admin
Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
If you need help via voice or Convo, PM me and I will give you details on where I will be: Teamspeak, Yahoo Messenger, or Skype.
ID: 2054 · Rating: 0
Angus
Joined: 21 Apr 09
Posts: 6
Credit: 3,855
RAC: 0
Message 2055 - Posted: 31 Dec 2009, 21:43:57 UTC - in response to Message 2043.  

Evidently you don't "get" it.

It's impossible for a newcomer who arrives on the boards here to figure out a problem with an application.
ID: 2055 · Rating: 0
Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2057 - Posted: 1 Jan 2010, 22:20:24 UTC - in response to Message 2055.  

We still have some errors with copying input files. We had a whole batch where the ligands were not copied to the download directory. I have to monitor this very carefully to see why they are not copying. When I look at the workunit entries for these failures, I see empty file sizes for the ligands. We use a Perl script to copy these files to the directory, and it runs the copy using a system function. I will wait a few hours before submitting another batch and run sleep(10) between each workunit to see if things copy properly. It may also be a good idea to add a check on file size before generating a workunit.
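
A minimal sketch of that file size check, written in Python rather than the Perl the project actually uses. Everything named here (`copy_and_verify`, `stage_batch`, the `make_workunit` callback, the directory arguments) is hypothetical and only illustrates the idea: copy the ligand, refuse to generate a workunit if the copy is empty or differs in size from the source, and pause between workunits.

```python
import os
import shutil
import time


def copy_and_verify(src, dest_dir):
    """Copy src into dest_dir and return the destination path.

    Raises IOError if the copy is empty or its size differs from the source.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    src_size = os.path.getsize(src)
    dest_size = os.path.getsize(dest)
    if dest_size == 0 or dest_size != src_size:
        raise IOError(
            "bad copy of %s: source %d bytes, copy %d bytes"
            % (src, src_size, dest_size)
        )
    return dest


def stage_batch(ligand_paths, download_dir, delay_seconds=10, make_workunit=None):
    """Copy each ligand, verify it, then (optionally) generate its workunit.

    make_workunit is a hypothetical callback standing in for whatever
    actually creates the workunit. Returns (staged, failed) lists.
    """
    staged, failed = [], []
    for path in ligand_paths:
        try:
            dest = copy_and_verify(path, download_dir)
        except (IOError, OSError) as err:
            # Skip workunit generation for any file that did not copy cleanly.
            failed.append((path, str(err)))
            continue
        if make_workunit is not None:
            make_workunit(dest)
        staged.append(dest)
        time.sleep(delay_seconds)  # the sleep(10) between workunits
    return staged, failed
```

With a check like this, a ligand that arrives with an empty file size ends up in the `failed` list instead of producing a workunit that is bound to fail on the volunteers' machines.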
ID: 2057 · Rating: 0
Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2058 - Posted: 2 Jan 2010, 4:23:18 UTC - in response to Message 2057.  

There was an issue with application files not downloading. I am updating the autodock version.
ID: 2058 · Rating: 0
Angus
Joined: 21 Apr 09
Posts: 6
Credit: 3,855
RAC: 0
Message 2059 - Posted: 2 Jan 2010, 7:20:12 UTC

What's up with the disappearing completed tasks?

Half a day ago I had over 800 points in pending credits for autodock beta 1.14 tasks. Now there is no record of the tasks and no credits.
ID: 2059 · Rating: 0
Woody Woodpecker
Joined: 16 Nov 09
Posts: 1
Credit: 645,645
RAC: 0
Message 2061 - Posted: 2 Jan 2010, 8:09:48 UTC - in response to Message 2059.  

What's up with the disappearing completed tasks?

Half a day ago I had over 800 points in pending credits for autodock beta 1.14 tasks. Now there is no record of the tasks and no credits.


I lost almost 10 000 pending credits.
ID: 2061 · Rating: 0
Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2066 - Posted: 2 Jan 2010, 20:03:37 UTC - in response to Message 2061.  

I'm sorry about that. We had some strange issues with the last batch of work, and I was trying to clear out all workunits that had not been sent. We can restore missing credits.
ID: 2066 · Rating: 0
robertmiles
Joined: 13 Oct 09
Posts: 105
Credit: 21,462
RAC: 0
Message 2069 - Posted: 3 Jan 2010, 4:47:11 UTC
Last modified: 3 Jan 2010, 5:00:02 UTC

I also lost credits for autodock beta 1.14 workunits, a three or four page list of them on my machines. I didn't calculate just how much credit.

The three autodock v1.58 workunits my machines have finished all have this error message in the output, but appear to have finished properly otherwise. A side effect of having debugging turned on?

Memory Leaks Detected!!!

So far, they appear to have been issued to the right number of wingmates, but few of them have returned their outputs yet.

The other alpha test projects I'm participating in appear to be issuing credits for the CPU time used even for unsuccessful workunits, at least those which manage to return enough in the outputs to help pin down the problem. One of them makes this only half as much credit as if the workunit was successful, though.

You might want to consider doing this also, at least if you're having trouble attracting enough participants.
ID: 2069 · Rating: 0
robertmiles
Joined: 13 Oct 09
Posts: 105
Credit: 21,462
RAC: 0
Message 2070 - Posted: 3 Jan 2010, 15:25:08 UTC

So far, all my completed autodock v1.59 workunits (over a dozen) gave a compute error with one of these sets of error messages, in 12 seconds or less:


<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
- exit code -148 (0xffffff6c)
</message>
<stderr_txt>
wrapper: starting
05:10:12 (5468): wrapper: running unzip (-qq -o "*.zip" -d ".")
can't run app: -148
05:10:12 (5468): called boinc_finish


**********
**********

Memory Leaks Detected!!!



<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
- exit code -148 (0xffffff6c)
</message>
<stderr_txt>
wrapper: starting
04:20:08 (6648): wrapper: running 7za.exe (e -y *.zip)
04:20:09 (6648): wrapper: running ./Python25/python.exe (make_sitecustomize.py ".")
can't run app: -148
04:20:09 (6648): called boinc_finish


**********
**********

Memory Leaks Detected!!!
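
For anyone comparing the two forms of the exit code in these logs: -148 and 0xffffff6c are the same 32-bit value, signed versus unsigned. The small sketch below shows the conversion; the helper names are my own. I believe -148 corresponds to ERR_EXEC in BOINC's error_numbers.h, which would fit the "can't run app" line, but treat that mapping as an assumption to verify.

```python
def to_unsigned32(code):
    """Return the unsigned 32-bit (two's complement) form of a signed code."""
    return code & 0xFFFFFFFF


def to_signed32(value):
    """Return the signed 32-bit interpretation of an unsigned value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-148)))  # 0xffffff6c
print(to_signed32(0xFFFFFF6C))   # -148
```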
ID: 2070 · Rating: 0
Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2071 - Posted: 3 Jan 2010, 16:21:28 UTC - in response to Message 2069.  

The other alpha test projects I'm participating in appear to be issuing credits for the CPU time used even for unsuccessful workunits, at least those which manage to return enough in the outputs to help pin down the problem. One of them makes this only half as much credit as if the workunit was successful, though.


We already did that for a while. It became a disincentive to successfully complete work.
ID: 2071 · Rating: 0
robertmiles
Joined: 13 Oct 09
Posts: 105
Credit: 21,462
RAC: 0
Message 2079 - Posted: 5 Jan 2010, 6:19:32 UTC

In this autodock v1.61 workunit, the wrapper (at least on my machine) tried to run unzip (without .exe). It failed quickly for me and for all wingmates that have reported it so far.

http://boinc.drugdiscoveryathome.com/workunit.php?wuid=448752

However, note the rather old workunit at the bottom with an unknown status.

All my other autodock v1.61 workunits so far, where the wrapper ran unzip.exe instead, appear to have been successful.

For my autodock v1.60 workunits so far, the success rate also seems to depend highly on which program the wrapper told to unpack the files first.
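
One way to check that impression is to tally results by the first program the wrapper launched. This is only a sketch: it assumes the stderr text follows the "wrapper: running PROGRAM (ARGS)" format seen in the logs earlier in this thread, and the function names and the (stderr, succeeded) input pairs are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as: 05:10:12 (5468): wrapper: running unzip (-qq -o ...)
RUN_LINE = re.compile(r"wrapper: running (\S+) \(")


def first_program(stderr_text):
    """Return the first program the wrapper ran, or None if none is logged."""
    match = RUN_LINE.search(stderr_text)
    return match.group(1) if match else None


def tally_by_first_program(results):
    """Count outcomes keyed by (first program, succeeded).

    results is an iterable of (stderr_text, succeeded) pairs.
    """
    counts = Counter()
    for stderr_text, succeeded in results:
        counts[(first_program(stderr_text), succeeded)] += 1
    return counts
```

If the failures all land under `unzip` and the successes under `unzip.exe` or `7za.exe`, that would support the idea that the unpack step is what decides the outcome.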
ID: 2079 · Rating: 0
robertmiles
Joined: 13 Oct 09
Posts: 105
Credit: 21,462
RAC: 0
Message 2085 - Posted: 6 Jan 2010, 15:33:41 UTC

My autodock v1.67 workunits so far each reported a memory leak, but appear to be OK otherwise. My other recent autodock v1.61 workunits each also reported a memory leak, and are now waiting for enough wingmates to finish to determine if the workunits were successful otherwise.

Could the recent decision to make dd@h an LLC, and therefore officially for-profit, be affecting the ability to get enough participants to fill out the wingmates lists in a reasonable time?
ID: 2085 · Rating: 0
Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 2095 - Posted: 8 Jan 2010, 15:17:47 UTC - in response to Message 2085.  

My autodock v1.67 workunits so far each reported a memory leak, but appear to be OK otherwise. My other recent autodock v1.61 workunits each also reported a memory leak, and are now waiting for enough wingmates to finish to determine if the workunits were successful otherwise.


We can ignore the memory leak message until we get to a final, stable version where we don't need to make additional changes. The memory leak message appears because we have a debug version of the BOINC wrapper. Our application versions are not stable because additional project requirements keep appearing. It would be nice if we had a much simpler set of requirements, but we have to make certain changes to meet our workflows.

Could the recent decision to make dd@h an LLC, and therefore officially for-profit, be affecting the ability to get enough participants to fill out the wingmates lists in a reasonable time?


Not that I am aware of. My impression is that people want to contribute to science, and how we organize ourselves legally is not a big problem so long as our profits are invested back into the research. We still control account creation through an invite code.
ID: 2095 · Rating: 0
robertmiles
Joined: 13 Oct 09
Posts: 105
Credit: 21,462
RAC: 0
Message 2101 - Posted: 9 Jan 2010, 22:38:06 UTC
Last modified: 9 Jan 2010, 22:52:53 UTC

My last 8 v1.67 workunits have each returned a compute error after running for about as long as the previous successful runs, with nothing I noticed about just what type of error other than the exit status code:

app exit status: 0x1

At least the last one has a further problem: more copies of the workunit are still being issued even though the total number with errors reported by wingmates is more than the limit on the number of workunits, and well over the limit on the number of errors.

On the for-profit issue: How does controlling account creation with the invite code control the number of people who decide to stop participating?
ID: 2101 · Rating: 0
Nikolay A. Saharov
Joined: 20 Apr 09
Posts: 7
Credit: 21,716
RAC: 37
Message 2103 - Posted: 10 Jan 2010, 21:54:48 UTC

21:12:12 (1888): wrapper: running ./Python25/python.exe ("./MGLToolsPckgs/AutoDockTools/Utilities24/summarize_docking.py" -l out_7.dlg -r receptor_7.pdbqt -o summary_7.txt)
21:12:24 (1888): wrapper: running ./Python25/python.exe ("./top_summary.py" summary_1.txt summary_2.txt summary_3.txt summary_4.txt summary_5.txt summary_6.txt summary_7.txt)
app exit status: 0x1
21:12:37 (1888): called boinc_finish



I think Python can't find this top_summary.py script. I also can't find it in either the MGLToolsPckgs archive or the DD@H project directory.
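
A rough way to confirm which scripts are actually missing is to pull the quoted .py paths out of the wrapper log and test each one against the directory the task ran in. This is a sketch under assumptions: the regular expression only handles the quoted-first-argument form shown in the log above, and `missing_scripts` and `base_dir` are names I made up.

```python
import os
import re

# Matches the quoted script path that opens the argument list, e.g.
# wrapper: running ./Python25/python.exe ("./top_summary.py" summary_1.txt ...)
SCRIPT_ARG = re.compile(r'\("([^"]+\.py)"')


def missing_scripts(stderr_text, base_dir):
    """Return the script paths named in the log that do not exist under base_dir."""
    missing = []
    for rel_path in SCRIPT_ARG.findall(stderr_text):
        if not os.path.isfile(os.path.join(base_dir, rel_path)):
            missing.append(rel_path)
    return missing
```

Run against the directory where the archives were unpacked, an empty result would mean the scripts are present and the 0x1 exit status has some other cause.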


ID: 2103 · Rating: 0