Posts by John McLeod VII

1) Message boards : Number crunching : mdrun 7.01/14 runtime (Message 1394)
Posted 24 Sep 2009 by John McLeod VII
Post:
Problem:

#1 No checkpoint :-( I'll work on that now that I have pointed out to our wonderful gpu integration team how to work on environmental variables :-)

#2 Bad estimate on the length of our runs :-(
I don't want these running more than an hour. ZPM and I estimated 1 second per step? How about we keep it to 3600 steps?

Really long tasks need to checkpoint someplace. The combination of somewhat shorter tasks and checkpoints will be welcome.

Meanwhile, what do we do with the tasks that seem to run for a long time? Do we abort them, or do we continue on?

[edit]

I am seeing some really long run times with 7.03.
2) Message boards : Number crunching : Estimated WU duration = 00:00:00 - WU hasn't even started (Message 1301)
Posted 15 Sep 2009 by John McLeod VII
Post:
If the average TDCF is above 10, please MULTIPLY the average TDCF by the current FPOPS_EST to get the new FPOPS_EST. It looks like you have been dividing the FPOPS_EST by the average TDCF. At this point, you should probably multiply the FPOPS_EST for the next batch by something well over 100. It is going to take a while for the Duration Correction Factor to fall back in range again.
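A minimal sketch of the correction described above, in Python. The function name, the `threshold` parameter, and the sample numbers are made up for illustration; leaving the estimate unchanged when the average TDCF is at or below 10 is an assumption, since the post only covers the case above 10.

```python
def corrected_fpops_est(current_fpops_est: float,
                        average_tdcf: float,
                        threshold: float = 10.0) -> float:
    """Return the FPOPS_EST to use for the next batch.

    Hypothetical helper: multiply (never divide) the current estimate
    by the average TDCF when that average is above the threshold.
    """
    if average_tdcf > threshold:
        return current_fpops_est * average_tdcf
    # Assumption: at or below the threshold the estimate is left alone.
    return current_fpops_est


if __name__ == "__main__":
    current = 1e12   # illustrative FPOPS_EST, not a real project value
    tdcf = 65.0      # illustrative average TDCF

    print("multiplied (intended):", corrected_fpops_est(current, tdcf))
    # Dividing moves the estimate the wrong way and makes it even smaller.
    print("divided (the mistake):", current / tdcf)
```

Dividing instead of multiplying leaves the estimate off by the square of the TDCF relative to the intended value, which is why the follow-up batch needs a much larger multiplier.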
3) Message boards : Number crunching : progress bar not really working (Message 1228)
Posted 9 Sep 2009 by John McLeod VII
Post:
The other problem with the progress bar is the massively wrong estimates for fpops. They have recently gotten much worse. I believe that they may have been divided by some number between 60 and 70 when they should have been multiplied by about that amount. The tasks still take a couple of hours, but are now tagged at < 1 minute.
4) Message boards : Number crunching : CPU time resets on a restart. (Message 1204)
Posted 8 Sep 2009 by John McLeod VII
Post:
we know about this issue, as there's not much checkpointing in the current autodock app. there's only one checkpoint and that is like 1-2 minutes into the wu. it's something we need to work on. it may get some attention after gpu gromacs is off the ground, which may be some time.

The problem is that when left in memory, the shim application apparently unloads the worker application.
5) Message boards : Number crunching : CPU time resets on a restart. (Message 1188)
Posted 6 Sep 2009 by John McLeod VII
Post:
I just watched a Drug Discovery task drop from 54 minutes (where it was stopped to allow another task to run EDF) to a small number when the task was resumed.

In many projects (this one?) credit is granted based on benchmarks * time. The standard is that if there is one task when credit is granted, that task gets its credit request. If there are two, the high is dropped (to discourage cheating) and the low is granted to both. If there are 3 or more, the high and low credit requests are dropped and the rest are averaged.

The other possibility is that the task was restarted to the last checkpoint even though keep tasks in memory is set. If at all possible, it would be good to respect this setting when stopping the worker process.
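
The credit rule described above (one claim, two claims, three or more) can be sketched in Python as follows. This is only an illustration of the rule as stated in this post, not the actual server code; the function name is made up, and granting the averaged value to every task in the three-or-more case is an assumption.

```python
def granted_credit(claims):
    """Return the credit granted, given the credit requests for one work unit.

    Hypothetical helper following the rule as described in the post:
      1 claim   -> that claim is granted
      2 claims  -> the high is dropped, the low is granted to both
      3+ claims -> the high and low are dropped, the rest are averaged
    """
    if not claims:
        raise ValueError("need at least one credit claim")

    ordered = sorted(claims)
    if len(ordered) <= 2:
        # With one claim this is the claim itself; with two it is the low one.
        return ordered[0]

    middle = ordered[1:-1]  # drop one low and one high claim
    return sum(middle) / len(middle)


if __name__ == "__main__":
    print(granted_credit([12.0]))                    # single claim
    print(granted_credit([10.0, 30.0]))              # low of the two
    print(granted_credit([10.0, 20.0, 30.0, 90.0]))  # average of the middle
```

Since the claim is benchmarks * time, a task whose CPU time resets on restart claims less than it should, and under this rule a low claim can pull down what the other tasks are granted as well.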
6) Message boards : Number crunching : progress bar not really working (Message 1164)
Posted 2 Sep 2009 by John McLeod VII
Post:
Is there any way to rework the % complete so it looks more like:

0.01
0.05
0.1
0.5
1
3
5

?
7) Questions and Answers : Web site : Email notifications not working? (Message 1102)
Posted 27 Aug 2009 by John McLeod VII
Post:
I have subscribed to a thread, yet I did not receive notifications that there were posts to the thread. I did ensure that I have the "send email immediately" option selected for the notification option.

PS. I am about to subscribe to this thread as well.
8) Message boards : Number crunching : Project initialization fails (Message 1099)
Posted 27 Aug 2009 by John McLeod VII
Post:
this is odd.

i've pm'd jack..

hopefully, he can get this resolved.

It's Alpha. Fixing things is sort of the point.
9) Message boards : Cafe : Welcome ATAs... (Message 1097)
Posted 27 Aug 2009 by John McLeod VII
Post:
Wasn't allowed in on the first round.
10) Message boards : Number crunching : Project initialization fails (Message 1095)
Posted 27 Aug 2009 by John McLeod VII
Post:
idk what could be the problem, I've connected on my computer.

try restarting the computer to see if that clears anything. also, could you put work fetch debug in the cc_config.xml file, under log_flags:

<work_fetch_debug>1</work_fetch_debug>

i can't really say what's going on with just that error.
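
For anyone unsure where that line goes, here is a small Python sketch that builds a cc_config.xml body with the flag nested under log_flags. The helper name is made up for illustration, and the sketch only prints the XML; where the file is saved depends on your BOINC installation and is not covered here.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml content with each named log flag set to 1.

    Hypothetical helper: the flag names are passed in by the caller.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Only the flag asked for above; more names can be added to the list.
    print(build_cc_config(["work_fetch_debug"]))
```

The client has to re-read the config file (or be restarted) before the extra log lines show up.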

I am having the same problem.

8/26/2009 8:13:39 PM http://boinc.drugdiscoveryathome.com/ Sending scheduler request: Requested by user.
8/26/2009 8:13:39 PM http://boinc.drugdiscoveryathome.com/ Not reporting or requesting tasks
8/26/2009 8:13:39 PM [http_debug] HTTP_OP::init_post(): http://boinc.drugdiscoveryathome.com/DrugDiscovery_cgi/cgi
8/26/2009 8:13:39 PM [http_debug] HTTP_OP::libcurl_exec(): ca-bundle set
8/26/2009 8:13:39 PM [http_debug] [ID#0] info: timeout on name lookup is not supported
8/26/2009 8:13:39 PM [http_debug] [ID#0] info: About to connect() to boinc.drugdiscoveryathome.com port 80 (#0)
8/26/2009 8:13:39 PM [http_debug] [ID#0] info: Trying 69.12.222.9...
8/26/2009 8:13:39 PM [http_debug] [ID#0] info: Connected to boinc.drugdiscoveryathome.com (69.12.222.9) port 80 (#0)
8/26/2009 8:13:39 PM [http_debug] [ID#0] Sent header to server: POST /DrugDiscovery_cgi/cgi HTTP/1.1
User-Agent: BOINC client (windows_intelx86 6.10.0)
Host: boinc.drugdiscoveryathome.com
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/x-www-form-urlencoded
Content-Length: 11343
Expect: 100-continue


8/26/2009 8:13:39 PM [http_debug] [ID#0] Received header from server: HTTP/1.1 100 Continue

8/26/2009 8:13:40 PM [http_debug] [ID#0] Received header from server: HTTP/1.1 500 Internal Server Error

8/26/2009 8:13:40 PM [http_debug] [ID#0] Received header from server: Date: Thu, 27 Aug 2009 00:13:38 GMT

8/26/2009 8:13:40 PM [http_debug] [ID#0] Received header from server: Server: Apache/2.2.9 (Fedora)

8/26/2009 8:13:40 PM [http_debug] [ID#0] Received header from server: Content-Length: 643

8/26/2009 8:13:40 PM [http_debug] [ID#0] Received header from server: Connection: close

8/26/2009 8:13:40 PM [http_debug] [ID#0] Received header from server: Content-Type: text/html; charset=iso-8859-1

8/26/2009 8:13:40 PM [http_debug] [ID#0] Received header from server:

8/26/2009 8:13:40 PM [http_debug] [ID#0] info: Expire cleared
8/26/2009 8:13:40 PM [http_debug] [ID#0] info: Closing connection #0
8/26/2009 8:13:44 PM http://boinc.drugdiscoveryathome.com/ Scheduler request failed: HTTP internal server error
8/26/2009 8:13:44 PM [work_fetch_debug] Request work fetch: RPC complete


