mdrun 7.01/14 runtime

David Ball
Joined: 1 May 09
Posts: 2
Credit: 20,346
RAC: 0
Message 1381 - Posted: 23 Sep 2009, 3:55:30 UTC

I got 9 of these long mdrun 7.01 workunits today from DDAH and they immediately went into high priority. I'm running about 15% DDAH on this machine and BOINC is set to keep 1.5 days of work. My DCF is 1.000.

It's a C2D dual-core CPU, and the 2 that are running have each used about 9 hours of CPU time and are still at 50%. I'm fairly sure that they reached the 50% mark shortly after starting. How long do these things run?

Machine:
BOINC 6.6.36
Vista Home Premium 32-bit
Intel Core 2 Duo E6420 (Conroe) at 2.13 GHz, 4 MB cache
4 GB RAM, with about 3.2 GB recognized by Vista
Intel integrated graphics (945G chipset)

Tim Turner
Joined: 1 May 09
Posts: 570
Credit: 184,322
RAC: 0
Message 1382 - Posted: 23 Sep 2009, 4:08:06 UTC - in response to Message 1381.  

hours, days...!
Tim Turner
Public Relations Admin
Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
If you need help via voice or Convo, PM me and I will give you details on where I will be: Teamspeak, Yahoo Messenger, or Skype.

[AF>EDLS] frederic abussan
Joined: 1 May 09
Posts: 30
Credit: 849,263
RAC: 0
Message 1383 - Posted: 23 Sep 2009, 4:39:51 UTC - in response to Message 1382.  

I am perplexed: 50% ...
Here today gone tomorrow

T0lsty
Joined: 23 Apr 09
Posts: 2
Credit: 192,582
RAC: 0
Message 1384 - Posted: 23 Sep 2009, 7:16:02 UTC - in response to Message 1383.  

Same situation.

64chrysler300
Joined: 26 Aug 09
Posts: 1
Credit: 18,609
RAC: 0
Message 1385 - Posted: 23 Sep 2009, 8:56:37 UTC

Same here. 10+ hours, 50% done and 145+ hours to completion? WUs that are still in the queue say 1103 hours to complete! Running a quad-core on Vista 64-bit.

Rob

Ageless
Joined: 11 Apr 09
Posts: 172
Credit: 7,631
RAC: 0
Message 1387 - Posted: 23 Sep 2009, 10:09:58 UTC
Last modified: 23 Sep 2009, 10:13:32 UTC

Not only that, but why do they start running while they are still downloading, go to 100%, upload, download more, run, upload and so on, only to eventually get stuck at that 50% with a 221h runtime, a deadline 3 days away, and these messages:

23-Sep-09 12:05:09 DrugDiscovery [cpu_sched_debug] Result asgn_md_md_100ps_P9_LEU75_ILE78_LEU121_MET122_41015_LOPAC_Sigma_1253641471465605348_21614_1253699822_11 projected to miss deadline.

same for 12 to 18

23-Sep-09 12:05:09 DrugDiscovery [cpu_sched_debug] Result asgn_md_md_100ps_P9_LEU75_ILE78_LEU121_MET122_41015_LOPAC_Sigma_1253641487340314367_21614_1253699822_19 projected to miss deadline.
23-Sep-09 12:05:09 DrugDiscovery [cpu_sched_debug] Project has 9 projected CPU deadline misses

Didn't you download them all separately, run them to 100% and upload them already?

(ah, Show all tasks was in my way)
Jord

'Cause you seem like an orchard of mines, Just take one step at a time.

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 1388 - Posted: 23 Sep 2009, 13:44:32 UTC - in response to Message 1387.  

Problem:

#1 No checkpoint :-( I'll work on that now that I have pointed out to our wonderful GPU integration team how to work with environment variables :-)

#2 Bad estimate on the length of our runs :-(
I don't want these running more than an hour. ZPM and I estimated 1 second per step? How about we keep it to 3600 steps?
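
The arithmetic behind that cap can be sketched as below. This is only an illustration under the 1-second-per-step estimate mentioned above; the function name is hypothetical, and the real per-step cost will vary by host. The second call assumes, purely for illustration, a 5,000,000-step run.

```python
def estimated_runtime_hours(steps: int, seconds_per_step: float = 1.0) -> float:
    """Projected runtime in hours for a run of `steps` MD steps.

    `seconds_per_step` defaults to the rough 1 s/step estimate; it is an
    assumption, not a measured value.
    """
    return steps * seconds_per_step / 3600.0


# The proposed cap: 3600 steps at about 1 s each is about one hour.
print(estimated_runtime_hours(3600))

# A hypothetical 5,000,000-step run at the same rate would take
# well over a thousand hours.
print(estimated_runtime_hours(5_000_000))
```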

John McLeod VII
Joined: 26 Aug 09
Posts: 10
Credit: 17,643
RAC: 0
Message 1394 - Posted: 24 Sep 2009, 17:20:10 UTC - in response to Message 1388.  
Last modified: 24 Sep 2009, 17:20:58 UTC

Really long tasks need to checkpoint someplace. The combination of somewhat shorter tasks and checkpoints will be welcome.

Meanwhile, what do we do with the tasks that seem to run for a long time? Do we abort them, or do we continue on?

[edit]

I am seeing some really long run times with 7.03.


BOINC WIKI

Tim Turner
Joined: 1 May 09
Posts: 570
Credit: 184,322
RAC: 0
Message 1398 - Posted: 26 Sep 2009, 18:34:31 UTC - in response to Message 1394.  

Updated the version to 7.08; this should make the progress bar show accurate progress.

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 1401 - Posted: 26 Sep 2009, 23:22:02 UTC - in response to Message 1398.  

We made some progress, but I'm not happy with it yet. A fraction of 1% is not going to do it for me or you.

Tim Turner
Joined: 1 May 09
Posts: 570
Credit: 184,322
RAC: 0
Message 1402 - Posted: 26 Sep 2009, 23:54:29 UTC - in response to Message 1401.  

Whoops.

Ageless
Joined: 11 Apr 09
Posts: 172
Credit: 7,631
RAC: 0
Message 1403 - Posted: 27 Sep 2009, 0:47:37 UTC
Last modified: 27 Sep 2009, 0:49:00 UTC

We're getting closer though. I just tested some mdrun work and we now have a reasonable estimate on the 50-step and 500-step work. The progress bar goes in increments of 0.2%, but it only runs from 0 to 1%, then it's done. ;-)

Now trying to figure out why the wrapper b0rks the work. We seem to run out of memory somewhere.

[AF>Libristes] Pascal94
Joined: 1 May 09
Posts: 7
Credit: 80,287
RAC: 0
Message 1423 - Posted: 29 Sep 2009, 19:38:09 UTC

This WU has been running for 1 hour now, and the progress bar indicates 980%.

Do I have to let it run, or should I abort this task?

Tim Turner
Joined: 1 May 09
Posts: 570
Credit: 184,322
RAC: 0
Message 1425 - Posted: 29 Sep 2009, 19:46:12 UTC - in response to Message 1423.  
Last modified: 29 Sep 2009, 19:46:51 UTC

Jack has been working on the checkpointing and progress bar, and it's just a simple math problem... OK, not so simple.

First it was under 1% and now it's above 100%, go figure...

This is with v7.14, correct?

Jack Shultz
Joined: 10 Apr 09
Posts: 503
Credit: 120,150
RAC: 0
Message 1426 - Posted: 29 Sep 2009, 19:46:17 UTC - in response to Message 1423.  

If it's reporting 980%, then something is going wrong. Just cancel it. I am still trying to figure out the progress bar enhancements.
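
For reference, a progress value that stays within 0 to 100% can be sketched like this. It assumes progress is simply completed steps over total steps, reported as a fraction between 0 and 1 (the range BOINC's boinc_fraction_done() expects). The helper names are hypothetical and this is not the project's wrapper code. Readings such as "0 to 1%" or "980%" are what a fraction/percent mix-up or a wrong step total would produce, but the thread does not state the actual cause.

```python
def fraction_done(steps_done: int, total_steps: int) -> float:
    """Return progress as a fraction clamped to the range [0.0, 1.0]."""
    if total_steps <= 0:
        # No usable total: report no progress rather than dividing by zero.
        return 0.0
    return min(max(steps_done / total_steps, 0.0), 1.0)


def as_percent(fraction: float) -> str:
    """Format a 0..1 fraction for display; the only place 100 is applied."""
    return f"{fraction * 100:.1f}%"


# Halfway through a hypothetical 500-step run.
print(as_percent(fraction_done(250, 500)))

# A step counter that overshoots the total is clamped instead of
# showing several hundred percent.
print(as_percent(fraction_done(4900, 500)))
```

Keeping the value as a fraction everywhere and converting to a percentage only at display time avoids scaling by 100 twice or not at all.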

Van Fanel
Joined: 16 Sep 09
Posts: 17
Credit: 103,550
RAC: 0
Message 1427 - Posted: 29 Sep 2009, 20:31:14 UTC

Just for the record, this WU has the same problem: 219539



Cheers

Nikolay A. Saharov
Joined: 20 Apr 09
Posts: 7
Credit: 21,716
RAC: 34
Message 1471 - Posted: 8 Oct 2009, 18:23:15 UTC
Last modified: 8 Oct 2009, 18:28:03 UTC

Hello,

I have 3 WUs of mdrun 7.14 with deadline 10.05.2011 (results 877458, 877459, 877460).
The time to completion is 87600 hours for each WU and it is not changing.
The current progress is 3.5% and 13 hours of CPU time.
All these WUs are in EDF mode.

Ageless
Joined: 11 Apr 09
Posts: 172
Credit: 7,631
RAC: 0
Message 1473 - Posted: 8 Oct 2009, 21:28:54 UTC - in response to Message 1471.  

Check your Task Duration Correction Factor (TDCF). It may well be at 100, which throws off the estimated time to completion (ETC). To reset the TDCF to 1, reset the project or, if you know how and are comfortable with it, manually change its value in client_state.xml.
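
As a rough model of why a large correction factor inflates the estimate: the client derives a raw runtime from the workunit's floating-point estimate and the host's benchmark speed, then multiplies it by the project's duration correction factor. The sketch below is a simplification with made-up example numbers, not the client's actual code.

```python
def estimated_seconds(rsc_fpops_est: float, host_flops: float, dcf: float) -> float:
    """Simplified runtime estimate: raw estimate scaled by the correction factor.

    rsc_fpops_est: the workunit's estimated floating-point operations.
    host_flops:    the host's benchmarked speed (operations per second).
    dcf:           the project's duration correction factor.
    """
    return rsc_fpops_est / host_flops * dcf


# Made-up numbers: a task estimated at 3.6e12 operations on a 1 GFLOPS host.
raw = estimated_seconds(3.6e12, 1e9, 1.0)       # DCF of 1: one hour
inflated = estimated_seconds(3.6e12, 1e9, 100)  # DCF of 100: a hundred hours

print(raw / 3600, inflated / 3600)
```

With the same task and host, the displayed time to completion scales linearly with the correction factor, which is why resetting it changes the estimate so sharply.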

Nikolay A. Saharov
Joined: 20 Apr 09
Posts: 7
Credit: 21,716
RAC: 34
Message 1480 - Posted: 9 Oct 2009, 5:01:07 UTC - in response to Message 1473.  

No problem with this. DCF is 68 :)

What is bad is that these WUs finished with errors (exceeded disk limit: 958.22MB > 953.67MB):

09.10.2009 2:28:08	DrugDiscovery	Aborting task asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36717_ChemDiv_E511-3733_1254856773510751281_1254972686336430239_3910_1254973412_1: exceeded disk limit: 958.22MB > 953.67MB
09.10.2009 2:28:08	DrugDiscovery	[task_debug] task_state=ABORT_PENDING for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36717_ChemDiv_E511-3733_1254856773510751281_1254972686336430239_3910_1254973412_1 from abort_task
09.10.2009 2:28:08	DrugDiscovery	[task_debug] result state=COMPUTE_ERROR for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36717_ChemDiv_E511-3733_1254856773510751281_1254972686336430239_3910_1254973412_1 from CS::report_result_error
09.10.2009 2:28:08	DrugDiscovery	[task_debug] result state=ABORTED for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36717_ChemDiv_E511-3733_1254856773510751281_1254972686336430239_3910_1254973412_1 from abort_task
09.10.2009 2:29:09	DrugDiscovery	[task_debug] task_state=ABORTED for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36717_ChemDiv_E511-3733_1254856773510751281_1254972686336430239_3910_1254973412_1 from kill_task
09.10.2009 2:29:09	DrugDiscovery	Computation for task asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36717_ChemDiv_E511-3733_1254856773510751281_1254972686336430239_3910_1254973412_1 finished
09.10.2009 2:29:09	DrugDiscovery	[task_debug] result state=COMPUTE_ERROR for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36717_ChemDiv_E511-3733_1254856773510751281_1254972686336430239_3910_1254973412_1 from CS::app_finished
09.10.2009 2:53:12	DrugDiscovery	Aborting task asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36735_ChemDiv_C614-0998_1254856788637608611_1254972534674566603_3910_1254973411_0: exceeded disk limit: 953.69MB > 953.67MB
09.10.2009 2:53:12	DrugDiscovery	[task_debug] task_state=ABORT_PENDING for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36735_ChemDiv_C614-0998_1254856788637608611_1254972534674566603_3910_1254973411_0 from abort_task
09.10.2009 2:53:12	DrugDiscovery	[task_debug] result state=COMPUTE_ERROR for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36735_ChemDiv_C614-0998_1254856788637608611_1254972534674566603_3910_1254973411_0 from CS::report_result_error
09.10.2009 2:53:12	DrugDiscovery	[task_debug] result state=ABORTED for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36735_ChemDiv_C614-0998_1254856788637608611_1254972534674566603_3910_1254973411_0 from abort_task
09.10.2009 2:54:12	DrugDiscovery	[task_debug] task_state=ABORTED for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36735_ChemDiv_C614-0998_1254856788637608611_1254972534674566603_3910_1254973411_0 from kill_task
09.10.2009 2:54:12	DrugDiscovery	Computation for task asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36735_ChemDiv_C614-0998_1254856788637608611_1254972534674566603_3910_1254973411_0 finished
09.10.2009 2:54:12	DrugDiscovery	[task_debug] result state=COMPUTE_ERROR for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36735_ChemDiv_C614-0998_1254856788637608611_1254972534674566603_3910_1254973411_0 from CS::app_finished
09.10.2009 3:03:13	DrugDiscovery	Aborting task asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36788_ChemDiv_C301-4871_1254856839984565851_1254972825392505003_3910_1254973412_2: exceeded disk limit: 954.13MB > 953.67MB
09.10.2009 3:03:13	DrugDiscovery	[task_debug] task_state=ABORT_PENDING for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36788_ChemDiv_C301-4871_1254856839984565851_1254972825392505003_3910_1254973412_2 from abort_task
09.10.2009 3:03:13	DrugDiscovery	[task_debug] result state=COMPUTE_ERROR for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36788_ChemDiv_C301-4871_1254856839984565851_1254972825392505003_3910_1254973412_2 from CS::report_result_error
09.10.2009 3:03:13	DrugDiscovery	[task_debug] result state=ABORTED for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36788_ChemDiv_C301-4871_1254856839984565851_1254972825392505003_3910_1254973412_2 from abort_task
09.10.2009 3:04:13	DrugDiscovery	[task_debug] task_state=ABORTED for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36788_ChemDiv_C301-4871_1254856839984565851_1254972825392505003_3910_1254973412_2 from kill_task
09.10.2009 3:04:13	DrugDiscovery	Computation for task asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36788_ChemDiv_C301-4871_1254856839984565851_1254972825392505003_3910_1254973412_2 finished
09.10.2009 3:04:13	DrugDiscovery	[task_debug] result state=COMPUTE_ERROR for asgn_md_5000000_steps_P10_TYR52_LEU33_GLN56_36788_ChemDiv_C301-4871_1254856839984565851_1254972825392505003_3910_1254973412_2 from CS::app_finished
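
A note on the 953.67MB figure in these log lines: it is consistent with a disk bound of 1,000,000,000 bytes displayed in units of 2^20 bytes. That the limit was set to exactly 10^9 bytes is an inference from the number, not something stated in the thread. A minimal check (the function name is hypothetical):

```python
def bytes_to_mb(n_bytes: float) -> float:
    """Convert bytes to MB counted as 2**20-byte units, rounded to 2 decimals."""
    return round(n_bytes / 2**20, 2)


# A limit of 10**9 bytes shows up as 953.67 when divided by 2**20.
print(bytes_to_mb(1_000_000_000))

# The first aborted task was over the limit by only a few MB.
print(round(958.22 - 953.67, 2))
```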

Ageless
Joined: 11 Apr 09
Posts: 172
Credit: 7,631
RAC: 0
Message 1483 - Posted: 9 Oct 2009, 6:51:37 UTC - in response to Message 1480.  
Last modified: 9 Oct 2009, 6:52:05 UTC

A DCF of 68 is still too high. You'd want it to be a lot lower, with a maximum of perhaps 5.

But that said, I suspect the fpops estimate on these tasks to be way off. Can you open your client_state.xml file, find Drug Discovery, find one of the tasks for the mdrun application and post the lines for <rsc_fpops_est> and <rsc_fpops_bound>, please?

Make sure to just close client_state.xml; if you are asked whether to save changes, click No.
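
If you would rather not open the file in an editor at all, the two values can also be pulled out read-only with a short script. This is a sketch under assumptions: the <rsc_fpops_est> and <rsc_fpops_bound> tags are the ones named above, but the surrounding <workunit>, <name> and <app_name> layout, the app name "mdrun", and the sample data are illustrative, and a real client_state.xml may not parse cleanly with a strict XML parser.

```python
import xml.etree.ElementTree as ET


def fpops_for_app(state_xml: str, app_name: str):
    """Return (workunit name, fpops estimate, fpops bound) for one application.

    The input is only parsed, never written back, so the file on disk is
    left untouched.
    """
    root = ET.fromstring(state_xml)
    rows = []
    for wu in root.iter("workunit"):
        if wu.findtext("app_name", default="").strip() != app_name:
            continue
        rows.append((
            wu.findtext("name", default="").strip(),
            float(wu.findtext("rsc_fpops_est", default="nan")),
            float(wu.findtext("rsc_fpops_bound", default="nan")),
        ))
    return rows


# Made-up sample in the assumed layout; in practice the text would be read
# from a copy of client_state.xml.
SAMPLE = """<client_state>
  <workunit>
    <name>example_wu</name>
    <app_name>mdrun</app_name>
    <rsc_fpops_est>3600000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>36000000000000.000000</rsc_fpops_bound>
  </workunit>
</client_state>"""

for name, est, bound in fpops_for_app(SAMPLE, "mdrun"):
    print(name, est, bound)
```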