Overview

Scaling up computational resources is a big advantage for certain large-scale calculations on OSG. Consider the extensive sampling required for a multi-dimensional Monte Carlo integration, or a molecular dynamics simulation with several initial conditions. These types of calculations require submitting a large number of jobs.

In the previous example, we submitted a job to a single worker machine. About a million CPU hours per day are available to OSG users on an opportunistic basis. Learning how to scale up and control large numbers of jobs is key to realizing the full potential of distributed high-throughput computing on the OSG.

In this section, we will see how to scale up a calculation with a simple example. Once we understand the basic HTCondor submit script, scaling up is easy.

Background

For this example, we will use computational methods to estimate pi. First, we will define a unit circle inscribed in a square, and randomly sample points within the square. The fraction of points that fall inside the circle approaches pi/4, so multiplying that fraction by 4 gives an estimate of pi.

This method converges extremely slowly, which makes it great for a CPU-intensive exercise (but bad for a real estimation!).
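To make this concrete, here is a minimal vectorized sketch of the same estimator; it is only an illustration and is separate from the tutorial script below:

# Sample n points uniformly in the unit square and count the
# fraction that lands inside the quarter circle x^2 + y^2 < 1.
n <- 1e6
x <- runif(n)
y <- runif(n)
4 * mean(x^2 + y^2 < 1)   # approaches pi as n grows

Because the error shrinks only like 1/sqrt(n), each additional digit of accuracy costs roughly 100 times more samples, which is why this problem lends itself to many independent jobs.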

Set up an R Job

First, we'll need to create a working directory. You can either run $ tutorial ScalingUp-R or type the following:

$ mkdir tutorial-ScalingUp-R
$ cd tutorial-ScalingUp-R

Test the R Script

Create an R script by typing the following into a file called mcpi.R:

montecarloPi <- function(trials) {
  count <- 0
  for (i in 1:trials) {
    # Sample a point in the unit square; count it if it falls
    # inside the quarter circle x^2 + y^2 < 1.
    if ((runif(1, 0, 1)^2 + runif(1, 0, 1)^2) < 1) {
      count <- count + 1
    }
  }
  return((count * 4) / trials)
}

montecarloPi(1000)

Now, test your R script to ensure it runs as expected:

$ module load r
$ Rscript --no-save mcpi.R
[1] 3.14
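Your estimate will vary from run to run. As a rough aside (not part of the tutorial), the expected spread for a given number of trials can be computed in R, treating each trial as an independent Bernoulli draw with success probability pi/4:

p <- pi / 4
4 * sqrt(p * (1 - p) / 1000)   # standard error for 1000 trials, about 0.052

With only 1000 trials, the second decimal place is already uncertain, which is exactly why we will average over many independent jobs below.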

If we were running a more intensive script, we would want to test our pipeline with a shortened test version first, for example by calling montecarloPi(100) instead of montecarloPi(1000).

Now that we know our script works as expected, we can begin building the necessary scripts so we can run the jobs on OSG.

Build the HTCondor Job

As discussed in the Run R Jobs tutorial, we need to prepare a job execution (wrapper) script and a job submission file. First, make a wrapper script called R-wrapper.sh:

#!/bin/bash

# Load the R module, then run the R script named by the first
# argument, forwarding any remaining arguments to it.
module load r
Rscript --no-save "$@"

This script loads the required module and runs whichever R script is named by the first argument, passing along any additional arguments (such as the process number we will use below).

Make the wrapper script executable, then test it to ensure it works:

$ chmod +x R-wrapper.sh
$ ./R-wrapper.sh mcpi.R
[1] 3.14524

Now that we have both our R script and wrapper script written and tested, we can begin building the submit file for our job. If we want to submit several jobs, we need to track the log, output, and error files for each job. An easy way to do this is to use the Cluster and Process ID values to create unique filenames for each job in the cluster.

Create a submit file named R.submit:

executable = R-wrapper.sh
arguments = mcpi.R $(Process)

# mcpi.R is the R program we want to run
transfer_input_files = mcpi.R

log = log/job.log.$(Cluster).$(Process)
error = log/job.error.$(Cluster).$(Process)
output = log/job.out.$(Cluster).$(Process)

requirements = HAS_MODULES == True && OSGVO_OS_STRING == "RHEL 7" && Arch == "X86_64"
queue 100

Note the queue 100 at the end. This tells HTCondor to enqueue 100 copies of this job as one cluster. Also notice the use of $(Cluster) and $(Process) to specify unique output files: HTCondor replaces these with the cluster and process ID numbers of each individual job, so a cluster numbered 837 would produce files such as log/job.out.837.0 through log/job.out.837.99. Let's make the log directory that will hold these files for us.

$ mkdir log
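One thing to keep in mind: every queued process runs the same mcpi.R, so the 100 estimates differ only because each job happens to draw different random numbers. If you want each run to be distinct and reproducible, one option is to use the $(Process) value that the submit file passes (and that the wrapper forwards). A minimal sketch that could replace the final montecarloPi(1000) call in mcpi.R; this is an illustration, not something the tutorial script does:

# Hypothetical seed-aware variant: read the process ID from the
# command line and use it to seed the random number generator.
args <- commandArgs(trailingOnly = TRUE)           # e.g. "7" for process 7
procid <- if (length(args) >= 1) as.integer(args[1]) else 0
set.seed(procid)                                   # distinct, reproducible stream per job
montecarloPi(1000)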

Now it is time to submit our job! You'll see something like the following upon submission:

$ condor_submit R.submit
Submitting job(s).........................
100 job(s) submitted to cluster 837.

Apply your condor_q and connect watch knowledge to follow the progress of these jobs, and check the log directory to see the individual output files appear.

Post Process

Once the jobs have completed, you can use the information in the output files to calculate an average of all the computed estimates of pi.

To see this, we can use the command:

$ cat log/job.out.* | awk '{ sum += $2; print $2"   "NR} END { print "---------------\n Grand Average = " sum/NR }'
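If you prefer to stay in R, a minimal equivalent, assuming each output file holds a single line like [1] 3.14524, would be:

# Read each per-job output file, extract the estimate after "[1]",
# and average the results.
files <- Sys.glob("log/job.out.*")
estimates <- sapply(files, function(f) {
  as.numeric(strsplit(readLines(f)[1], "\\s+")[[1]][2])
})
mean(estimates)   # grand average over all jobs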

Key Points

  • Scaling up the computational resources on OSG is crucial to taking full advantage of grid computing.
  • Changing the value of queue allows the user to scale up the number of jobs.
  • arguments allows you to pass parameters to the job script.
  • $(Cluster) and $(Process) can be used to name log files uniquely.

Getting Help

For assistance or questions, please email the OSG User Support team at support@osgconnect.net or visit the help desk.

 
