11 Mplus CPU Use

11.1 Test Program

The CPU performance test program for Mplus was developed after consulting the Mplus User’s Guide (8th ed., 2017, pp. 708-710). Mplus's MIXTURE analysis has been optimized to use multiple processors when they are available.

The test problem consisted of running a latent growth model in three latent classes, with 9170 observations. The data were a random subset of an SSCC user’s research data. On Winstat the actual (hardware) limit is 16 processors. On Linstat there are 36 physical cores.

Title: 03 classes;

Data:
      File is csa9170.dat;
Variable:
      Names are
         idnew cbsa4
         pwt0 pbt0 pht0 pat0
         pwt1 pbt1 pht1 pat1
         pwt2 pbt2 pht2 pat2
         pwt3 pbt3 pht3 pat3
         pwt4 pbt4 pht4 pat4 ;
      Missing are all (-9999) ;
      Usevariables are
         pbt0 pht0 pat0
         pbt1 pht1 pat1
         pbt2 pht2 pat2
         pbt3 pht3 pat3
         pbt4 pht4 pat4 ;
    AUXILIARY = idnew ;
    CLASSES = c(3);
Analysis:
             TYPE = MIXTURE;
             STARTS =  200 20;
             STSEED = 218783;
             !OPTSEED = 46371;
             Processor=12;

MODEL:
       %OVERALL%
        ib sb | pbt0@0 pbt1@1 pbt2@2 pbt3@3 pbt4@4 ;
        ih sh | pht0@0 pht1@1 pht2@2 pht3@3 pht4@4 ;
        ia sa | pat0@0 pat1@1 pat2@2 pat3@3 pat4@4 ;
        
        ib ih ia sb sh sa;


OUTPUT: tech11 ;

I used the MplusAutomation package in R to generate, run, and analyze the Mplus jobs.
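The jobs at different CPU counts differ only in the Processor line of the input file. The author's R code is not shown, so the following is a minimal Python sketch of that idea rather than the method actually used; the `TEMPLATE` excerpt and the `with_processors` helper are hypothetical names introduced for illustration.

```python
import re

# Hypothetical excerpt of the Analysis section from the test program above.
TEMPLATE = """Analysis:
    TYPE = MIXTURE;
    STARTS = 200 20;
    STSEED = 218783;
    Processor=12;
"""

# Matches a "Processor = N;" or "Processors = N;" line, case-insensitively.
PROCESSOR_LINE = re.compile(r"(?im)^([ \t]*)processors?[ \t]*=[ \t]*\d+[ \t]*;")


def with_processors(mplus_input: str, n_cpus: int) -> str:
    """Return a copy of the input text requesting n_cpus processors."""
    new_text, count = PROCESSOR_LINE.subn(rf"\g<1>Processor={n_cpus};", mplus_input)
    if count != 1:
        raise ValueError("expected exactly one Processor line in the input")
    return new_text


# One input variant per CPU count tested on Winstat.
variants = {n: with_processors(TEMPLATE, n) for n in (4, 8, 12, 16)}
```

Each variant would then be written to its own input file (with its own output file name) before being run.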

11.2 Number of CPUs

I ran 15 repetitions of the latent class problem at each of 4, 8, 12, and 16 CPUs on Winstat, with the number of processors requested set in the Mplus input. On Linstat I tested 8, 16, 24, and 32 CPUs. The times (in seconds) to complete each latent class problem on Winstat were taken with no other active users (one disconnected user). The times on Linstat were taken with 3 cores in use by other users.

CPUs	4	8	12	16	24	32
Winstat mean (sec)	121.6	83.7	79.1	84.5	-	-
Winstat sd (sec)	1.15	3.17	3.25	4.84	-	-
Linstat mean (sec)	-	78.8	-	65.8	48.3	53.8
Linstat sd (sec)	-	7.97	-	4.21	8.55	9.67
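One way to read the table is as speedup relative to the smallest CPU count tested on each server. The sketch below is an added illustration, not part of the original analysis. It uses only the mean times from the table. The `speedup_table` helper and the "efficiency" measure (speedup divided by the factor increase in CPUs) are assumptions introduced here.

```python
# Mean times in seconds, copied from the table above.
winstat = {4: 121.6, 8: 83.7, 12: 79.1, 16: 84.5}
linstat = {8: 78.8, 16: 65.8, 24: 48.3, 32: 53.8}


def speedup_table(mean_secs):
    """Return (cpus, speedup, efficiency) rows relative to the fewest CPUs tested."""
    base_cpus = min(mean_secs)
    base_time = mean_secs[base_cpus]
    rows = []
    for cpus in sorted(mean_secs):
        speedup = base_time / mean_secs[cpus]       # > 1 means faster than baseline
        efficiency = speedup / (cpus / base_cpus)   # 1.0 would be perfect scaling
        rows.append((cpus, round(speedup, 2), round(efficiency, 2)))
    return rows


for name, data in (("Winstat", winstat), ("Linstat", linstat)):
    for cpus, speedup, efficiency in speedup_table(data):
        print(f"{name} {cpus:>2} CPUs: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

On these means, doubling from 4 to 8 CPUs on Winstat gives a speedup of roughly 1.45 rather than 2, and the speedup falls again at the highest CPU count on both servers.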

The Linstat results for 32 CPUs are measurably (p=.00x) slower than those for 24 CPUs, suggesting there is a penalty for requesting more cores than are available. The Winstat results for 16 CPUs are measurably slower than those for 12 CPUs (p=.00x), suggesting that there is a penalty for requesting all of the available CPUs.
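The test behind these p-values is not stated. As a rough illustration only, the sketch below computes a Welch t statistic for the Winstat 12 versus 16 CPU comparison from the summary statistics in the table, assuming two independent samples of 15 runs each. Both the choice of Welch's test and the `welch_t` helper are assumptions, not the author's method.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Winstat, 12 CPUs vs 16 CPUs (means and sds from the table, n = 15 assumed).
t_stat, dof = welch_t(79.1, 3.25, 15, 84.5, 4.84, 15)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A positive t here means the 16 CPU runs were slower on average than the 12 CPU runs.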

11.3 Competing for CPUs

Next, these tests were repeated on a Winstat server where another Mplus program was also requesting many CPUs. Two copies of the program (modified to produce separate output files) were launched in quick succession. This was done with both 8 CPU and 16 CPU requests.

Times for each latent class problem were compared between the two competing runs with the same CPU request, using a t-test. There was no measurable difference between the first run launched and the second run. Competing programs were affected equally, so the times are pooled here.

Competing on Winstat	8 CPUs (sec)	16 CPUs (sec)	Problems	Total time for 30 problems @ 8 CPUs each
1 program	83.7	84.5	15	\((83.7\times 15)\times 2 = 2511\) sec
2 programs	112.0	448.3	30	\((112.0\times 15) = 1680\) sec
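The totals in the last column can be reproduced with a short calculation. This sketch assumes the two competing programs start together and finish at about the same time, so their wall-clock time is 15 problems at the competing mean. The 16 CPU totals are derived here from the table under the same assumption and are not figures reported in the original, and the function names are illustrative.

```python
PROBLEMS_PER_PROGRAM = 15


def sequential_total(mean_sec_alone, programs=2):
    """Wall time when the programs run one after another, each with the server to itself."""
    return mean_sec_alone * PROBLEMS_PER_PROGRAM * programs


def concurrent_total(mean_sec_competing):
    """Wall time when both programs run side by side (assumed to finish together)."""
    return mean_sec_competing * PROBLEMS_PER_PROGRAM


# 8 CPUs requested per program: running side by side finishes sooner.
seq_8 = sequential_total(83.7)    # about 2511 sec
con_8 = concurrent_total(112.0)   # about 1680 sec

# 16 CPUs requested per program: running side by side takes far longer.
seq_16 = sequential_total(84.5)   # about 2535 sec
con_16 = concurrent_total(448.3)  # about 6724.5 sec

print(f"8 CPUs:  concurrent/sequential = {con_8 / seq_8:.2f}")
print(f"16 CPUs: concurrent/sequential = {con_16 / seq_16:.2f}")
```

A ratio below 1 means the competing runs finished all 30 problems sooner than running the two programs back to back. A ratio above 1 means competition cost more than it saved.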

Adding CPUs speeds up individual problems, as long as the available CPUs are not all in use. Having to compete for CPUs slows individual problems down, which is also no surprise.

There is a substantial penalty when the number of CPUs requested actually exceeds the number available.

Requesting exactly as many CPUs as are available (here, two programs requesting 8 CPUs each on a 16-CPU server) results in greater computational efficiency: more work is accomplished in less time. However, requesting more CPUs than exist can produce substantial inefficiency!

Note: additional discussion of Mplus's problems when requesting too many cores is available at
http://www.statmodel.com/discussion/messages/11/2261.html?1504051922