Today, when I opened Service Monitor in DirectAdmin, the processes at the top of the list were all the same program, /usr/local/directadmin/dataskq, and they were using 99.9% of the CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11477 root 20 0 324m 196m 1580 R 53.3 5.2 17153:50 /usr/local/directadmin/dataskq
7738 root 20 0 315m 183m 1580 R 49.3 4.8 15702:50 /usr/local/directadmin/dataskq
17973 root 20 0 307m 170m 1592 R 59.5 4.5 14271:42 /usr/local/directadmin/dataskq
15411 root 20 0 285m 159m 1592 R 58.9 4.2 9984:23 /usr/local/directadmin/dataskq
18812 root 20 0 299m 158m 1592 R 50.0 4.2 12829:33 /usr/local/directadmin/dataskq
32016 root 20 0 292m 146m 1592 R 49.7 3.8 11400:47 /usr/local/directadmin/dataskq
10846 root 20 0 254m 125m 1592 R 61.8 3.3 8605:18 /usr/local/directadmin/dataskq
22175 root 20 0 248m 114m 1592 R 54.9 3.0 7239:59 /usr/local/directadmin/dataskq
28472 root 20 0 241m 104m 1592 R 52.6 2.7 5916:28 /usr/local/directadmin/dataskq
2738 root 20 0 239m 98m 1700 R 49.7 2.6 4753:17 /usr/local/directadmin/dataskq
7807 root 20 0 212m 84m 1836 R 49.3 2.2 3698:22 /usr/local/directadmin/dataskq
11449 root 20 0 202m 75m 1836 R 49.7 2.0 1870:38 /usr/local/directadmin/dataskq
6370 root 20 0 205m 73m 1836 R 50.0 1.9 2744:47 /usr/local/directadmin/dataskq
22093 root 20 0 178m 52m 1836 R 52.0 1.4 1042:42 /usr/local/directadmin/dataskq
26249 root 20 0 152m 27m 1844 R 54.9 0.7 266:05.08 /usr/local/directadmin/dataskq
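Instead of reading the whole Service Monitor list, the dataskq processes alone can be listed and counted from a shell. A minimal sketch, assuming a Linux system with procps `ps` and `pgrep`; `list_procs_by_cpu` and `count_procs` are hypothetical helper names, not part of DirectAdmin:

```bash
# Hypothetical helper: list every process whose command name is exactly $1,
# highest CPU usage first, with PID, CPU %, memory % and elapsed run time.
list_procs_by_cpu() {
    ps -C "$1" -o pid,pcpu,pmem,etime,comm --sort=-pcpu
}

# Hypothetical helper: print how many processes are named exactly $1.
# pgrep -c prints 0 and exits non-zero when nothing matches; "|| true"
# keeps that from aborting scripts that run with "set -e".
count_procs() {
    pgrep -c -x "$1" || true
}
```

Usage: `list_procs_by_cpu dataskq` and `count_procs dataskq`. If the count keeps growing from one minute to the next, new dataskq runs are probably piling up behind runs that never finish, which matches the steadily decreasing TIME+ values in this output.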
My server had a very high load, dataskq was at the top of the 'top' list, and all services crashed. How do I fix this problem?
First, I inspected one of the processes with the command below:
# lsof -p 11477
Output
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dataskq 11477 root cwd DIR 259,4 4096 21106008 /usr/local/directadmin
dataskq 11477 root rtd DIR 259,4 4096 2 /
dataskq 11477 root txt REG 259,4 8893140 21106014 /usr/local/directadmin/dataskq
dataskq 11477 root mem REG 259,4 65928 7602205 /lib64/libnss_files-2.12.so
dataskq 11477 root mem REG 259,4 122040 7602226 /lib64/libselinux.so.1
dataskq 11477 root mem REG 259,4 10192 7602373 /lib64/libkeyutils.so.1.3
dataskq 11477 root mem REG 259,4 43728 7602382 /lib64/libkrb5support.so.0.1
dataskq 11477 root mem REG 259,4 469528 7602181 /lib64/libfreebl3.so
dataskq 11477 root mem REG 259,4 277704 7602374 /lib64/libgssapi_krb5.so.2.2
dataskq 11477 root mem REG 259,4 142640 7602213 /lib64/libpthread-2.12.so
dataskq 11477 root mem REG 259,4 1921216 7602189 /lib64/libc-2.12.so
dataskq 11477 root mem REG 259,4 90880 7602578 /lib64/libgcc_s-4.4.7-20120601.so.1
dataskq 11477 root mem REG 259,4 596264 7602197 /lib64/libm-2.12.so
dataskq 11477 root mem REG 259,4 987096 21103632 /usr/lib64/libstdc++.so.6.0.13
dataskq 11477 root mem REG 259,4 110960 7602215 /lib64/libresolv-2.12.so
dataskq 11477 root mem REG 259,4 14664 7602247 /lib64/libcom_err.so.2.1
dataskq 11477 root mem REG 259,4 174840 7602378 /lib64/libk5crypto.so.3.1
dataskq 11477 root mem REG 259,4 941920 7602380 /lib64/libkrb5.so.3.3
dataskq 11477 root mem REG 259,4 19536 7602195 /lib64/libdl-2.12.so
dataskq 11477 root mem REG 259,4 98661 21106191 /usr/local/lib/libz.so.1.2.3
dataskq 11477 root mem REG 259,4 1950976 21104485 /usr/lib64/libcrypto.so.1.0.1e
dataskq 11477 root mem REG 259,4 40400 7602193 /lib64/libcrypt-2.12.so
dataskq 11477 root mem REG 259,4 437016 21104487 /usr/lib64/libssl.so.1.0.1e
dataskq 11477 root mem REG 259,4 154520 7602580 /lib64/ld-2.12.so
dataskq 11477 root 0r REG 259,4 2795 8792180 /home/tmp/quota-dump (deleted)
dataskq 11477 root 1r REG 259,4 710688943 21633577 /usr/local/directadmin/data/users/detvl/bandwidth.tally
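Most of the lsof listing is shared libraries (`mem`); the interesting lines are the regular files open on numbered descriptors, here `/usr/local/directadmin/data/users/detvl/bandwidth.tally` on descriptor 1. On Linux the same information can be read straight from `/proc/<pid>/fd`. A minimal sketch; `show_open_fds` is a hypothetical helper name:

```bash
# Hypothetical helper: print "fd -> target" for each file descriptor of
# process $1, read from /proc (Linux only; root is needed for other users' processes).
show_open_fds() {
    local pid="$1" fd
    for fd in /proc/"$pid"/fd/*; do
        # Skip anything that is not a descriptor symlink (for example the
        # unexpanded pattern when the directory cannot be read).
        [ -L "$fd" ] || continue
        # readlink prints the path the descriptor points to,
        # e.g. "/home/tmp/quota-dump (deleted)" for a deleted file.
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}
```

Usage: `show_open_fds 11477`.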
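Next, I sent the dataskq processes a USR1 signal, which makes dataskq write what it is currently processing to its error log, and then read the end of that log:
# killall -USR1 dataskq
# tail -n 10 /var/log/directadmin/errortaskq.log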
Output
==> /var/log/directadmin/errortaskq.log <==
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
2014:08:28-09:07:38: Dataskq USR1 signal: Currently processing: Tally::get_bandwidth_breakdown(..., 0) for detvl : done reading, begin parsing
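When the log is long, collapsing the repeated lines shows at a glance which task dataskq keeps reporting. A minimal sketch, assuming the `Currently processing:` wording seen in this log; `summarize_taskq_log` is a hypothetical helper name:

```bash
# Hypothetical helper: count each distinct "Currently processing: ..." message
# in a dataskq error log and print them, most frequent first.
# $1: log file (defaults to /var/log/directadmin/errortaskq.log)
summarize_taskq_log() {
    local log="${1:-/var/log/directadmin/errortaskq.log}"
    # grep -o keeps only the task description, dropping the timestamps,
    # so identical tasks reported at different times are grouped together.
    grep -o 'Currently processing: .*' "$log" | sort | uniq -c | sort -rn
}
```

For the log shown here, every line would collapse into a single entry with a count of 10, all pointing at `Tally::get_bandwidth_breakdown` for detvl.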
It looked like dataskq was stuck parsing the bandwidth log of the user detvl. Next, I checked the size of /usr/local/directadmin/data/users/detvl/bandwidth.tally, the file the process had open in the lsof output, with the following command:
# du -sh /usr/local/directadmin/data/users/detvl/bandwidth.tally
Output
678M /usr/local/directadmin/data/users/detvl/bandwidth.tally
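Other accounts may have the same problem. A minimal sketch that lists every `bandwidth.tally` above a size limit, largest first, assuming GNU `find` (for `-printf`) and the `data/users/<user>/` layout shown here; `find_big_tallies` and its 100M default limit are illustrative choices, not DirectAdmin settings:

```bash
# Hypothetical helper: print "<size in bytes><TAB><path>" for every
# bandwidth.tally under $1 that is larger than $2, largest first.
# $1: DirectAdmin users directory (default /usr/local/directadmin/data/users)
# $2: size limit in find(1) -size units, e.g. 100M (illustrative default)
find_big_tallies() {
    local dir="${1:-/usr/local/directadmin/data/users}"
    local limit="${2:-100M}"
    find "$dir" -type f -name bandwidth.tally -size +"$limit" \
        -printf '%s\t%p\n' | sort -rn
}
```

Usage: `find_big_tallies` with the defaults, or `find_big_tallies /usr/local/directadmin/data/users 500M` for a different limit.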
At 678M, the file is very large. To fix the problem, I first killed all dataskq processes with the command below (the USR1 signal only makes dataskq report its current task, so SIGKILL is needed):
# killall -9 dataskq
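Or run this script:
#!/bin/bash
# Kill every process whose name is exactly "dataskq".
# Unlike "ps aux | grep dataskq", pgrep -x never matches the grep command itself.
for P in $(pgrep -x dataskq); do
    kill -9 "$P"
done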
Second, I emptied the bandwidth.tally file with the following command:
# truncate -s 0 /usr/local/directadmin/data/users/detvl/bandwidth.tally
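Emptying the file discards the bandwidth records it holds for that user. If they might be needed later, one hedged option is to keep a compressed copy first. A minimal sketch; the helper name `backup_and_truncate` and the `<file>.<timestamp>.gz` naming are illustrative assumptions, and the copy needs free disk space:

```bash
# Hypothetical helper: save a gzip copy of $1 next to it, then empty $1 in place.
# Truncating (rather than deleting) keeps the same file with its owner and mode.
# Prints the path of the backup on success.
backup_and_truncate() {
    local file="$1" backup
    backup="$file.$(date +%Y%m%d%H%M%S).gz"
    gzip -c -- "$file" > "$backup" || return 1
    truncate -s 0 -- "$file" || return 1
    printf '%s\n' "$backup"
}
```

Usage: `backup_and_truncate /usr/local/directadmin/data/users/detvl/bandwidth.tally`, run after the dataskq processes have been stopped.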
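Finally, I lowered the CPU priority of dataskq with nice -n 19, so that other services get CPU time first when the server is busy. To do this, I edited the DirectAdmin cron file: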
# vi /etc/cron.d/directadmin_cron
Replace the dataskq line with:
* * * * * root nice -n 19 /usr/local/directadmin/dataskq
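The cron change only affects dataskq runs started after the edit. A minimal sketch for checking the nice value and for lowering the priority of copies that are already running, assuming procps `ps`/`pgrep` and util-linux `renice`; `show_nice` and `renice_running` are hypothetical helper names:

```bash
# Hypothetical helper: show PID, nice value (NI) and name of every process
# named exactly $1. dataskq started from the edited cron line should show 19.
show_nice() {
    ps -C "$1" -o pid,ni,comm
}

# Hypothetical helper: move processes named $1 that are already running
# to the lowest CPU priority (nice 19).
renice_running() {
    local pids
    pids=$(pgrep -x "$1") || return 0    # nothing running: nothing to do
    # $pids is left unquoted on purpose so that each PID is a separate argument.
    renice -n 19 -p $pids
}
```

Usage: `renice_running dataskq`, then `show_nice dataskq` to confirm that the NI column reads 19.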