High Performance Computing 2013
Kyiv, October 7-11, 2013

Schedule

Agenda overview

Venues on each day: Building 6, Hall; SC, Exhibition; SC, Hall. Parallel sessions are separated by " / ".

Time   Monday                                             Tuesday                                   Wednesday
9:00   Registration                                       HPC Breakfast                             HPC Breakfast
10:00  Opening Session                                    Industrial Session / Parallel Algorithms  GPU & FPGA / Clouds & Grids
10:30  Plenary Session
11:00  Coffee Break                                       Coffee Break                              Coffee Break
11:30  Plenary Session                                    Industrial Session / Parallel Algorithms  GPU & FPGA / Clouds & Grids
13:00  Lunch                                              Lunch                                     Lunch
14:00  Architecture & System Software / HPC Applications  Industrial Session / ANSYS Workshop       GPU & FPGA / Clouds & Grids
15:00                                                     OpenCL BoF
16:00  Coffee Break                                       Excursion                                 Coffee Break
16:30  Architecture & System Software / HPC Applications                                            Closing Session
18:00  Welcome Party @ Alma Mater Cafe

SC – Supercomputing Center.

Monday

09:00 – 10:00

Registration

10:00 – 10:30

Opening Session

10:30 – 13:00

Plenary Session

10:30

HPC goes Internet: Challenges and Solutions for Real-Time Interactive Online Applications

Real-Time Interactive Online Applications (ROIA) include emerging and future Internet applications, for example massively multiplayer online games and high-performance, simulation-based e-learning and training. They place extremely high demands on real-time behavior and interactive scalability, i.e. maintaining real-time responsiveness and other QoS features by dynamically adapting computer and network resources to a changing number of users.

We will analyse the special challenges of ROIA and describe how our Real-Time Framework (RTF) employs parallelism, distribution, and novel network technologies such as SDN to meet these challenges.

11:00 – 11:30

Coffee Break

11:30

Large-Scale Time-dependent Ginzburg-Landau simulations on GPUs

The advent of large-scale simulations on GPUs has made it possible to solve many classes of partial differential equations very efficiently.
In this talk I will present some new simulation results for the time-dependent Ginzburg-Landau equation (TDGL) for mesoscopic superconductors, such as narrow superconducting strips and nano-patterned superconductors. I will give a short introduction to the physics background and our motivation for the research, and present some details of the implementation of the GPU solver.

12:00

Intel tools for HPC developers deliver top application performance while minimizing development, tuning and testing time and effort.

The latest version of the Intel® Parallel Studio XE provides C, C++ and Fortran developers with cutting-edge tools to create parallel software running on today's and tomorrow's IA-compatible processors and coprocessors.

Intel compilers include advanced optimization and multithreading capabilities along with excellent compatibility with popular development environments.

There are also several highly optimized performance libraries (Threading Building Blocks (TBB), Math Kernel Library (MKL) and Integrated Performance Primitives (IPP)) and analysis tools for creating fast, reliable multithreaded applications. In addition to the compilers and libraries, there are some other useful tools, such as an advanced threading assistant, an easy-to-use memory and threading error detector for serial and parallel applications, and, of course, the well-known premier performance and thread profiler.

We will briefly discuss new features of these tools and focus on the compiler key features which help to parallelize your applications (e.g., vectorization).

12:30

Designing balanced cluster under limited budget

Building a balanced cluster within a limited budget

13:00 – 14:00

Lunch

14:00 – 18:00

Architecture & System Software

14:00

From digital analogs through recursive machines to quantum computers

The talk examines the evolution of computing from digital analogs through non-von Neumann machines to quantum computers, which are themselves digital analogs. In the 1960s digital analogs were successfully developed at the Institute of Electromechanics of the USSR Academy of Sciences in Leningrad. An important stage in the development of non-classical multiprocessor machines of high performance and reliability was the development of recursive machines, carried out at the Institute of Cybernetics of the Academy of Sciences of the Ukrainian SSR under the direction of V. M. Glushkov and at the Leningrad Institute of Aviation Instrumentation. The general approach to synthesis is through linguo-combinatorial modeling with structured uncertainty.

14:20

Virtual machine testing for computing cluster configurations optimization

This paper presents test results for configurations of a dynamically reconfigurable cluster computing system (DRCCS) on the Microsoft Windows HPC Server 2008 R2 platform with virtual machine (VM) nodes on the Microsoft Hyper-V platform, for both static and dynamic random access memory (RAM) settings. Results of investigating a method for forecasting the central processing unit (CPU) load of cluster nodes are shown for the different RAM types. The experiments show how virtual memory operation differs between the VM RAM settings. The predictability of CPU load behavior deteriorates for computing jobs whose processes do not use all of the allocated RAM.

14:40

Statistical analysis and study of the influence of input queue processing on functioning of distributed computing systems

The paper considers how the statistical characteristics of input job streams, and the methods used to schedule them, affect the efficiency of distributed computing systems. A statistical analysis of input jobs was carried out for various periods (intervals) of their arrival in the computing system and subsequent processing, in order to determine regularities and the distribution laws of their characteristics. The distribution laws of the input job streams with respect to duration and intensity were studied. Experimental results obtained with the GridSim package are analyzed for various operating characteristics and structures of distributed computing systems and for various job scheduling methods. The practical use of the results to increase the performance of distributed computing systems is justified.

15:00

Offload acceleration of scientific calculations within .NET assemblies

Portability of automatically parallelized code is improved, and the applicability of the polyhedral model extended, by applying on-the-fly transformations within the JIT mechanisms of a virtual execution system. The architecture of a parallelizing optimizer module for ILDJIT is discussed. The target parallel architectures are Intel Many Integrated Core (MIC) and common x86 multicore processors. An LU decomposition example written in C# shows a significant speedup over traditional JIT execution after being parallelized by the execution system at run time.

15:20

Dual-layer hardware and software management in cluster systems

In this paper, we describe the software architecture of a cluster system that allows any particular system task to be reassigned to other hardware without relying on virtualization. We also show support for virtualized services alongside non-virtualized computational nodes.

15:40

Optimization of data center performance in Lashkaryov Institute of Semiconductor Physics NAS of Ukraine for the physicochemical diagnostics of materials properties

With the widespread adoption of high-performance calculations on multiprocessor and multinode computational clusters in applied physicochemical materials analysis, the development of special-purpose data warehousing and data processing centers has become decisively important. Such centers must be accessible to users around the clock, which implies adherence to standards for data storage safety and durability. This report evaluates the data center's capabilities and describes the optimization of computational cluster performance carried out at ISP NAS of Ukraine.

16:00 – 16:30

Coffee Break

16:30

Model Synthesis and Model-Oriented Programming

The work is devoted to the description and simulation of complex systems for which the components, their capabilities and the rules of their interaction are well known. The challenge in modeling is to reproduce the behavior, and to evaluate the capabilities, of such a system as a whole. A new approach to the design and implementation of computer simulation models of complex multicomponent systems is introduced. It differs from the object-oriented approach.

16:50

The algorithms for parallel information processing in many-stage commutation systems for high performance computing systems and the switching hardware

The paper addresses the problem of increasing the throughput of data transmission networks and the performance of multiprocessor systems built from many-stage commutation systems with a large number of inputs. Two parallel processing algorithms are proposed: parallel search and parallel identification. Traffic delays are much lower in a many-stage system with parallel processing than in many-stage systems that search for communication channels sequentially.

14:00 – 18:00

HPC Applications

14:00

The use of parallel computing in the computed tomography problem

This paper considers the computed tomography problem of reconstructing the three-dimensional structure of an object from radiographic projection data. The solution is presented as a set of two-dimensional images. Each cross-section of the object is reconstructed using the convolution back-projection algorithm. The computation is parallelized over the set of cross-sections. The algorithm is implemented in software using MPI.
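
Because each cross-section is reconstructed independently, the slices can simply be divided among the processes. The following is a minimal sketch of one such division in plain Python; the function name `slices_for_rank` and the contiguous-block layout are assumptions made for illustration and are not taken from the paper.

```python
def slices_for_rank(n_slices: int, n_ranks: int, rank: int) -> range:
    """Return the contiguous block of slice indices assigned to `rank`.

    The first (n_slices % n_ranks) ranks receive one extra slice, so the
    block sizes differ by at most one. (Hypothetical helper, not the
    authors' code.)
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    base, extra = divmod(n_slices, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # Example: 10 cross-sections shared by 4 processes.
    for r in range(4):
        print(r, list(slices_for_rank(10, 4, r)))
```

In an MPI program the rank and the number of processes would come from the communicator, and each process would run the back-projection only for its own block of slices.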

14:20

HPC systems for computer-aided medical images analysis

Progress in bioinformatics and mathematical methods in biomedicine, together with the development of computer and telecommunications systems and networks, determines the shape of present and future medical technology and of medicine in general.

14:40

On the possibilities of the interactive mode for the processing of medical data in the Grid-system for storing medical images

The Institute for Scintillation Materials, in collaboration with other institutions of the NAS and AMS, has carried out a pilot project to create a Grid storage system for medical images. The system is a step toward electronic storage of medical data. The large volume of data, its structure and distribution, and the reliability and availability requirements pose a challenge that Grid technologies can meet. Medical information is recorded in the DICOM standard. The system can be used not only for storing and processing images but also for solving statistical and epidemiological problems. Particular attention is paid to the possibilities for interactive processing of the medical data accumulated in the system.

15:00

Mathematical Modeling of RF Plasma Streams at Low Pressure

This article describes a mathematical model of an RF plasma stream at a pressure of 13.3-133 Pa in the transition regime, with Knudsen number 8×10^-3 ≤ Kn ≤ 7×10^-2 for the neutral gas. The model is based on a statistical approach for the neutral component and on a continuum model for the electron and ion components. The calculations were performed using MPI for the given directions. Simulation results for RF plasma flows are presented for unperturbed flow and for flow in the presence of a specimen. The simulation agrees with the available experimental data.

15:20

Parallelism in Magnetic Force Microscopy Studies Algorithm and Optimization

The algorithm for calculating the dipole-dipole interaction was formalized and implemented in C++ using the MPI library. The program package is fully parallelized and highly optimized for the computational architecture. Our parallel program makes it possible to obtain magnetic force microscopy (MFM) images and to establish the magnetic configuration that corresponds to a given experimental MFM picture.

15:40

TANDEM Program for Parallel Computing of Coupled Neutron-Physical and Thermal-Hydraulic Characteristics of Reactor Cores

Full-scale physical computation of reactor lifetime includes three types of computations: neutron-physical parameters of the core, the change in the nuclide composition of the fuel over time due to burnup, and thermal hydraulics. To solve ionizing radiation transport problems with the Monte Carlo method, RFNC-VNIITF has developed the continually improved PRIZMA program. The RISK module is used to calculate how the nuclide composition of reactor fuel changes over time due to burnup.

16:00 – 16:30

Coffee Break

16:30

HPC Algorithms for Calculating Properties of Magnetic Nanostructures

Critical phenomena in cobalt monolayer and submonolayer films and the properties of nanodots were studied by computer simulation using HPC algorithms. The proposed approach, based on scanning tunneling microscopy data, makes it possible to estimate the critical concentration required for the concentration transition to the ferromagnetic state. Assuming the presence of a critical switching field made it possible to simulate, within the Ising model, hysteresis loops for 1.5, 2.0, 2.5 and 3.0 ML cobalt samples that agree qualitatively with magnetometric data. The authors' approaches to simulating magnetic phenomena in nanostructures require the use of supercomputers.

16:50

Superlinear Speedup of Parallel Calculation of Finite Number Ising Spins Partition Function

A high-performance parallel algorithm was developed for the rigorous calculation of the partition function of a lattice system with a finite number of Ising spins. The parallel calculations are run by C++ code using the Message Passing Interface. Superlinear speedup was obtained for this code. The reasons for the experimentally observed violation of Amdahl's law are considered.
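
To make the two notions in this abstract concrete, here is a small Python sketch (not the authors' C++/MPI code). It evaluates the partition function of a tiny system by summing over all spin states and states Amdahl's bound as a formula. The choice of a one-dimensional ring in zero field, and all function names, are assumptions made for illustration; the abstract does not specify the lattice. The brute-force sum can be checked against the known transfer-matrix result for the ring.

```python
import itertools
import math


def partition_function_ring(n_spins: int, beta_j: float) -> float:
    """Exact Z of a 1D Ising ring in zero field by enumerating all 2**n states.

    beta_j is the dimensionless coupling K = J / (k_B T).
    """
    z = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        # -E / (k_B T) = K * sum_i s_i * s_{i+1}, with periodic boundary.
        bond_sum = sum(spins[i] * spins[(i + 1) % n_spins] for i in range(n_spins))
        z += math.exp(beta_j * bond_sum)
    return z


def partition_function_transfer(n_spins: int, beta_j: float) -> float:
    """Closed form from the transfer matrix: Z = (2 cosh K)^N + (2 sinh K)^N."""
    return (2 * math.cosh(beta_j)) ** n_spins + (2 * math.sinh(beta_j)) ** n_spins


def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)


if __name__ == "__main__":
    print(partition_function_ring(8, 0.5), partition_function_transfer(8, 0.5))
    print(amdahl_speedup(0.9, 10))
```

The terms of the sum are independent, so the state space can be split across processes. Amdahl's formula never exceeds the number of processors, which is why a measured superlinear speedup counts as a violation of it.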

17:10

Multithreaded version of AutoDock 4.2 suitable for massive virtual screening of potential biologically active compounds (enzyme inhibitors)

A modified version of the well-known docking package AutoDock 4.2 was created. It is specially designed for virtual screening of a large number of potential inhibitors. AutoDock uses a set of pre-calculated grids and other parameters describing intermolecular interaction, which are the same for the same target molecule. Multithreading allows more efficient use of memory. Alternatively, it can help to increase precision. The process becomes much more efficient and more suitable for massive screening. In addition, a new search algorithm is proposed to handle the many ligands that are too flexible to be thoroughly investigated by AutoDock's stochastic search engine in any reasonable time.

18:00 – 21:00

Welcome Party @ Alma Mater Cafe

Tuesday

09:00 – 10:00

HPC Breakfast

10:00 – 13:00

Industrial Session

10:00

Innovative technologies of FlowVision software for solving ship hull hydrodynamics problems

The talk discusses innovative technologies of the FlowVisionHPC software package, which is based on the finite-volume method for solving the equations of fluid dynamics and uses a rectangular adaptive grid with local refinement, as applied to problems of ship hull hydrodynamics.

10:30

Developing an Efficient IT-Infrastructure for High-Performance Computing in ANSYS

The presentation will analyze hardware and software characteristics of the most common HPC platforms for ANSYS simulation software. Scalability of structural, fluid dynamics and electromagnetic problems will be shown for a variety of platforms with or without using GPU. The usability of Remote Solve Manager will also be demonstrated. And finally, the presentation will highlight ANSYS HPC licensing policy.

11:00 – 11:30

Coffee Break

11:30

Using a supercomputer and IOSO optimization software to increase efficiency for complex technical systems

Practical examples are presented of using supercomputers for the automated design of complex technical systems based on simulation and optimization of design parameters.

The talk describes the features of the IOSO software platform for managing computations and the capabilities of its optimization technology, which makes it possible to solve high-dimensional optimization problems, in single-objective and multi-objective formulations, with a small number of direct calls to the mathematical model of the object under study.

11:50

Study of towing resistance and shape optimization of vessel with a large block coefficient

The talk presents the results of a numerical study of the flow around the hull of a cargo vessel with a large block coefficient, carried out in the FlowVision software package, version 3.08. The study simulated hull towing over a wide range of Froude numbers, examined how the computational grid resolution and the type of wall function in the boundary condition on the hull affect the towing resistance force, and compared the results with experimental data. Some results of applying an original numerical approach to optimizing the towing resistance of the studied object are also presented.

12:10

High-performance computations in problems of simulation and optimization of turbine hydrodynamics

The paper considers how to speed up the solution of direct and inverse problems in 3D turbine hydrodynamics [Cherny S. G., Chirkov D. V., Lapin V. N., Skorospelov V. A. and Sharov S. V. Numerical simulation of fluid flows in turbomachines, 2006, Nauka, Novosibirsk (in Russian)] both by improving the numerical methods for solving the Euler or Reynolds-averaged Navier–Stokes equations of an incompressible fluid and by using parallel computations.

Direct numerical methods are parallelized by dividing the blocks of the computational domain among the cluster's cores. Communication between cores is performed using the MPI standard. Different ways of dividing the computational domain into blocks are compared. Speedup results achieved on different clusters for the simulation of unsteady flow in a real hydroturbine are shown.

12:30

Multipurpose software package LOGOS to solve CFD and heat-and-mass transport problems on supercomputers

Designing high-tech products is not feasible without numerical methods for simulating physical processes in complex geometric configurations. Expertise in solving CFD problems has been acquired and the nature of many physical processes is now understood; with the numerical methods currently available, growing computer power, falling computer prices and available software, it has become possible to introduce multipurpose engineering software packages into practice. These packages include catalogs of mathematical models of physical processes, finite difference schemes oriented to simulations with a variety of discrete grid models, and computational modules for systems of algebraic equations.

10:00 – 13:00

Parallel Algorithms

10:00

Parallel numerical simulation of wave propagation in 3D elastic medium with application of the Laguerre transform

In this paper, we apply an approach to the numerical simulation of elastic waves in a heterogeneous medium based on the integral Laguerre transform with subsequent domain decomposition. After the Laguerre transform, we obtain a system with a strictly negative definite elliptic operator that does not depend on the separation parameter. Parallel calculations can therefore be organized by means of the additive Schwarz method, and the systems of linear algebraic equations in each subdomain can be solved by LU factorization. In the paper we study this approach in the 3D case.

10:20

Novel algorithm for finding minimal convex hulls

A novel algorithm is proposed for constructing minimal convex hulls of a graph using graphics accelerators. The high speed and linear complexity of the method are achieved by partitioning the graph vertices into separate blocks and filtering them. The computation is controlled by means of auxiliary matrices. A series of experiments with the algorithm was carried out, showing that it is suitable for processing hulls in large-scale problems. The new method was found to be 10 to 20 times faster than the functions of the professional mathematical package Wolfram Mathematica.

10:40

The algorithm applies parallel computing of Schweitzer’s method for the main scheme optimizing DTIP in strategic planning

The paper considers the problem of improving the efficiency of solving strategic planning problems based on a balanced scorecard in multi-level hierarchical systems of organizational control. A modified algorithm of Schweitzer's method for the main optimization scheme of discrete technology and information processes (DTIP), with parallelized computations, is proposed. The algorithm improves computational efficiency and therefore allows realistically sized, large-scale models to be used for strategic planning problems. The proposed approach and algorithm can be applied to other types of DTIP.

11:00 – 11:30

Coffee Break

11:30

Optimization of processing large amounts of seismic data in case of 3D migration of duplex waves

The paper considers an approach to optimizing the processing of large volumes of data on processors with a large number of cores. Using the example of seismic data processing by a program for 3D migration of duplex waves, it shows how the program itself manages its use of computing resources and RAM in order to improve the quality of data processing and, in many cases, to reduce the program's running time.

11:50

Workshop. Intel tools for HPC developers deliver top application performance while minimizing development, tuning and testing time and effort

The latest version of the Intel® Parallel Studio XE provides C, C++ and Fortran developers with cutting-edge tools to create parallel software running on today's and tomorrow's IA-compatible processors and coprocessors.

Intel compilers include advanced optimization and multithreading capabilities along with excellent compatibility with popular development environments.

There are also several highly optimized performance libraries (Threading Building Blocks (TBB), Math Kernel Library (MKL) and Integrated Performance Primitives (IPP)) and analysis tools for creating fast, reliable multithreaded applications.

In addition to the compilers and libraries, there are some other useful tools, such as an advanced threading assistant, an easy-to-use memory and threading error detector for serial and parallel applications, and, of course, the well-known premier performance and thread profiler.

We will briefly discuss new features of these tools and focus on the compiler key features which help to parallelize your applications (e.g., vectorization).

Gravity Waves application optimization with Intel® Parallel Studio XE 2013

This demo shows how Intel® Parallel Studio XE 2013 can be used for effective parallelization and optimization of the Gravity Waves application, which calculates and visualizes internal gravity waves in the ocean. All phases of parallel application development are covered, including design, build, debugging, verification and tuning.

12:50

Methods of interpolation and mean square approximation in cluster libraries

The paper gives a summary description of libraries for interpolation and mean square approximation, as a continuation of work on creating libraries that form part of the cluster's basic application software.

13:00 – 14:00

Lunch

14:00 – 15:30

14:00

Large Scale Computer Modeling of Aerothermodynamics of Hypersonic Vehicle Composite Constructions

A computational method is proposed for solving coupled problems of gas dynamics and internal heat transfer in the structures of hypersonic vehicles. The method is based on the iterative solution of three types of separate problems: a gas dynamics problem for an ideal gas, a viscous heat-conducting problem with the full Navier-Stokes equations for the three-dimensional boundary layer, and the heat transfer equation for the vehicle shell. The SIGMA software package, which implements the resulting algorithms and can run on high-performance computers, was developed. Results of modeling the flow over a hypersonic vehicle are presented, and temperature fields for an adiabatic wall and for a wall with heat transfer between gas and wall are compared, which shows the importance of including heat transfer computations when designing the vehicle's head shield.

14:20

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase flows occur in many practical industrial applications, such as the oil industry, chemical and thermal engineering, bioengineering and medicine, especially flows in tubes with a granular layer. Multiphase flows in inclined tubes are poorly studied. A numerical study of multiphase flows in inclined tubes was performed for a clear tube and for a tube with a granular layer. The simulation model is based on the lattice Boltzmann method. The parallel algorithm was programmed in CUDA C, and an NVIDIA Tesla C2075 graphics processor was used for the simulations. Bubble flow was studied in inclined tubes with different inclination angles and bead diameters, and the flow pattern of an air bubble was examined. The simulation results agree with experimental studies.

14:40

Computer modeling of complex heat exchange in the package FreeFem++

This paper considers the problem of complex heat exchange in three-dimensional scattering media with reflecting boundaries. An iterative algorithm based on the finite element method is proposed. The algorithm is implemented in the FreeFem++ package.

15:00

Parallel computations of stratified flows using continuum mechanics web-laboratory

The authors' experience of working in the continuum mechanics Web laboratory, built on the UniHUB technology platform, is presented, covering numerical simulation and high-performance computation of problems in the mechanics of inhomogeneous fluids using the free application packages OpenFOAM, SALOME and ParaView. The talk discusses the construction of high-resolution computational grids and the use of built-in and extended utilities and functions of the OpenFOAM package to set complex boundary conditions and to develop custom solvers in the object-oriented programming language C++.

14:00 – 15:00

14:00

14:20

14:40

15:00 – 16:00

OpenCL BoF

Heterogeneous programming methodology based on OpenCL framework

High-performance computing systems containing hybrid nodes with coprocessors such as GPGPUs or Intel MIC are now widespread. Many (though not all) computational algorithms can be ported effectively to hybrid systems. We have developed a multiparameter model of a hybrid computing system and a general method for evaluating the performance of algorithms on such systems in terms of the model parameters. In addition, we have implemented a system-level service, the Infrastructure of Heterogeneous Computing, which allows a whole distributed system to be programmed within one programming model. In our work we discuss some applications of the Infrastructure to a wide class of tasks (HPL, NAS PB {FFT, MG, CG}, CFD), and provide a brief comparison with other heterogeneous approaches.

16:00 – 20:00

Excursion

2-hour excursion to Polytechnic and Aerospace museums of NTUU "KPI". Walking excursion around Kyiv center. Explore famous Kyiv churches, Dnieper river, Kreschatyk street and try national Ukrainian food!

Wednesday

09:00 – 09:20

HPC Breakfast

09:20 – 10:00

09:20

The "HPC center as a resource" management concept:
-  Current trends in the development of management software in HPC.
-  The Resource Manager concept.
-  Examples of building a Resource Manager and resource scheduling policies using Moab HPC Suite.

10:00 – 13:00

Clouds & Grids

10:00

QosCosGrid end-user tools in PL-Grid - good practices and examples

QosCosGrid is a middleware stack being developed at Poznan Supercomputing and Networking Center, which was initially deployed within the PL-Grid e-infrastructure. After its successful adoption within the Polish research communities, we started collaboration at the European level. These efforts resulted in the signing of a Memorandum of Understanding with EGI (European Grid Infrastructure) in November 2012 and with the BCC (Basic Coordination Centre) of the Ukrainian National Grid in August 2013. As far as technological aspects are concerned, the QosCosGrid stack officially entered the UMD (Unified Middleware Distribution) pipeline in September 2013. In this paper we give the reader a short introduction to the QosCosGrid architecture and its core components, with a strong emphasis on the available end-user tools: QCG-Icon, a desktop GUI application that integrates with the operating system, and QCG-Simple, a set of command-line tools that breaks the barrier users of existing batch systems face when migrating to the Grid. We also share our experience of end-user support and discuss plans for further development work.

10:20

10:40

Cloud and Grid Technology Based Educational and Research Computing System

The development of computer technology has significantly expanded the possibilities for its use in research and teaching. Modern society requires a revitalized educational process that prepares professionals who not only possess knowledge but are also capable of continuous self-improvement and self-realization, as current labor market trends demand. It was therefore necessary to develop training facilities, with appropriate methodological support, for training highly qualified specialists who are able to maintain the equipment.
The aim is to develop a university teaching and research grid cluster complex, together with pedagogical and methodological support, that can be used in teaching and research.

11:00 – 11:30

Coffee Break

11:30

Complex Workflow Management and Integration of Distributed Computing Resources by Science Gateway Portal for Molecular Dynamics Simulations in Materials Science

The "IMP Science Gateway Portal" (http://scigate.imp.kiev.ua) for complex workflow management and integration of distributed computing resources (such as clusters, service grids, desktop grids and clouds) is presented. It is built on the WS-PGRADE and gUSE technologies, where WS-PGRADE is designed for scientific workflow operation and gUSE for smooth integration of the available resources for parallel and distributed computing in various heterogeneous distributed computing infrastructures (DCIs). A typical scientific workflow is considered, with possible scenarios for its preparation and use. Several typical science applications (scientific workflows) are considered for molecular dynamics (MD) simulations of the complex behavior of various nanostructures (nanoindentation of graphene layers, defect system relaxation in metal nanocrystals, thermal stability of boron nitride nanotubes, etc.). The advantages and drawbacks of the solution are briefly analyzed in the context of its practical applications for MD simulations in materials science, physics and nanotechnologies with the available heterogeneous DCIs.

11:50

The system of dynamic software security and reliability analysis with high-performance

The article describes dynamic analysis techniques and a cloud-based platform for software security and reliability testing, and contributes to dynamic execution analysis techniques. In the first part of the article the authors describe the dynamic binary analysis technique referred to as "fuzzing". They show the basic architecture of a tool for testing the security and reliability of applications and detecting vulnerabilities. Because it uses dynamic binary instrumentation, the implementation of this technique is much faster than those applied previously. The technique has been implemented as a software platform that includes a web interface, a virtual machine dispatcher, a dynamic instrumentation library, a debugger, a special DLL, a protocol description and a fuzzing manager tool. The results of detecting vulnerabilities in an FTP server confirmed the validity of the technique.

12:10

TCP network game: conditions for evolutionary stable equilibrium

This paper deals with network congestion problems using an evolutionary game approach. Network connections choose between aggressive and peaceful strategies, trying to maximize their payoffs. We present conditions for the existence of an equilibrium depending on the loss sensitivity parameter. This result is illustrated by a simulation of the game dynamics.
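As a rough illustration of what an evolutionarily stable mix of aggressive and peaceful connections looks like, the sketch below runs standard two-strategy replicator dynamics. It is not the authors' model: the payoff values a, b, c, d and all function names are hypothetical, and the loss sensitivity parameter of the paper is not modeled here.

```python
def payoff_gap(x, a, b, c, d):
    """Fitness of 'aggressive' minus fitness of 'peaceful' when a share x is aggressive.

    a: aggressive vs aggressive, b: aggressive vs peaceful,
    c: peaceful vs aggressive,  d: peaceful vs peaceful (all hypothetical).
    """
    return (a * x + b * (1 - x)) - (c * x + d * (1 - x))


def interior_equilibrium(a, b, c, d):
    """Share of aggressive connections at which both strategies earn the same, or None."""
    denom = (b - d) + (c - a)
    if denom == 0:
        return None
    x_star = (b - d) / denom
    return x_star if 0 < x_star < 1 else None


def is_evolutionarily_stable(a, b, c, d):
    """The mixed state is stable when each strategy does better while it is rare."""
    return b > d and c > a


def simulate(x0, a, b, c, d, dt=0.01, steps=10_000):
    """Explicit Euler integration of the replicator equation dx/dt = x(1-x)*gap(x)."""
    x = x0
    for _ in range(steps):
        x += dt * x * (1 - x) * payoff_gap(x, a, b, c, d)
    return x


# Hypothetical payoffs: two aggressive flows hurt each other (a < c),
# while an aggressive flow gains against a peaceful one (b > d).
A, B, C, D = -1.0, 2.0, 0.0, 1.0
print(interior_equilibrium(A, B, C, D), simulate(0.1, A, B, C, D))
```

With these assumed payoffs, populations started with either a small or a large share of aggressive connections drift toward the same interior mix, which is the behavior an evolutionarily stable equilibrium implies.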

12:30

Service-oriented computing (SOC) in Engineering Design

Service-oriented computing (SOC) is a new cross-disciplinary paradigm for distributed computing that is changing the way software applications are designed, architected, delivered, and consumed. Services are autonomous computational entities that can be used in a platform-independent way. Services can be described, published, discovered, and dynamically assembled to develop massively distributed, interoperable, evolvable systems. This paper provides a roadmap for the development of the Engineering Design Platform, which is based on SOC and intended, in particular, for modeling and optimization of nonlinear dynamic systems that are built from components of different physical nature and are widespread in various scientific and engineering fields.

10:00 – 13:00

10:00

.

10:30

Up to 700k GPU cores, Kepler, and the Exascale future for simulations of star clusters around black holes

We present benchmarks of high-precision direct astrophysical N-body simulations using up to several 100k GPU cores; their soft and strong scaling behaves very well at that scale and allows a further increase in the number of cores on the future path to Exascale computing. Our simulations use large GPU clusters both in China (Chinese Academy of Sciences) and in Germany (Judge/Milkyway cluster at FZ Jülich). We also present first results on the performance gain from the new Kepler K20 GPU technology, which we have tested in two small experimental systems, and which also runs in the Titan supercomputer in the United States, currently the fastest computer in the world. Our high-resolution astrophysical N-body simulations are used to model star clusters and galactic nuclei with central black holes. They address some key issues in theoretical physics and astrophysics, such as galaxy formation and evolution, massive black hole formation, and gravitational wave emission. The models have to cover thousands or more orbital time scales for several million bodies. The total numerical effort is comparable to, if not higher than, that of the more widely known cosmological N-body simulations. Due to a complex structure in time (hierarchical blocked time steps), our codes are not considered "brute force".

11:00 – 11:30

11:30

Modeling system for GPU parallel tasks performance simulation

A flexible and extensible simulation tool architecture, called gpusim, is proposed for heterogeneous grid systems with graphics accelerators. The tool is based on the open-source Java framework GridSim. The adequacy of the models has been checked, and their initial investigation performed, using known examples of parallel computation problems. The tool allows optimal parameter settings to be chosen automatically and thus significantly reduces the time needed to obtain an optimal implementation compared with direct execution of the program.

11:50

QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

The multi-GPU open-source package QCDGPU has been developed for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature and of the O(N) model. The code is implemented in OpenCL, has been tested on AMD and NVIDIA GPUs and on AMD and Intel CPUs, and may run on other OpenCL-compatible devices. The package has minimal external library dependencies and is OS platform-independent. It is optimized for heterogeneous computing: the lattice can be divided into non-equivalent parts to hide the difference in performance between the devices used. QCDGPU has a client-server part for distributed simulations. The package is designed to produce lattice gauge configurations as well as to analyze previously generated ones. QCDGPU may be executed in fault-tolerant mode. The core of the Monte Carlo procedure is based on the PRNGCL library for pseudo-random number generation on OpenCL-compatible devices, which contains several of the most popular pseudo-random number generators.

12:10

Distribution of particles on spherical surfaces: GPU numerical study

Studying statistical properties of physical systems requires numerous routine computations on different input data and is therefore well suited to the GPGPU computational paradigm. In this vein, we present a GPU parallelization of the computational algorithm for solving the Thomson problem, i.e. the distribution of Coulomb charges on the surface of a sphere. Use of the GPU gives a more than 100-fold computational speed-up, thus allowing us to routinely generate statistically independent numerical solutions to the Thomson problem en masse. It was a common perception that geometrical invariants in the Thomson problem did not have any correlation with the energy. However, having performed about 5×10^5 runs, we show that some geometrical invariants (e.g. a sum of the intercharge distances or an entropic sum of the partial energies) do statistically correlate with the energy, thus revealing that there is some hidden order in the geometrical patterns of charges distributed on the surface of a sphere.
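To make the Thomson problem concrete, the following minimal CPU sketch computes the Coulomb energy of unit charges on the unit sphere and relaxes a configuration by projected gradient descent. It is only an illustration under stated assumptions, not the authors' GPU algorithm: the function names, step size, iteration count, and starting points are all hypothetical choices.

```python
import math
from itertools import combinations


def coulomb_energy(points):
    """Sum of 1/r over all pairs of unit charges."""
    return sum(1.0 / math.dist(p, q) for p, q in combinations(points, 2))


def project_to_sphere(p):
    """Rescale a 3D point so that it lies on the unit sphere."""
    norm = math.sqrt(sum(c * c for c in p))
    return tuple(c / norm for c in p)


def relax(points, steps=2000, lr=0.05):
    """Move every charge along the net repulsive force, then project back to the sphere."""
    pts = [project_to_sphere(p) for p in points]
    for _ in range(steps):
        new_pts = []
        for i, p in enumerate(pts):
            force = [0.0, 0.0, 0.0]
            for j, q in enumerate(pts):
                if i == j:
                    continue
                diff = [p[k] - q[k] for k in range(3)]
                r = math.sqrt(sum(x * x for x in diff))
                for k in range(3):
                    force[k] += diff[k] / r**3  # Coulomb repulsion from charge j
            moved = tuple(p[k] + lr * force[k] for k in range(3))
            new_pts.append(project_to_sphere(moved))
        pts = new_pts
    return pts


# Arbitrary (hypothetical) starting positions for N = 4 charges.
start = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.2), (-0.3, 0.1, 1.0), (0.5, -0.7, -0.4)]
final = relax(start)
print(coulomb_energy(final))
```

For four charges the known minimum is the regular tetrahedron, so the relaxed energy can be compared with 6/sqrt(8/3). Each run from a different starting configuration is independent of the others, which is why many such runs can be executed side by side on a GPU.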

12:30

Research on performance dependence of cluster computing system based on GPU accelerators on architecture and number of cluster nodes

This paper presents the results of computing experiments carried out to verify and refine the technical solutions chosen for a hybrid computing system based on GPU accelerators. Results of performance testing with the Linpack benchmark are reported for hybrid computing systems consisting of two and three nodes. The number of Nvidia Tesla M2090 GPU accelerators per node was varied during testing. Optimal amounts of RAM are also determined for six variants of the hybrid computing system.

13:00 – 14:00

14:00 – 15:30

14:00

Error-Free Computation of Inverse Matrices in FPGA

An algorithm for computing the determinant of integer matrices, based on the Givens method and using rational fractions, is considered. The algorithm is implemented in a processor array in an FPGA to calculate inverse matrices. Owing to pipelining and to synthesis using the resynchronization approach, the processor achieves a high clock frequency for integer data with bit widths of up to hundreds of bits. The processor can be used in adaptive filtering, pattern recognition, computational geometry, cryptanalysis, etc.
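The idea of error-free inversion with rational fractions can be illustrated in software. The sketch below is not the Givens-based FPGA algorithm of the talk: it substitutes plain Gauss-Jordan elimination over Python's `fractions.Fraction` type, and the function name `exact_inverse` is a hypothetical choice. What it shares with the talk is that every intermediate value is an exact ratio of integers, so no rounding error occurs.

```python
from fractions import Fraction


def exact_inverse(matrix):
    """Return (determinant, inverse) of a square integer matrix using exact rationals."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return det, [row[n:] for row in aug]


det, inv = exact_inverse([[4, 7], [2, 6]])
print(det, inv)
```

Because numerators and denominators grow during elimination, the bit width of the operands increases with matrix size, which is consistent with the abstract's mention of integer data hundreds of bits wide.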

14:20

Computing Pythagorean Triples in FPGA

A new method for calculating Pythagorean triples is proposed, based on a two-step algorithm. The module that implements this algorithm is configured in an FPGA with a small hardware volume and can calculate a triple in a single clock cycle. Compared with the CORDIC method, it has a smaller hardware volume and higher speed, and provides exact values of the sine and cosine.
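Why a Pythagorean triple yields exact sine and cosine values can be seen from a short software sketch. The abstract does not spell out the two-step algorithm, so the code below uses the classical Euclid formula instead; the function names are hypothetical.

```python
from fractions import Fraction


def euclid_triple(m, n):
    """Classical Euclid formula: returns (a, b, c) with a^2 + b^2 = c^2 for m > n > 0."""
    if not (m > n > 0):
        raise ValueError("require m > n > 0")
    return m * m - n * n, 2 * m * n, m * m + n * n


def exact_sin_cos(m, n):
    """Sine and cosine of the angle opposite side b, as exact rational numbers."""
    a, b, c = euclid_triple(m, n)
    return Fraction(b, c), Fraction(a, c)


s, c = exact_sin_cos(2, 1)
print(s, c, s * s + c * c)
```

Since both values are ratios of integers, the identity sin^2 + cos^2 = 1 holds exactly, whereas CORDIC produces approximations whose accuracy depends on the number of iterations.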

14:40

Accelerating direct SLAE solvers using GPU

This paper presents a way of adapting a direct SLAE solver to computing systems that use graphics accelerators (GPUs). By offloading the most resource-intensive operations to the GPU, the solver's run time was reduced considerably, without major changes to the code or algorithms. The experience of step-by-step performance improvement is described, the problems encountered when working with GPUs are listed, and options for solving them are considered. The influence of various factors on solver efficiency is also analyzed. Test results are presented and directions for further work are outlined.

15:00

On solving SLAE with belt matrices on hybrid computers

This paper considers the implementation of an algorithm for solving systems of linear algebraic equations with banded symmetric positive definite matrices on computers with graphics accelerators. Results of testing the algorithm on the Inparcom multicore computer with graphics accelerators are presented.

15:30 – 16:00

16:00 – 17:00

Closing Session

@ Building 6, Conference Hall

Wrap-up and short concert by NTUU "KPI" Student Bandourist Chapel.

Poster Session

Monday 10:00 – Wednesday 18:00

Application of the gLite Workflow Management System for the ARC Infrastructure

ARC and gLite are two main European providers of middleware, distributed computing, and data management services within the EMI project. ARC has a broker embedded in the client and can operate only with its own infrastructure or directly with gLite computing nodes (CREAM CE). gLite, in contrast, has a centralized workflow management system (WMS). Applying WMS across both infrastructures accelerates data processing and improves load balancing, which meets the needs of data analysis for the LHC Computing Grid. Setting up ARC to interact with WMS would significantly improve the interaction of the EMI components (ARC UI and gLite WMS). The research in this paper concerns the coordination of the ARC client and gLite WMS. To this end, the ARC and gLite grid middleware have been studied, possible ways to accelerate data processing and improve load balancing have been suggested, and ARC modifications for interacting with gLite WMS have been implemented. The resulting infrastructure modifications make it possible to use WMS for jobs that are allocated to resources within ARC.

Robust Network Design Problems

The robust network design problem is to find a minimum-cost connected network that still contains all given terminal nodes after deleting the edge set of any subgraph isomorphic to another given graph. We give some results on the existence of a solution to this problem and present computational results for some cases of the isomorphic graph that are interesting from a practical point of view.

Software solution for a structural monitoring system with fiber Bragg grating sensors

The design of structural monitoring systems for constructions made of composite materials is especially topical because of the wide range of applications of such materials in industry.
Composite materials are used, for example, in spacecraft technologies and aeronautical engineering.
We present a software solution for a computer appliance of a structural monitoring system for composite materials based on fiber Bragg grating sensors.
The system has two main areas of application.
The first is preliminary testing of novel composite materials in the laboratory.
The second is real-time structural monitoring with operator alerts.

The CO2– radical in the structure of type A carbonatehydroxylapatite by computer modeling

Type A carbonatehydroxylapatite (CHAp), with a CO3 group or CO2– radical replacing two hydroxyls in the channel of hexagonal hydroxylapatite (HAp), has been investigated by computer modeling in the GULP program using grid techniques. The 3x3x3 supercell with overall composition [Ca270–x□x]270[PO4]162[OH52–y(CO3)1□y]54 (0.16 wt % CO2) at x and y = 0 or 1 has been considered. The websites uagrid.org.ua and grid.inpracom.kiev.ua were used for the calculations, which were executed in the "GEOPARD" virtual organization. It is established that in the most stable structures C lies in the channel at z ≤ 0.5, and the H atoms of hydroxyls in channel fragments near structural defects point in the same direction. Carbonate oxygens occupy sites close to those in bioapatite, whereas the placement of the radical oxygens differs substantially. Hydroxyl bond orientations in unidirectional channel fragments lead to strong hydrogen-bonded interactions between O and H of different hydroxyls. The radical atom O3 is located near the OH vacancy (□OH), unlike the axial O2, so that an approximately aligned O7–C–O5 structure is formed (the O-O axis is oriented at an angle of 5.5° to the c-axis). Radical formation results in the appearance of additional structural defects (closely spaced □OH and □Ca) and considerable displacements of ions in the nearest structure. The obtained data complement, and partly agree with, experimental and theoretical investigations of type A CHAp. The obtained structure of the CO2– radical corresponds partly to that expected for the radicals responsible for axial peaks in EPR spectra of type A CHAp.

Parallel Computations Based on Time-Symmetry

A new type of parallelism, parallel computation in opposite time directions, is suggested. Mathematical models of a reversible autonomous finite automaton and a reversible linear finite state machine are introduced and considered. A practical implementation of the temporal models for error-correcting codes is shown.

Supercomputer modeling of open pit limits for ore deposits

This article provides a theoretical basis for a genetic algorithm that searches for open-pit limits and covers the parallel implementation of the formulated algorithmic approach on a multiprocessor system.

On the possibility of refining the value of the gravitational constant by calculation

At present, the gravitational constant G is known to only 5 significant digits, which is 2-3 orders of magnitude less accurate than other fundamental physical constants such as the speed of light in vacuum c and Planck's constant h. However, the accuracy of experimental determination of G under terrestrial conditions has reached its technical limit, which calls for fundamentally new approaches. On the basis of the proposed original approach, a system of calculation relations is derived from the fundamental physical constants c, G, and h, together with the Planck length lp, time tp, and mass mp, which allows the currently known value of G to be refined by 3 orders of magnitude. Experimental determination of G thus becomes unnecessary: determining c and h is sufficient, and any improvement in their accuracy automatically improves the accuracy of G.

Multipath protected routing in distributed computer systems

In this paper, we give a new solution to a scientific problem: we develop a new approach to the organization of data in distributed computing systems that enhances the security of data during transmission over wireless channels.

Simulation of Magnetic Hysteresis Properties of Artificial Ordered Arrays Nanoparticles

A high-performance algorithm for supercomputer simulation of magnetic nanoparticle arrays was developed and implemented in parallel C++ code. Our program package models the magnetic hysteresis phenomenon in a system of spherical, uniaxial, single-domain magnetic dipole nanoparticles fixed at the nodes of a square lattice with a period of 10^-8 m, within the Stoner-Wohlfarth approach. In numerical experiments, the dependence of the shape of the hysteresis curves on the number of particles and on the values of the critical field is investigated. It is shown that the loop area is proportional to the individual critical field of the nanoparticles.

Parallel computing in modeling of process of seafloor acoustic sensing with side-scan sonar

The paper considers the problems of constructing sonar images in a fluctuating ocean from measurements obtained with a side-scan sonar. For a kinetic model based on the non-stationary integro-differential equation of acoustic radiation transfer, the problem of determining the reflective properties of the seafloor is studied. A signal with linear frequency modulation is used as the probing radiation.

Outer and inner approximation schemes for concave programming with parallelization in Python

We consider the problem of global minimization of a concave function over a bounded polyhedron. Two enumeration procedures are used. The first is an outer approximation procedure, which calculates all vertices of the polyhedron and thereby determines the global minimum vertices. The inner approximation procedure is a dual reformulation of the outer approximation procedure and is usually applied to find a good local solution. Both procedures are computationally very demanding, which is why we developed parallel versions. The aim of the investigation was to test the suggested approach in Python on a notebook computer with several cores. Preliminary results are given.

Application of Newton Method in Symmetric Eigenvalue Problem

Many problems, such as the stability analysis of a linear system or nonlinear minimization, require the eigenvectors and eigenvalues of a matrix. This paper describes a method that meets this need: it solves the symmetric eigenvalue problem by applying the Newton method to a system of nonlinear equations. We give a mathematical justification of the method for solving partial and full eigenproblems and present preliminary computational results obtained in Python.

System-level diagnosis models for cloud computing and distributed information systems

The cloud computing approach to the creation and maintenance of distributed information systems is considered. Some important properties of the three classes of cloud services (IaaS, SaaS, PaaS) for self-organizing diagnosable systems are discussed. System-level diagnosis aims to identify faulty units in self-diagnosable distributed systems so that these units can be eliminated, repaired, or recovered. Identification is carried out by means of system analysis and/or data mining of the diagnosed system (e.g. its syndromes, models of inter-unit testing, and system topology). The complexity of such analysis depends to a great extent on a preliminary assessment of the diagnosis model and an evaluation of the characteristics of the diagnosis processes. A technique and measures for structure decomposition and assessment of the system syndrome are proposed, based on splitting the diagnostic graph into a set of ordinary structures (trees, stars, chains). Some performance estimates for these structures are given. These estimates make it possible to simplify and balance the diagnosis processes and data flows. A new generalized class of diagnosis models is proposed, which includes known diagnosis models (e.g. the PMC and BGM models).

Molecular dynamics modeling of irradiation damage in highly coordinated mineral structures

The radiation stability of zirconolite CaZrTi2O7, pyrochlore Gd2Zr2O7, and periclase MgO has been studied by computer simulation methods. Computer simulation of zircon ZrSiO4 has also been performed for comparison with these structures. The calculations were performed in a grid environment using the "GEOPARD" virtual organization. The number of Frenkel pairs after propagation of a primary knock-on thorium atom with a kinetic energy of 20 keV (an analogue of the recoil atom arising from the alpha decay of actinides) has been characterized by the molecular dynamics method. The effective charge of oxygen atoms has been calculated using the ab initio Hartree-Fock method and the B3LYP hybrid functional (density functional theory). It is established that the radiation stability of these minerals depends significantly on two main factors: the type of structure and the degree of covalency of the chemical bonds (or the effective charge of the oxygen atoms). The results of the computer simulations show that structures with high bond ionicity and a high coordination number of cations (periclase, pyrochlore) are characterized by high radiation resistance to amorphization.

Parallel computing in the exact method of solving minimax problems of source placing

The paper is devoted to the use of parallel computing in an exact method for solving the minimax problem of placing physical field sources at fixed mounting positions. The mathematical formulation of the problem and the computational scheme of the developed method are given. Since the proposed branch-and-bound scheme partitions the set of feasible solutions into subsets and computes a bound for each of them, parallel computing is proposed for computing these bounds. The software is implemented in the C# programming language. The results of the computational experiment show a speedup of 1.6 times with two processors and 3.4 times with four.
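The reported speedups (1.6 on two processors, 3.4 on four) can be put in context with two standard measures: parallel efficiency and the Karp-Flatt experimentally determined serial fraction. The short sketch below is an added illustration, not part of the paper; the function names are hypothetical.

```python
def efficiency(speedup, processors):
    """Parallel efficiency: the fraction of ideal linear speedup that is achieved."""
    return speedup / processors


def karp_flatt(speedup, processors):
    """Karp-Flatt metric: serial fraction implied by a measured speedup."""
    return (1 / speedup - 1 / processors) / (1 - 1 / processors)


# Speedups quoted in the abstract: (processors, speedup).
for p, s in [(2, 1.6), (4, 3.4)]:
    print(p, round(efficiency(s, p), 3), round(karp_flatt(s, p), 3))
```

These figures correspond to efficiencies of 0.80 and 0.85 and implied serial fractions of about 0.25 and 0.06.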

The parallel processes research using Petri-Markov nets

This paper presents the mathematical apparatus of Petri-Markov nets and shows its capabilities for studying parallel processes. Existing programs for working with Petri nets and Petri-Markov nets are reviewed. The software created to generate and model Petri-Markov nets is then described, including a brief user manual, the man-machine interface, and the tables used to store the net structures. The software is used to assess the optimality of different variants of parallelized and serial program code. The results of this work can be used to model real processes represented in the form of Petri-Markov nets.

Prediction of behavior distributed monitoring system method in condition of impacts external disturbing factors

The paper “Prediction of Behavior Distributed Monitoring System Method In Condition Of Disturbing Factors” considers monitoring systems as an inalienable and important part of human activity, together with the purpose of their creation. A method for predicting the behavior of a monitoring system under external disturbing factors is considered and proposed, namely the probabilistic characteristic of the monitoring system's brittleness as a function of time and of the number of external disturbing factors. An example is shown of software that models the operation of the prediction method. The main direction for further development of the method is proposed: using it in computer complexes that redistribute the tasks of the least reliable system component to components that are more resistant to external disturbing factors.

IP Core Synthesis in a Cloud

An approach to the design of a system for IP core synthesis in a cloud is proposed, which is based on an XML data representation and graph drawing using SVG. The approach is proven in a framework intended for input of the SDF algorithm graph, its graphical editing, and sending it to a cloud. The result of the framework's operation is an optimized pipelined IP core, which is described in VHDL and is ready to be modeled and synthesized using traditional CAD tools.

Increase of level of stability of functioning of systems of storage and data processing at the expense of realization of actions for ensuring safety of information

A mathematical model for optimizing the composition of technical means of information security in data storage and processing systems is formalized. To increase the efficiency of the branch-and-bound method in solving this problem, an algorithm is proposed that determines the branching order of the variables in advance on the basis of duality theory.

Definition of the Document-Oriented Data Model

The data models that form the basis of NoSQL DBMSs are built. Sets and multisets (bags) are used to define the models. Special relations called subdocument and subrecord are introduced, and it is proven that these relations are preorders. General results about the cofinal relation on sets are also given.

Associative storage device

An associative memory unit is proposed. Associativity is not rigidly tied to the addresses of memory cells; it allows simultaneous access to information and combines the functions of processing and storing information. The proposed storage unit provides high-performance memory: the speed of data processing in the storage device does not depend on the number of words stored in memory.
Another feature of the proposed unit is the use of the principles of dynamic random access memory to build the memory cells. The choice of dynamic memory makes it possible to create high-capacity memory using a well-known technology that is accessible for production.
This combination of the principles of associative search and dynamic memory yields high-capacity memory with considerable speed and the possibility of processing data directly in the memory unit.

Numerical analysis of the magnetic field in the conducting nanoparticle

The paper deals with the numerical analysis of the magnetic field of ferromagnetic nanoparticles that is induced by an external time-varying magnetic field. We compare analytical and numerical results, which makes it possible to investigate the role of conductivity in the magnetization dynamics of single-domain ferromagnetic particles. The approach is based on a combination of Maxwell's equations and the Landau-Lifshitz-Gilbert (LLG) equation, which together describe both the induced electromagnetic field and the magnetization dynamics. It is shown that the effective LLG equation for a conducting particle contains two additional terms compared to the ordinary LLG equation. One of these terms accounts for the magnetic field of eddy currents induced by the external magnetic field, and the other is magnetization dependent and is responsible for the conductivity contribution to the damping parameter.

Methods and parallel code “FREGAT” for distribution of substances in mixed cells of computational meshes

The parallel program FREGAT was developed as part of the 3D preprocessor codes at RFNC VNIITF to distribute substances and compute physical values in mixed cells of arbitrary hexagonal meshes. The intersections of mesh cells with arbitrary CAD model geometry domains, where each domain characterizes a substance, are the basis for calculating mass and volume concentrations of substances in mixed cells and other physical values. The result is an initial profile of mesh fields of physical values, which is passed to application programs in HDF format for further calculation. The paper presents results of numerical investigations on different models. Calculations were conducted in different modes with various numbers of processes.

Parallel Computing Technologies in the Finite Element Method

The finite element method is a powerful tool for the numerical simulation of a wide range of problems.

Implementation of the finite element method in CAD systems on the basis of modern computers allows researchers to solve large scale problems.

The article describes parallel algorithms for assembly of stiffness matrix and for solution of linear equations.

The article also contains two numerical experiments: a Dirichlet problem on a complex domain (a gear) and an elasticity problem for a three-layer shell.

In the last section of the article, the author compares the performance of the solutions of these problems on one, two, and four parallel cores.

Using fuzzy logic system for making decisions about information security in grid infrastructure

A fuzzy logic system (FLS) is proposed for modeling the decision-making mechanism for information security, with the aims of formalizing peer review more objectively, reducing the impact of subjective risk assessment methods, and increasing the number of factors taken into account in information risks.

FPGA-coprocessor for solving SLAE

A coprocessor on an Altera Cyclone II FPGA is considered, designed to solve SLAEs with a large number of variables by the Gauss method. The proposed hardware can be applied in the field of parallel computing to increase the performance of computer systems that adapt to the class of problems being solved.

Please note! The schedule may change. Please check it from time to time.