Friday, 2 September 2016

Vulkan Tutorial - Introduction

Hi, With all the buzz about the latest Graphics API Vulkan, I thought I'd create a series of tutorials and sample code. One of the things that was good about OpenGL was the vast knowledge base that was easily available online. This would be essential for any new technology/idea. As you may already know, Vulkan is the latest Graphics API released by Khronos and most IHVs have already shipped the initial Vulkan implementations. There's already quite a bit of information available as to how you can setup and start using Vulkan, so I will skip that part. There are however some ideas/concepts I would like to make it clear that I didn't see available from other sources. 

 In the introduction, I will have more of an FAQ based Q and A section to introduce Vulkan. 

 1. So, Vulkan is the next OpenGL right?
Vulkan is NOT replacing OpenGL. I repeat, Vulkan is not a replacement for OpenGL. Vulkan is a low level, explicit API that is designed to solve a specific set of problem(s). If, as an application developer, you're not facing those issues, then -- NO, you probably don't need Vulkan. 

 2. So, what is Vulkan good for? 
Good question. Some of the main problems that Vulkan is trying to solve are- 
- Having low CPU driver overhead. 
- Be more explicit and give better access to hardware. 
- Be multithreaded friendly. 
- Be truly cross-platform. 
- And a few more. The above are the most important ones. 
 As you may know, OpenGL has been around for quite a while (~24 years as of this writing). It has undergone significant changes, but fundamentally, OpenGL was not designed to solve the above mentioned three problems. GPU and CPU hardware has changed immensely since the first version of OpenGL. 

 3. What do you mean by 'CPU driver overhead' 
The main job of the OpenGL(or any GPU driver software) is to submit commands to the GPU. The GPU is supposed to run parallel to the CPU and the main goal of the driver software is to pick up commands from the application and submit them to the GPU. 

 4. What is the overhead involved? (3) seems quite simple. Why is it so CPU intensive?
Conceptually, the idea looks simple. However, the driver has to do a LOT of operations in the back. By design, OpenGL was meant as an "easy" API. By "easy", I mean that the application developer need not have to worry a lot. He/She can just call the simple APIs and the driver makes sure things work. Now, with a simple API, the driver has to do a lot of things, namely maintaining certain heuristics about how the application is using resources, making intelligent guesses to provide optimizations, guessing how the buffers would be used by application, doing state validations, allocating and cleaning up of memory and tonnes of other things. All this work tends to create huge bottlenecks on the CPU. So, there could be applications that are bound on the CPU. 

 5. Ok, I understand that the driver has a lot of work, but what is this "bound by the CPU"?
An application is said to be bound by CPU if it has amassed so much work for the GPU, but not able to submit commands to the GPU since the driver is still busy doing stuff on the CPU. In this case, what happens is that the GPU is actually stalling on the CPU! And this is a bad thing. The GPU is meant to be running parallel to the CPU. 

6. Can you give examples of how my application would be CPU bound? 
If you're familiar with OpenGL, draw calls tend to be quite heavy. There's lot of state validation, etc. that happens in the GL driver that makes draw calls CPU intensive. So, if your application has a lot of draw calls, it may start becoming CPU bound. This is a simple example of how your application may be CPU bound. You may want to use profilers or the right debuggers to actually find out if your application is CPU bound. 

 7. Ok, I understand all of this. I am a newbie to Computer Graphics, do you advice me to use Vulkan? 
It really depends. If you just want to learn Graphics and experiment a bit, Vulkan may not be the right choice for you. However, if you want to know how drivers may work and learn more closely to the hardware, maybe you can try using Vulkan. 

8. New is always good right? So shouldn't I not use Vulkan for my production code? 
Wait..calm down! Before using a new API, you'd want to profile and find out if you really are CPU bound or maybe having other issues that OpenGL cannot solve. If the answer is yes, then use Vulkan. 

 9. Why can't OpenGL solve all those problems mentioned in (2). 
It does, to some extent. There are proprietary extensions available that have low driver overheads. Multi-threaded command submission is also available to some extent. You may want to check out the following 

  1. OpenGL AZDO
  2. OpenGL Batching
  3. Command Lists
 If none of those help, Vulkan is the right choice for you. 

10. What do you mean by 'more explicit'?
OpenGL, by design, hides many things from the application. This causes application not to be sure of the behavior. Another thing is that different Hardware Vendors may implement certain things a bit differently causing application behavior across different vendors to be slightly unpredictable. Vulkan on the other hand is designed to not have these unpredictability. 

11. What is meant by 'multithreaded-friendly'?
OpenGL was designed when CPUs had one core and multithreaded programming wasn't mainstream. But was time went by, multi-threaded programming and multi-core CPUs started becoming very common and the APIs design couldn't really scale well. There is a provision of creating shared contexts in OpenGL, but the command submission is generally serialized. This is one of the major issues in current day rendering engines that use OpenGL. There's always a single "render thread" that submits all draw calls. To say it in simpler words, OpenGL was never meant to be multithreaded friendly. 

12. Why do you say OpenGL is NOT truly Cross-platform. 
If you have really written applications that are shipped on Desktop, Embedded and multiple OSes, you'd probably know what I mean. True, OpenGL ES is meant to be a subset of OpenGL. However, there are always a lot of issues you could run into if your application is meant to be supported on multiple platforms. Another major issue is the Window system interfacing. The whole Wgl, EGL, X11 code creates a big mess of #ifdefs! This is another major problem Vulkan is trying to solve. Have one code that runs on multiple platforms with very minimal ifdefs. This is possible via the WSI(Window System Integration). So, these are some of the questions I had before I started learning Vulkan. 

I hope this helped you as well. We can start learning more in the coming tutorials. Please leave questions if you feel I can add more details.

Thursday, 7 July 2016

Linux Memory management

Introduction

This article concentrates on some useful commands that come in handy while analyzing memory related performance issues on Linux.

The article is inspired from this video. I've listed down all the commands mentioned in that video. The video is pretty long and this article is an attempt to create a gist of items mentioned there.

Commands:-


1. vmstat:-

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 4698476 172856 1552720    0    0     3    12   82   79  2  1 98  0  0


2. dstat --top-oom:- This command helps us to find out which process would be terminated first in case of high memory consumption. In such a case, the kernel terminates the highest memory consumer. This command helps us find that out.


3. getconf PAGE_SIZE:- This returns the default page size on the machine. On my Ubuntu 14.04 64 bit, I see this as follows:-

mukund@mukund-desktop:~$ getconf PAGE_SIZE
4096

4. pmap:- This returns the address mapping of a process. Here, in the output, we can see the virtual address and the location of the actual file. The 'anon' are heap and stack memory.

mukund@mukund-desktop:~$ ps -ae | grep term
 2205 ?        00:00:20 gnome-terminal
mukund@mukund-desktop:~$ pmap 2205
2205:   gnome-terminal
0000000000400000    284K r-x-- gnome-terminal
0000000000646000      4K r---- gnome-terminal
0000000000647000     12K rw--- gnome-terminal
0000000000b93000   8400K rw---   [ anon ]
00007f309c000000    136K rw---   [ anon ]
00007f309c022000  65400K -----   [ anon ]
00007f30a4000000    132K rw---   [ anon ]
00007f30a4021000  65404K -----   [ anon ]
00007f30a958d000      4K -----   [ anon ]
00007f30a958e000   8192K rw---   [ anon ]
00007f30a9d8e000  35272K r---- icon-theme.cache
00007f30ac000000    136K rw---   [ anon ]
00007f30ac022000  65400K -----   [ anon ]
00007f30b0000000    324K rw---   [ anon ]
00007f30b0051000  65212K -----   [ anon ]
00007f30b4113000      4K -----   [ anon ]
00007f30b4114000   8192K rw---   [ anon ]
00007f30b4914000    356K r-x-- libibus-1.0.so.5.0.505
00007f30b496d000   2048K ----- libibus-1.0.so.5.0.505
00007f30b4b6d000      8K r---- libibus-1.0.so.5.0.505
00007f30b4b6f000      4K rw--- libibus-1.0.so.5.0.505
00007f30b4b70000      4K rw---   [ anon ]
00007f30b4b71000     24K r-x-- im-ibus.so
00007f30b4b77000   2048K ----- im-ibus.so
00007f30b4d77000      4K r---- im-ibus.so
00007f30b4d78000      4K rw--- im-ibus.so
00007f30b4d79000    132K r-x-- liblzma.so.5.0.0
00007f30b4d9a000   2044K ----- liblzma.so.5.0.0
00007f30b4f99000      4K r---- liblzma.so.5.0.0
00007f30b4f9a000      4K rw--- liblzma.so.5.0.0
00007f30b4f9b000   1392K r-x-- libxml2.so.2.9.1
00007f30b50f7000   2048K ----- libxml2.so.2.9.1
...
 total           662024K

5. top:- This command can prove really useful at times. Here, 'VIRT' shows the amount of virtual memory being used and 'RES' shows the amount of memory present in RAM. There's a catch with the 'RES' memory. There are some libraries, like libC that are used by several processes. There's a single copy of this lib in physical memory, which is mapped into virtual memory of all processes. However, top counts this per process. As a result, the sum of all 'RES' memory is generally greater than the actual physical memory.


PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2003 mukund    20   0 2210808 1.009g  89752 S  42.9 13.0  88:17.77 firefox

1151 root      20   0  359768  68188  32756 S   5.7  0.8  18:06.65 Xorg
1887 mukund    20   0 1630664 182452  75164 S   3.0  2.2  10:08.64 compiz
1305 mukund    20   0  379444  11852   7008 S   1.0  0.1   0:40.29 ibus-daemon
1678 mukund     9 -11  513332  12108   9728 S   1.0  0.1   1:04.37 pulseaudio
2205 mukund    20   0  662024  34672  24640 S   1.0  0.4   0:21.38 gnome-terminal


Example:-

Here, we see libc mapped as follows, for PID 2003

mukund@mukund-desktop:~$ pmap 2003 | grep libc
00007f9277d1d000   1772K r-x-- libc-2.19.so
00007f9277ed8000   2044K ----- libc-2.19.so
00007f92780d7000     16K r---- libc-2.19.so
00007f92780db000      8K rw--- libc-2.19.so


For PID 2205, we see the same libc mapped as follows:-

mukund@mukund-desktop:~$ pmap 2205 | grep libc
00007f30c6580000   1772K r-x-- libc-2.19.so
00007f30c673b000   2044K ----- libc-2.19.so
00007f30c693a000     16K r---- libc-2.19.so
00007f30c693e000      8K rw--- libc-2.19.so


So, we can see that there is only one physical copy of the file, however mapped into the virtual address space of two different processes. However, top counts this as two separate 'RES' memory.
This script can be used to determine a pretty accurate picture of the actual physical memory being used.

6. cat /proc/<PID>/smaps:- This command can be used to find the clean, dirty pages per process.

7.  cat /proc/meminfo: This can be used to find the total memory related information in the system. This below command can be used to find inactive and dirty pages.

mukund@mukund-desktop:~/Desktop$ cat /proc/meminfo | egrep -i "inactive|dirty"
Inactive:         725508 kB
Inactive(anon):    16384 kB
Inactive(file):   709124 kB
Dirty:                88 kB


The anonymous memory is the memory not mapped onto files, like the heap or the stack memory. This memory can only be moved to the swap space to page out.

8. min_free is the amount of memory after which the kernel(kswapd0 therad) starts seeing OOM and starts freeing memory.

mukund@mukund-desktop:~/Desktop$ ps -aux|grep swap
root        52  0.0  0.0      0     0 ?        S    Jul06   0:00 [kswapd0]


mukund@mukund-desktop:~/Desktop$ sudo sysctl -a | grep -i min_free
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.min_free_kbytes = 11350


9. sysq-trigger:- This is an interesting command to do special requests to the Linux kernel. The documentation can be found here.

Example:-

echo f > sysrq-trigger

This triggers the OOM killer.

10. lscpu:- This is a nice utility to get great information about your CPU.


mukund@mukund-desktop:~/Desktop$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               2964.828
BogoMIPS:              5587.24
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3



11.ps -o min_flt,maj_flt <pid>: This command can be used to find the major and minor page faults of a process.