In the process of doing a mod loader write-up, which I'll post soon, I started advocating for putting the mod-related code in a dll since it won't be used frequently. This sent me down the metaphorical rabbit hole of investigating the trade-offs of using a dll vs a static library.
Pretty much all discussion concerning dlls (or shared libraries in the Unix world) vs static libraries focuses on the most obvious aspects.
None of these points speak to the actual cost of the flexibility a dll provides. I'm going to investigate the size and runtime cost of using a dll as opposed to a static library. As a size reference point I'll use an empty dll. This makes the comparison fairly simple, as an empty static library linked into a binary should add zero extra size thanks to linker optimizations. I won't be investigating the cost of integrating a large dll vs a large static library since I don't have one on hand, but I've noted it as a potential topic for future investigation. An interesting "worst case" scenario there would be a dll and an exe that both link and execute the same static library, which is probably a fairly common scenario in practice. All the (called) library code would then be duplicated between the two binaries, and the memory footprint would grow a second time by roughly the size of that library (more precisely, by the size of the library code that both binaries actually use).
One more obscure implementation detail of dlls that has implications for load time is the set of relocations that must be performed when a dll is loaded. By default, dlls are built assuming a fixed base address, and all absolute addresses in the binary rely on the dll being loaded at that address. When the dll isn't loaded there (which is probably almost always going to be the case), every one of those absolute addresses must be patched using the new base address. Note that on Linux-based systems it is possible to generate position independent code where this wouldn't be an issue; however, this is not possible on Windows (at least with the Visual Studio toolchain).
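To get a feel for what that patching involves, here is a minimal sketch of applying base relocations in C++. It assumes a 64-bit image already laid out in a byte buffer by RVA (as if mapped), a little-endian host, and only the two relocation types a typical x64 dll contains: IMAGE_REL_BASED_DIR64 (type 10) and IMAGE_REL_BASED_ABSOLUTE padding (type 0). The block layout follows the PE specification's .reloc format; apply_relocations and its interface are hypothetical and not how the Windows loader is actually structured.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Base relocation types from the PE specification (the two a typical x64 image uses).
constexpr std::uint16_t kRelBasedAbsolute = 0; // padding entry, skipped
constexpr std::uint16_t kRelBasedDir64 = 10;   // patch a 64-bit absolute address

// Hypothetical helper: patch `image` (laid out by RVA, as if already mapped) after it
// was loaded at `actual_base` instead of `preferred_base`. `reloc_rva`/`reloc_size`
// come from the Base Relocation Directory entry. Assumes a little-endian host.
void apply_relocations(std::vector<std::uint8_t>& image, std::uint32_t reloc_rva,
                       std::uint32_t reloc_size, std::uint64_t preferred_base,
                       std::uint64_t actual_base)
{
    const std::uint64_t delta = actual_base - preferred_base; // modular arithmetic handles lower bases
    const std::size_t end = static_cast<std::size_t>(reloc_rva) + reloc_size;
    if (end > image.size())
        throw std::out_of_range("relocation directory outside image");

    std::size_t block = reloc_rva;
    while (block + 8 <= end) {
        // Block header: RVA of the 4 KB page, then the block size including this header.
        std::uint32_t page_rva = 0, block_size = 0;
        std::memcpy(&page_rva, &image[block], 4);
        std::memcpy(&block_size, &image[block + 4], 4);
        if (block_size < 8 || block + block_size > end)
            throw std::runtime_error("malformed relocation block");

        for (std::size_t e = block + 8; e + 2 <= block + block_size; e += 2) {
            std::uint16_t entry = 0;
            std::memcpy(&entry, &image[e], 2);
            const unsigned type = entry >> 12;       // high 4 bits: relocation type
            const unsigned offset = entry & 0x0FFFu; // low 12 bits: offset within the page
            if (type == kRelBasedAbsolute)
                continue;
            if (type != kRelBasedDir64)
                throw std::runtime_error("unsupported relocation type");

            const std::size_t target = static_cast<std::size_t>(page_rva) + offset;
            if (target + 8 > image.size())
                throw std::out_of_range("relocation target outside image");
            std::uint64_t address = 0;
            std::memcpy(&address, &image[target], 8);
            address += delta; // rebase the absolute address
            std::memcpy(&image[target], &address, 8);
        }
        block += block_size;
    }
}
```

Each relocation boils down to a single 8-byte read, add, and write, so the cost scales linearly with the number of entries.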
Just building an empty dll with int main() {} generates a dll that is 9.5 KB in a release configuration. This is most likely due to the CRT functions involved in loading the dll, an overhead that is simultaneously enormous relative to the empty code and an acceptable cost in almost all situations. The inspiration for this article was the cost of relocations, so I wanted to dig into that next. I spent a good deal of time poring over the PE format Windows uses for executables and eventually found the .reloc section, where the data needed to perform relocations is stored.
The Visual Studio build tools contain dumpbin.exe, which can be used to explore PE binaries, so I set off to get some numbers. I used some well-known dlls as well as an empty dll containing just the empty main function from above and an executable I had handy. (The binaries I built myself were compiled in release mode without debug symbols using /O2 /DEBUG:NONE. I patched the entry point symbols in from a separate compilation with symbols enabled.)
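For reference, the "RVA [size]" lines dumpbin prints come from the data directory table at the end of the optional header. Here is a minimal, portable sketch of reading that table from a file loaded into memory; read_data_directories and DataDirectory are hypothetical names, the offsets follow the PE specification (PE32+ and PE32), and only basic bounds checking is done.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct DataDirectory {
    std::uint32_t rva;  // relative virtual address (the "RVA" column in dumpbin)
    std::uint32_t size; // size in bytes (the "[size]" column)
};

// Read a little-endian integer of type T at `offset`, with bounds checking.
template <typename T>
T read_le(const std::vector<std::uint8_t>& bytes, std::size_t offset)
{
    if (offset + sizeof(T) > bytes.size())
        throw std::out_of_range("read past end of file");
    T value;
    std::memcpy(&value, bytes.data() + offset, sizeof(T)); // assumes a little-endian host
    return value;
}

// Hypothetical helper: return the optional header's data directory table.
std::vector<DataDirectory> read_data_directories(const std::vector<std::uint8_t>& file)
{
    if (read_le<std::uint16_t>(file, 0) != 0x5A4D) // "MZ"
        throw std::runtime_error("missing MZ signature");
    const std::uint32_t pe_offset = read_le<std::uint32_t>(file, 0x3C); // e_lfanew
    if (read_le<std::uint32_t>(file, pe_offset) != 0x00004550)          // "PE\0\0"
        throw std::runtime_error("missing PE signature");

    const std::size_t optional = pe_offset + 4 + 20; // skip signature + COFF file header
    const std::uint16_t magic = read_le<std::uint16_t>(file, optional);
    std::size_t count_offset = 0; // offset of NumberOfRvaAndSizes within the optional header
    if (magic == 0x20B)
        count_offset = 108; // PE32+ (x64)
    else if (magic == 0x10B)
        count_offset = 92;  // PE32 (x86)
    else
        throw std::runtime_error("unknown optional header magic");

    const std::uint32_t count = read_le<std::uint32_t>(file, optional + count_offset);
    std::vector<DataDirectory> dirs;
    for (std::uint32_t i = 0; i < count; ++i) {
        // The directory array immediately follows NumberOfRvaAndSizes; each entry is 8 bytes.
        const std::size_t entry = optional + count_offset + 4 + 8 * i;
        dirs.push_back({read_le<std::uint32_t>(file, entry), read_le<std::uint32_t>(file, entry + 4)});
    }
    return dirs;
}
```

Index 2 of the returned table is the Resource Directory, 5 the Base Relocation Directory, 6 the Debug Directory, and 10 the Load Configuration Directory, matching the order dumpbin prints them in.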
"<VS_path>\Visual Studio\<VS_Release>\BuildTools\VC\Tools\MSVC\<VS_Version>\bin\Hostx64\x64\dumpbin.exe" /ALL C:\Windows\System32\opengl32.dll
[...]
OPTIONAL HEADER VALUES
[...]
180000000 image base (0000000180000000 to 0000000180124FFF)
[...]
F3980 [ 21A0] RVA [size] of Export Directory
F5B20 [ F0] RVA [size] of Import Directory
122000 [ 3E8] RVA [size] of Resource Directory
119000 [ 708C] RVA [size] of Exception Directory
123000 [ 1210] RVA [size] of Base Relocation Directory
EE4D0 [ 70] RVA [size] of Debug Directory
0 [ 0] RVA [size] of Thread Storage Directory
E0950 [ 118] RVA [size] of Load Configuration Directory
E0A68 [ 9B8] RVA [size] of Import Address Table Directory
F37A0 [ 80] RVA [size] of Delay Import Directory
[...]
Summary
21000 .data
1000 .didat
8000 .pdata
19000 .rdata
2000 .reloc
1000 .rsrc
DE000 .text
"<dumpbin_path>\dumpbin.exe" /ALL <OpenAL_path>\OpenAL32.dll
[...]
OPTIONAL HEADER VALUES
[...]
6F000000 image base (000000006F000000 to 000000006F101FFF)
[...]
FB000 [ 1169] RVA [size] of Export Directory
FD000 [ 15E8] RVA [size] of Import Directory
0 [ 0] RVA [size] of Resource Directory
ED000 [ 3084] RVA [size] of Exception Directory
101000 [ 980] RVA [size] of Base Relocation Directory
0 [ 0] RVA [size] of Debug Directory
100020 [ 28] RVA [size] of Thread Storage Directory
0 [ 0] RVA [size] of Load Configuration Directory
FD564 [ 4D8] RVA [size] of Import Address Table Directory
0 [ 0] RVA [size] of Delay Import Directory
[...]
Summary
1000 .CRT
6000 .bss
1000 .data
2000 .edata
2000 .idata
4000 .pdata
7C000 .rdata
1000 .reloc
6F000 .text
1000 .tls
4000 .xdata
"<dumpbin_path>\dumpbin.exe" /ALL <empty_dll_path>\empty.dll
[...]
OPTIONAL HEADER VALUES
[...]
1320 entry point (0000000180001320) _DllMainCRTStartup
1000 base of code
180000000 image base (0000000180000000 to 0000000180005FFF)
[...]
0 [ 0] RVA [size] of Export Directory
26EC [ 50] RVA [size] of Import Directory
5000 [ 1E0] RVA [size] of Resource Directory
4000 [ 180] RVA [size] of Exception Directory
6000 [ 24] RVA [size] of Base Relocation Directory
2160 [ 1C] RVA [size] of Debug Directory
0 [ 0] RVA [size] of Thread Storage Directory
2180 [ 138] RVA [size] of Load Configuration Directory
2000 [ D8] RVA [size] of Import Address Table Directory
0 [ 0] RVA [size] of Delay Import Directory
[...]
Summary
1000 .data
1000 .pdata
1000 .rdata
1000 .reloc
1000 .rsrc
1000 .text
"<dumpbin_path>\dumpbin.exe" /ALL <text_game_exe_path>\Text-Game.exe
[...]
OPTIONAL HEADER VALUES
[...]
6C48 entry point (0000000140006C48) mainCRTStartup
1000 base of code
140000000 image base (0000000140000000 to 0000000140047FFF)
[...]
639330 [ 164] RVA [size] of Export Directory
69F188 [ DC] RVA [size] of Import Directory
6A6000 [ 43C] RVA [size] of Resource Directory
67A000 [ 1FF08] RVA [size] of Exception Directory
6A7000 [ 29D8] RVA [size] of Base Relocation Directory
6056B4 [ 38] RVA [size] of Debug Directory
608620 [ 28] RVA [size] of Thread Storage Directory
6056F0 [ 138] RVA [size] of Load Configuration Directory
69E000 [ 1188] RVA [size] of Import Address Table Directory
0 [ 0] RVA [size] of Delay Import Directory
[...]
Summary
1000 .00cfg
40000 .data
6000 .idata
24000 .pdata
CC000 .rdata
8000 .reloc
1000 .rsrc
56D000 .text
1000 .tls
The generated output is huge, so I've trimmed most of it as noted by [...]; however, digging through it reveals some interesting numbers. First, although the PE documentation reports the default image base address for dlls as 0x10000000, it actually appears to be 0x180000000 for dlls and 0x140000000 for exes (0x10000000 is the default for 32-bit dlls; these are the 64-bit defaults). Additionally, it looks as though some dlls, such as OpenAL32.dll at 0x6F000000, explicitly specify a different base in an attempt to avoid the need for relocation. The empty binary has 12 total relocations coming from the CRT and a relocation directory of 0x24 (36) bytes, which averages out to 3 bytes per relocation. The real layout is slightly different: per the PE specification, each relocation is a 2-byte entry (a 4-bit type and a 12-bit offset within a 4 KB page), and entries are grouped into per-page blocks that each start with an 8-byte header and are padded to a 4-byte boundary. The largest number of relocations belongs to OpenAL32.dll, with 1035. This number seems small enough to have a completely negligible impact on load times, but the output does point to some other interesting areas to investigate for dll size, specifically the Resource Directory, Load Configuration Directory, and Debug Directory. OpenAL32.dll has a size of 0 for all of these, less than the empty dll, so investigating them as a potential optimization seems the obvious path.
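The 3-byte figure is only an average. The sketch below is a hypothetical helper that computes the size of a .reloc section from a list of relocation RVAs (one relocation per RVA), using the block layout from the PE specification: one block per 4 KB page, an 8-byte header per block, 2 bytes per entry, and each block padded to a 4-byte boundary. Twelve relocations in a single page come to 32 bytes, while the same twelve split evenly across two pages come to 40, so the size depends on how the relocations are spread across pages as well as on their count.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Size in bytes of a .reloc section holding one entry per RVA in `rvas`:
//   - entries are grouped by 4 KB page (one block per page),
//   - each block has an 8-byte header (page RVA + block size),
//   - each entry is a 2-byte WORD (4-bit type + 12-bit page offset),
//   - every block is padded to a 4-byte boundary (with an ABSOLUTE entry).
std::size_t reloc_directory_size(const std::vector<std::uint32_t>& rvas)
{
    std::map<std::uint32_t, std::size_t> entries_per_page;
    for (std::uint32_t rva : rvas)
        ++entries_per_page[rva & ~0xFFFu]; // page base = RVA with the low 12 bits cleared

    std::size_t total = 0;
    for (const auto& page : entries_per_page) {
        std::size_t block = 8 + 2 * page.second;
        block = (block + 3) & ~static_cast<std::size_t>(3); // round up to a multiple of 4
        total += block;
    }
    return total;
}
```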
The relevant section for the resource directory is .rsrc. Searching the dumpbin output for that:
opengl32.dll
SECTION HEADER #6
.rsrc name
3E8 virtual size
122000 virtual address (0000000180122000 to 00000001801223E7)
400 size of raw data
101E00 file pointer to raw data (00101E00 to 001021FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
40000040 flags
Initialized Data
Read Only
RAW DATA #6
0000000180122060: 84 03 34 00 00 00 56 00 53 00 5F 00 56 00 45 00 ..4...V.S._.V.E.
0000000180122070: 52 00 53 00 49 00 4F 00 4E 00 5F 00 49 00 4E 00 R.S.I.O.N._.I.N.
0000000180122080: 46 00 4F 00 00 00 00 00 BD 04 EF FE 00 00 01 00 F.O.....½.ïþ....
[...]
00000001801220F0: 34 00 42 00 30 00 00 00 4C 00 16 00 01 00 43 00 4.B.0...L.....C.
0000000180122100: 6F 00 6D 00 70 00 61 00 6E 00 79 00 4E 00 61 00 o.m.p.a.n.y.N.a.
0000000180122110: 6D 00 65 00 00 00 00 00 4D 00 69 00 63 00 72 00 m.e.....M.i.c.r.
0000000180122120: 6F 00 73 00 6F 00 66 00 74 00 20 00 43 00 6F 00 o.s.o.f.t. .C.o.
0000000180122130: 72 00 70 00 6F 00 72 00 61 00 74 00 69 00 6F 00 r.p.o.r.a.t.i.o.
0000000180122140: 6E 00 00 00 4C 00 12 00 01 00 46 00 69 00 6C 00 n...L.....F.i.l.
0000000180122150: 65 00 44 00 65 00 73 00 63 00 72 00 69 00 70 00 e.D.e.s.c.r.i.p.
0000000180122160: 74 00 69 00 6F 00 6E 00 00 00 00 00 4F 00 70 00 t.i.o.n.....O.p.
0000000180122170: 65 00 6E 00 47 00 4C 00 20 00 43 00 6C 00 69 00 e.n.G.L. .C.l.i.
0000000180122180: 65 00 6E 00 74 00 20 00 44 00 4C 00 4C 00 00 00 e.n.t. .D.L.L...
[...]
Empty.dll
SECTION HEADER #5
.rsrc name
1E0 virtual size
5000 virtual address (0000000180005000 to 00000001800051DF)
200 size of raw data
2200 file pointer to raw data (00002200 to 000023FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
40000040 flags
Initialized Data
Read Only
RAW DATA #5
[...]
0000000180005060: 3C 3F 78 6D 6C 20 76 65 72 73 69 6F 6E 3D 27 31 <?xml version='1
0000000180005070: 2E 30 27 20 65 6E 63 6F 64 69 6E 67 3D 27 55 54 .0' encoding='UT
0000000180005080: 46 2D 38 27 20 73 74 61 6E 64 61 6C 6F 6E 65 3D F-8' standalone=
0000000180005090: 27 79 65 73 27 3F 3E 0D 0A 3C 61 73 73 65 6D 62 'yes'?>..<assemb
00000001800050A0: 6C 79 20 78 6D 6C 6E 73 3D 27 75 72 6E 3A 73 63 ly xmlns='urn:sc
00000001800050B0: 68 65 6D 61 73 2D 6D 69 63 72 6F 73 6F 66 74 2D hemas-microsoft-
00000001800050C0: 63 6F 6D 3A 61 73 6D 2E 76 31 27 20 6D 61 6E 69 com:asm.v1' mani
00000001800050D0: 66 65 73 74 56 65 72 73 69 6F 6E 3D 27 31 2E 30 festVersion='1.0
00000001800050E0: 27 3E 0D 0A 20 20 3C 74 72 75 73 74 49 6E 66 6F '>.. <trustInfo
00000001800050F0: 20 78 6D 6C 6E 73 3D 22 75 72 6E 3A 73 63 68 65 xmlns="urn:sche
0000000180005100: 6D 61 73 2D 6D 69 63 72 6F 73 6F 66 74 2D 63 6F mas-microsoft-co
0000000180005110: 6D 3A 61 73 6D 2E 76 33 22 3E 0D 0A 20 20 20 20 m:asm.v3">..
0000000180005120: 3C 73 65 63 75 72 69 74 79 3E 0D 0A 20 20 20 20 <security>..
0000000180005130: 20 20 3C 72 65 71 75 65 73 74 65 64 50 72 69 76 <requestedPriv
0000000180005140: 69 6C 65 67 65 73 3E 0D 0A 20 20 20 20 20 20 20 ileges>..
0000000180005150: 20 3C 72 65 71 75 65 73 74 65 64 45 78 65 63 75 <requestedExecu
0000000180005160: 74 69 6F 6E 4C 65 76 65 6C 20 6C 65 76 65 6C 3D tionLevel level=
0000000180005170: 27 61 73 49 6E 76 6F 6B 65 72 27 20 75 69 41 63 'asInvoker' uiAc
0000000180005180: 63 65 73 73 3D 27 66 61 6C 73 65 27 20 2F 3E 0D cess='false' />.
0000000180005190: 0A 20 20 20 20 20 20 3C 2F 72 65 71 75 65 73 74 . </request
00000001800051A0: 65 64 50 72 69 76 69 6C 65 67 65 73 3E 0D 0A 20 edPrivileges>..
00000001800051B0: 20 20 20 3C 2F 73 65 63 75 72 69 74 79 3E 0D 0A </security>..
00000001800051C0: 20 20 3C 2F 74 72 75 73 74 49 6E 66 6F 3E 0D 0A </trustInfo>..
00000001800051D0: 3C 2F 61 73 73 65 6D 62 6C 79 3E 0D 0A 00 00 00 </assembly>.....
Text-Game.exe
[...] // Identical to Empty.dll!
The opengl32.dll resource is a VS_VERSION_INFO version resource (company name, file description, and so on), whereas the resource in the empty dll is an auto-generated user account control (UAC) manifest. After searching through the linker documentation, I came across the /MANIFEST option, whose documentation says the manifest is generated as a separate file next to the linked binary by default rather than embedded. That appears to be incorrect, but more importantly the manifest can be disabled entirely via /MANIFEST:NO, which shaves off 500 bytes. This is a bit underwhelming for larger binaries, but for the empty dll it is over a 5% reduction in size at no cost, bringing the empty dll down to 9 KB.
This appears to be coming from Structured Exception Handling (SEH). I don't ever plan to use SEH, but there isn't an option to disable it, presumably because the C Runtime (CRT) uses it. Taking a stack trace from inside any main function makes this readily apparent:
0b 000000a4`a5eff030 00007ff7`b07b0579 Text_Game!main+0x4c3 [main.cpp @ 71]
0c 000000a4`a5effc40 00007ff7`b07b045e Text_Game!invoke_main+0x39 [exe_common.inl @ 79]
0d 000000a4`a5effc90 00007ff7`b07b031e Text_Game!__scrt_common_main_seh+0x12e [exe_common.inl @ 288]
// SEH is encoded into the main function name itself ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0e 000000a4`a5effd00 00007ff7`b07b060e Text_Game!__scrt_common_main+0xe [exe_common.inl @ 331]
0f 000000a4`a5effd30 00007ffe`e1fa7034 Text_Game!mainCRTStartup+0xe [exe_main.cpp @ 17]
10 000000a4`a5effd60 00007ffe`e3722651 KERNEL32!BaseThreadInitThunk+0x14
Browsing through the symbols from dumpbin also makes it fairly apparent that most of the size is coming from calls the CRT makes during program startup. So I made a quick dll that bypasses the CRT entirely and probably will not, in any way, work:
// Pass /ENTRY:DllEntryPoint and /NODEFAULTLIB to the linker
#include <Windows.h>
extern "C" BOOL __stdcall DllEntryPoint(HINSTANCE, DWORD, LPVOID) { return TRUE; }
Voila, the compiled empty dll is now 1.5 KB. Note that this removed all data from the binary except the Debug Directories. All the symbols, imports, relocation, and exception information was indeed coming from the CRT initialization functions.
Digging right into the Debug Directory:
empty.txt
Debug Directories
Time Type Size RVA Pointer
-------- ------- -------- -------- --------
61FB62ED coffgrp 248 00002318 1518 4C544347 (LTCG)
This is identical to the Debug Directory in OpenGL and Text-Game. It appears to be Link Time Code Generation (LTCG) info stashed in the binary; however, this data is still present after disabling LTCG. Further internet spelunking led me to the undocumented /EMITPOGOPHASEINFO linker option, which does in fact strip this information out. Best of all, the LTCG option can still be specified and doesn't appear to add any additional data (unless it is being stripped out and will cause issues at runtime). With this change, the total dll size is now... 1.5 KB. Slightly unexpected, but dumping the dll once again gives a hint of what the issue is.
SECTION HEADER #1
.text name
6 virtual size
1000 virtual address (0000000180001000 to 0000000180001005) // <<<<<<< Huge virtual address
200 size of raw data // <<<<<<<< Huge raw data size as well
200 file pointer to raw data (00000200 to 000003FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
60000020 flags
Code
Execute Read
RAW DATA #1
0000000180001000: B8 01 00 00 00 C3 ¸....Ã
Summary
1000 .text
The .text section is located at virtual address 0x1000, or 4096. This number rang a bell: I'd seen it while digging through the linker documentation, where the very first linker option, /ALIGN, uses it as the default. Well, okay, I'll make it smaller then. /ALIGN:16 is the smallest value that would compile, and it brings the empty binary size down to 528B. That seems like a reasonable overhead to attribute to the PE header, as the optional header alone is over 100B. I don't expect this dll will run in this configuration for a few reasons:
/DRIVER is also passed as a link option.
The last point is the easiest to solve. I re-enabled /DEBUG:FULL at the cost of about 200B. (Dumping the dll again reveals this will vary depending on the location of the PDB, since the path is hard-coded into the .rdata section. Time to map a drive straight to the build directory!) The dll is back up to 700B, but I can easily disable this later.
Unfortunately, the ALIGN option doesn't scale well as adding code will force the alignment to increase. Some more digging revealed that the real culprit for this size was staring me in the face the whole time.
OPTIONAL HEADER VALUES
[...]
1000 section alignment
200 file alignment
Using /ALIGN modifies the section alignment, but changing that also decreases the file alignment, which is what was actually causing the size decrease. Using /FILEALIGN:1 I can prevent the sections from being aligned on disk, and doing so preserves the file shrinkage (it actually shaves off a few more bytes) while allowing the /ALIGN and /DRIVER options to be removed. Specifying /ALIGN:16 and /FILEALIGN:1 together results in the file alignment being pulled up to 16, so the options can't be combined to achieve the size reduction; this matches the PE specification's rule that when SectionAlignment is smaller than the page size, FileAlignment must match it. Also note that if /ALIGN isn't specified, each section will be put on a different memory page, meaning memory usage may well be higher than the file size suggests. I'll investigate that later.
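The distinction between the two alignments is easy to sketch: the on-disk size of a section is its size rounded up to FileAlignment, while the address space it occupies is rounded up to SectionAlignment. The helpers below are hypothetical; the default values (0x200 and 0x1000) and the 6-byte .text section are taken from the dumps above.

```cpp
#include <cstdint>

// Round `value` up to the next multiple of `alignment` (a power of two, as the
// PE format requires for both FileAlignment and SectionAlignment).
constexpr std::uint32_t align_up(std::uint32_t value, std::uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

struct SectionFootprint {
    std::uint32_t on_disk;   // SizeOfRawData: padded to FileAlignment
    std::uint32_t in_memory; // address space occupied: padded to SectionAlignment
};

// How much a section of `virtual_size` bytes occupies on disk and in memory.
constexpr SectionFootprint footprint(std::uint32_t virtual_size,
                                     std::uint32_t file_alignment,
                                     std::uint32_t section_alignment)
{
    return {align_up(virtual_size, file_alignment), align_up(virtual_size, section_alignment)};
}

// The 6-byte .text section with the default alignments (FileAlignment 0x200,
// SectionAlignment 0x1000)...
static_assert(footprint(6, 0x200, 0x1000).on_disk == 0x200, "512 bytes on disk");
static_assert(footprint(6, 0x200, 0x1000).in_memory == 0x1000, "a full 4 KB page in memory");
// ...and with /ALIGN:16, where FileAlignment has to match SectionAlignment.
static_assert(footprint(6, 16, 16).on_disk == 0x10, "16 bytes on disk");
```

With the defaults, the 6-byte section costs 512 bytes on disk but occupies a whole 4 KB page once mapped, which is the page-per-section overhead that a small file size can hide.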
Disabling debug information again (/DEBUG:NONE), the binary is now a scant 510B, which I've concluded is as small as it can get via documented and undocumented linker options. At this point, I loaded the dll to validate that it is still well-formed and unfortunately discovered that the Windows loader refuses to load it, producing the error code 139. After some fiddling (since at this point the dll is way past any expected, documented state) I discovered that the culprit is actually /FILEALIGN. Specifying a value smaller than 512 produces a seemingly valid binary, but the loader refuses to load it. The linker documentation gives no hint as to why, although the PE format specification does say that FileAlignment should be a power of 2 between 512 and 64K (except when SectionAlignment is below the page size, in which case the two must match). So I'll have to stick with just the /ALIGN directive and 528B as the minimal size dll.
Normally that would be the end of the story; however, I've spent a decent amount of time investigating the format, so I decided to create a C++ program to strip the binary directly. The quest for a smaller binary continues!
I am creating a companion write-up on the binary stripping program I wrote, which covers some additional interesting details about the PE format, and will add a link at the end of this article when it is published. Running that program on the generated dll to remove the DOS stub and the undocumented Rich header that goes with it reduces the total binary size by 112B. The total binary size should then be 416B; however, Windows reports only 384B, so I'm uncertain where the extra bytes went. That said, 384B seems logically correct, as it is truly the hard limit for how small an empty PE binary can be. The mandatory header sections and their respective sizes on x64 are (note that despite the name, the optional header is in fact not optional): the DOS header (64B), the PE signature (4B), the COFF file header (20B), the optional header (240B), and one section header (40B). That comes out to 64B + 4B + 20B + 240B + 40B = 368B on x64. The raw data stored in the binary adds another 6B, making the smallest possible size 374B. The disparity between the hard limit I'm hitting at 384B and the absolute minimum of 374B comes from the alignment to 16B boundaries specified at link time. This is readily apparent from dumping the stripped binary.
SECTION HEADER #1
.text name
6 virtual size
[...]
10 size of raw data
The virtual size is only 6B; however, the raw data size is 0x10 (16) bytes, so there is exactly a 10B difference due to padding the end of the section to a 16B boundary, which gives 368B + 16B = 384B.
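The size arithmetic can be written down as a small compile-time sketch. The header sizes are the PE32+ values from the PE specification, as is the rule that SizeOfHeaders and each section's raw data are rounded up to FileAlignment; minimal_image_size is a hypothetical helper that assumes exactly one section and no DOS stub.

```cpp
#include <cstdint>

constexpr std::uint32_t align_up(std::uint32_t value, std::uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1); // alignment must be a power of two
}

// Fixed header sizes from the PE specification (x64 / PE32+).
constexpr std::uint32_t kDosHeader = 64;       // IMAGE_DOS_HEADER, with the DOS stub removed
constexpr std::uint32_t kPeSignature = 4;      // "PE\0\0"
constexpr std::uint32_t kCoffHeader = 20;      // IMAGE_FILE_HEADER
constexpr std::uint32_t kOptionalHeader = 240; // PE32+ optional header incl. 16 data directories
constexpr std::uint32_t kSectionHeader = 40;   // one IMAGE_SECTION_HEADER per section

// Smallest file for an image with a single section holding `code_bytes` bytes:
// the headers and the section's raw data are each rounded up to FileAlignment.
constexpr std::uint32_t minimal_image_size(std::uint32_t code_bytes, std::uint32_t file_alignment)
{
    return align_up(kDosHeader + kPeSignature + kCoffHeader + kOptionalHeader + kSectionHeader,
                    file_alignment)
         + align_up(code_bytes, file_alignment);
}

static_assert(kDosHeader + kPeSignature + kCoffHeader + kOptionalHeader + kSectionHeader == 368,
              "mandatory headers");
// 6 bytes of code (mov eax, 1; ret) with 16-byte alignment: 368 + 16 = 384 bytes.
static_assert(minimal_image_size(6, 16) == 384, "stripped dll size");
```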
I compiled a test executable:
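#include <Windows.h>
int main()
{
    // LoadLibraryExA takes a narrow string, so this builds with or without UNICODE
    HMODULE handle = LoadLibraryExA("mod.dll", NULL, 0);
    if (handle == NULL)
        return static_cast<int>(GetLastError());
    FreeLibrary(handle);
    return 0;
}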
I then fired up WinDbg and selected File->Launch Executable->mod_loader.exe.
ntdll!LdrpDoDebuggerBreak+0x30:
00007ffe`e37a06b0 cc int 3
0:000> sxe ld mod.dll
0:000> g
[...]
ModLoad: 00007ffe`dd870000 00007ffe`dd8702e0 mod.dll
ntdll!NtMapViewOfSection+0x14:
00007ffe`e376d274 c3 ret
0:000> x mod!*
*** WARNING: Unable to verify checksum for mod.dll
00007ffe`dd870200 mod!DllEntryPoint (DllEntryPoint)
0:000> bae 1 00007ffe`dd870200
0:000> g
Breakpoint 0 hit
mod!DllEntryPoint:
00007ffe`dd870200 b801000000 mov eax,1
Surprisingly, everything works. Next, I tried some floating point operations, since I'd seen these mentioned as a source of issues without the CRT. Sure enough, I hit a link error:
dll_main.obj : error LNK2001: unresolved external symbol _fltused [mod.vcxproj]
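I promptly investigated and fixed this. MSVC emits a reference to _fltused from any object file that uses floating point, so the linker only needs the symbol to exist. Checking the CRT-linked loader shows the value 39029 (0x9875); I omitted it since I have no idea what the constant means, but it is statically linked into the binary, so it would be safe to copy if I wanted to.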
windbg:
0:000> x mod_loader!_fltused
00007ff6`34948940 mod_loader!_fltused = 0n39029
dll_main.cpp:
extern "C" int _fltused = 0;
With that in place, the dll appears to load without issue.
The dll in its current stripped-down form is usable, but I wouldn't recommend using it in this state. The CRT startup functionality that was removed manages globals (and presumably statics as well), calling constructors and destructors as needed on dll load and detach, as well as managing the actual dll reference count and even exiting the process when main returns. If space is really a concern, it is probably possible to re-implement this functionality for each compiler since it is documented, and digging through the symbols imported from the 'api-ms-win-crt-runtime-l1-1-0.dll' apiset should reveal the calls needed, such as _initterm for initialization. For an interesting rundown on this topic, I found a very informative article that elucidates the implementation details (I found the opt:nowin98 linker switch particularly fascinating, as it dealt with the same default 4KB alignment for PE sections that passing /ALIGN works around for significant size savings; nowin98 no longer exists). I don't use RTTI, exceptions, or security cookies in release mode, so I might even be able to make it smaller than the default CRT initialization provides, but currently a few KB is cheap enough that I'm not going to bother.
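The core of the _initterm-style initialization is small. Below is a portable sketch of walking an initializer table in order and a termination table in reverse. As I understand the MSVC CRT, the real initializer table is bounded by pointers the CRT places in the .CRT$XCA and .CRT$XCZ sections, with compiler-generated entries landing in .CRT$XCU between them, and destructors are registered through atexit as the initializers run; a CRT-less dll would have to do this from its entry point on DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH. That MSVC-specific wiring is left out here, and run_initializers and run_terminators are hypothetical names.

```cpp
// A compiler-generated initializer or registered terminator: a parameterless function.
using InitFn = void (*)();

// Call every non-null entry in [first, last) in order, in the spirit of the CRT's
// _initterm. Null slots are skipped (the table's boundary markers are null).
void run_initializers(const InitFn* first, const InitFn* last)
{
    for (; first != last; ++first) {
        if (*first != nullptr)
            (*first)();
    }
}

// Call every non-null entry in [first, last) in reverse order, the way destructors
// registered during initialization must be run when the dll is unloaded.
void run_terminators(const InitFn* first, const InitFn* last)
{
    while (last != first) {
        --last;
        if (*last != nullptr)
            (*last)();
    }
}
```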
One last thing I'd like to verify is load time numbers since that was the original motivation for digging into this topic and a completely empty dll seems like a great reference point. Using a simple test harness:
#include <Windows.h>
#include <chrono>
#include <iostream>
// Inside the harness's main(): time a single cold load of mod.dll
auto start = std::chrono::high_resolution_clock::now();
LoadLibraryExA("mod.dll", NULL, 0);
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Elapsed Time: " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << " ns\n";
I ran the program five times (unfortunately, once a dll is loaded it remains cached, so subsequent loads take ~1us, which makes collecting data a bit difficult). I also loaded OpenAL32.dll (along with its 6 system dll dependencies) for comparison. The times were:
openAL32.dll
154,376,800 ns, 205,946,800 ns, 131,473,800 ns, 148,372,700 ns, 155,024,300 ns
Average Time: 159,038,880ns or 159ms
mod.dll
23,765,200 ns, 17,538,500 ns, 17,295,900 ns, 17,336,100 ns, 21,954,100 ns
Average Time: 19,577,960ns or 19.6ms
Despite only aggregating five data points here, I incidentally witnessed a much larger number of runs, and while these numbers are fairly representative of the best case, I did occasionally see mod.dll take as long as OpenAL32.dll to load. My best guess is that this is due to some sort of loader lock contention. The main takeaway for me is that if performance is critical, dlls should be loaded at process startup, or at least well in advance of the first call into the binary, to avoid potentially huge latency. Even empty dlls take a substantial amount of time to load, and the issue is compounded when a dll needs to load other dll dependencies, as is the case with OpenAL32.dll.
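One way to follow that advice without stalling startup is to kick off the load on a background thread early and only wait if the dll is needed before it finishes. A minimal sketch, assuming a hypothetical Preloaded wrapper around std::async; on Windows the loader callback would be something like [] { return LoadLibraryExA("mod.dll", NULL, 0); } with Handle set to HMODULE. Loads still contend on the loader lock, so this hides the latency from the calling thread rather than removing it.

```cpp
#include <future>
#include <utility>

// Hypothetical helper: start a (slow) load on a background thread as soon as the
// object is constructed, e.g. during process startup, and only block if the
// result is requested before the load has finished.
template <typename Handle>
class Preloaded {
public:
    template <typename LoadFn>
    explicit Preloaded(LoadFn load)
        : pending_(std::async(std::launch::async,
                              [load = std::move(load)]() mutable -> Handle { return load(); }))
    {
    }

    // The first call waits for the load if it is still running; later calls reuse the result.
    Handle get()
    {
        if (pending_.valid())
            handle_ = pending_.get();
        return handle_;
    }

private:
    std::future<Handle> pending_;
    Handle handle_{};
};
```

Note that this doesn't call FreeLibrary; unloading (if needed at all) would still be up to the caller.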
Digging through the PE header documentation to the characteristics section reveals that some of the OpenAL characteristics are actually deprecated: all of them except IMAGE_FILE_DEBUG_STRIPPED, which presumably corresponds to 'Debug information stripped'. Unfortunately there isn't much in the way of documentation for setting this option in the linker, and since I already have a completely empty dll it probably doesn't mean much, but it will remain a curiosity.
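For reference, decoding the Characteristics field is just a bit test against the flag values in the PE specification. Here is a minimal sketch with a hypothetical decode_characteristics helper that tags the flags the specification lists as deprecated or obsolete; it doesn't reproduce OpenAL32.dll's actual value, which isn't in the trimmed output above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Flag {
    std::uint16_t bit;
    const char* name;
    bool deprecated; // marked deprecated or obsolete in the PE specification
};

// COFF file header Characteristics flags (0x0040 is reserved).
constexpr Flag kCharacteristics[] = {
    {0x0001, "IMAGE_FILE_RELOCS_STRIPPED", false},
    {0x0002, "IMAGE_FILE_EXECUTABLE_IMAGE", false},
    {0x0004, "IMAGE_FILE_LINE_NUMS_STRIPPED", true},
    {0x0008, "IMAGE_FILE_LOCAL_SYMS_STRIPPED", true},
    {0x0010, "IMAGE_FILE_AGGRESSIVE_WS_TRIM", true},
    {0x0020, "IMAGE_FILE_LARGE_ADDRESS_AWARE", false},
    {0x0080, "IMAGE_FILE_BYTES_REVERSED_LO", true},
    {0x0100, "IMAGE_FILE_32BIT_MACHINE", false},
    {0x0200, "IMAGE_FILE_DEBUG_STRIPPED", false},
    {0x0400, "IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP", false},
    {0x0800, "IMAGE_FILE_NET_RUN_FROM_SWAP", false},
    {0x1000, "IMAGE_FILE_SYSTEM", false},
    {0x2000, "IMAGE_FILE_DLL", false},
    {0x4000, "IMAGE_FILE_UP_SYSTEM_ONLY", false},
    {0x8000, "IMAGE_FILE_BYTES_REVERSED_HI", true},
};

// Names of the flags set in `characteristics`, in ascending bit order,
// with deprecated ones suffixed " (deprecated)".
std::vector<std::string> decode_characteristics(std::uint16_t characteristics)
{
    std::vector<std::string> names;
    for (const Flag& f : kCharacteristics) {
        if (characteristics & f.bit)
            names.push_back(std::string(f.name) + (f.deprecated ? " (deprecated)" : ""));
    }
    return names;
}
```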
This whole write-up started because I wanted to investigate dll load times due to relocations and the potential speedup from binding. I promptly got sidetracked by the dumpbin tool and spent a lot of time trying to squeeze a few KB out of the generated dll without having addressed the load time issue. Unfortunately, after investigating binding, I realized that it isn't practical to use, since it requires disabling address space layout randomization (ASLR), aka dynamic base.
I came into this investigation fairly set on dlls being a great design choice in a wide variety of situations and left with the impression that there are actually only a few scenarios where dlls are a good choice:
In a large multipurpose system like an OS, dlls make a lot of sense as all four of the reasons are often relevant. In a smaller special purpose binary like a game I'm not even convinced a single one of these reasons is generally applicable (at least not when developed by a single person)!
I'm currently writing a follow-up article that goes into more detail on the mysterious binary stripping tool I mentioned. Additionally, I'm going to investigate how much "bloat" is caused by exceptions and how that grows with binary size. The article is now available along with the source for the tool at https://git.grahalt.com/.