Wednesday, October 12, 2005

Java v. C continued

Looks like I got slashdotted on that one. I wasn't trying to bash Java in particular so much as the idea that one language is "secure" compared to another. Java just makes a good example of a language people believe will solve all their problems for them.

With the responses I've seen both here and on slashdot, I feel I should make a follow-up post. I'll point out a few interesting things that have arisen, and let the wolves at it again.

The major thing I'd like to point out is that, especially on Slashdot, most of the replies seemed to argue about vanilla C on a vanilla system versus Java. This may partly be my fault for using Java as a major target in the argument; but what I discussed was plain C programs built with a hardened compiler and run on a hardened system. That was my argument, and I'd appreciate it if people would take time to comprehend the context before replying half-assed to some similar but distinctly different argument. Once again, the blog was about C compiled using a hardened compiler, run in a hardened operating system environment.

Another interesting point was that certain attacks are still possible in Java. These include SQL injection and cross site scripting, neither of which is inherent to C; although C programs could certainly use SQL libraries or script language parsers that would be vulnerable. Script languages also come to mind, immune to buffer overflows but rabidly vulnerable to XSS; efforts like Hardened PHP work to reduce the risks here.
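To make the SQL injection point concrete, here's a minimal sketch in plain C. The function name, table, and column are hypothetical and no real database library is involved; it only shows that pasting user input into the query text lets the input rewrite the query, no matter what language does the pasting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a query by pasting user input straight into
 * the SQL text. Returns the number of characters written, or -1 if the
 * result does not fit in the buffer. No memory is corrupted here; the bug
 * is purely in what the resulting string means to the database. */
int build_query_naive(char *out, size_t outlen, const char *user)
{
    assert(out != NULL && user != NULL);

    int n = snprintf(out, outlen,
                     "SELECT * FROM users WHERE name = '%s'", user);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return n;
}
```

Feed it the name `x' OR '1'='1` and the WHERE clause becomes `name = 'x' OR '1'='1'`, which matches every row. The usual fix is to pass the value separately from the query text through the database library's parameter binding, and that fix looks the same in C, Java, or PHP.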

One major argument which kept resurfacing was that C is insecure because of pointer math and explicit memory management. I'd like to restate here that the environments discussed minimize the possible damage of bad pointer use; you can't modify existing code as one comment alluded to, and you can't execute data such as the stack or heap. The address space layout randomization makes sure that attackers who can control pointers and such at least can't figure out where to point them because everything moves around every time the program is run.
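A quick way to watch the randomization happen is to print where things land. This is only a sketch; `print_layout` is a name I made up for illustration, and what you see depends on how the kernel and compiler are configured.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes the address of one code, one stack, and one
 * heap location to `out`. Returns 0 on success, -1 if allocation fails. */
int print_layout(FILE *out)
{
    assert(out != NULL);

    int on_stack = 0;
    void *on_heap = malloc(16);
    if (on_heap == NULL)
        return -1;

    /* Converting a function pointer to an integer is implementation-defined,
     * but it works on the usual Unix-like targets. */
    fprintf(out, "code:  %#" PRIxPTR "\n", (uintptr_t)&print_layout);
    fprintf(out, "stack: %p\n", (void *)&on_stack);
    fprintf(out, "heap:  %p\n", on_heap);

    free(on_heap);
    return 0;
}
```

Call `print_layout(stdout)` from `main()` and run the program a few times. With randomization active, the stack and heap addresses should differ from run to run; the code address typically only moves if the binary was built as a position-independent executable.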

On the same topic, what's so hard about manually managing memory? You can create functions or, if you fancy C++ or Objective-C, classes to manage your linked lists and memory objects. Calling these will abstract the memory management from most of your code. In a language that manages memory for you, I guess you'd either do the same thing minus the malloc() and free() calls, or just haphazardly write the same 2-3 lines of allocation code everywhere, which is bad for maintainability in anything but trivial programs.
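Here's roughly what I mean, as a minimal sketch of a list that owns its own memory. All the names are made up for illustration; the point is that malloc() and free() appear only inside these four functions, and the rest of the program never touches them.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

struct list {
    struct node *head;
    size_t len;
};

/* Allocate an empty list; returns NULL if out of memory. */
struct list *list_create(void)
{
    struct list *l = malloc(sizeof *l);
    if (l == NULL)
        return NULL;
    l->head = NULL;
    l->len = 0;
    return l;
}

/* Push a value on the front; returns 0 on success, -1 if out of memory. */
int list_push(struct list *l, int value)
{
    assert(l != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = l->head;
    l->head = n;
    l->len++;
    return 0;
}

/* Pop the front value into *value (if non-NULL) and free its node;
 * returns 0 on success, -1 if the list is empty. */
int list_pop(struct list *l, int *value)
{
    assert(l != NULL);
    struct node *n = l->head;
    if (n == NULL)
        return -1;
    if (value != NULL)
        *value = n->value;
    l->head = n->next;
    l->len--;
    free(n);
    return 0;
}

/* Free every remaining node, then the list itself. */
void list_destroy(struct list *l)
{
    if (l == NULL)
        return;
    while (list_pop(l, NULL) == 0)
        ;
    free(l);
}
```

Callers pair one `list_create()` with one `list_destroy()` and otherwise just push and pop.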

In either case I've never seen any reason why it's not clear when to free() memory, unless you made it behave like a relational database with lots of concurrent areas of code accessing it at the same time in unrelated ways. Typically though I'd think that you'd have some reason to remove an object; and at that time, you destroy all resources such as threads that drive the object, and free the object. I guess it could get unclear if you searched an object out and were working on it in a separate thread, but I can't think of a practical application for this.

At any rate the blog wasn't on programming semantics like explicit memory management; it was on security. I just felt like going off on a tangent to ponder the quandary of why people have trouble with memory management. Perhaps a lightweight reference counting library would help; I still have an aversion to garbage collectors because I worry that they may wander the heap back and forth (this is how Boehm was described to me) and thus in times of high memory usage could cause swap thrashing if used on a large scale.
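For what it's worth, a reference counting library really can be tiny. The sketch below is an illustration with made-up names, not an existing library, and it assumes a single thread; a threaded version would need an atomic counter or a lock around the count.

```c
#include <assert.h>
#include <stdlib.h>

/* A counted handle around some payload. `destroy` is called on the payload
 * when the last reference is dropped. */
struct ref {
    unsigned count;
    void (*destroy)(void *payload);
    void *payload;
};

/* Wrap a payload; the caller holds the first reference. */
struct ref *ref_new(void *payload, void (*destroy)(void *))
{
    struct ref *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->count = 1;
    r->destroy = destroy;
    r->payload = payload;
    return r;
}

/* Take another reference. */
struct ref *ref_get(struct ref *r)
{
    assert(r != NULL && r->count > 0);
    r->count++;
    return r;
}

/* Drop a reference; returns 1 if this was the last one and the object
 * was destroyed, 0 otherwise. */
int ref_put(struct ref *r)
{
    assert(r != NULL && r->count > 0);
    if (--r->count > 0)
        return 0;
    if (r->destroy != NULL)
        r->destroy(r->payload);
    free(r);
    return 1;
}
```

Every piece of code that keeps the object around calls `ref_get()`, and calls `ref_put()` when it's done; whoever lets go last triggers the free. Unlike a collector, nothing ever scans the heap.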

Thursday, October 06, 2005

Security in a language?

Lately I've taken more notice of the debates over programming languages. People often claim that Java is inherently more secure than C; C is faster than Java; C++ is easier than C; C++ is slow and has an over-bloated syntax that makes it confusing; or any number of other things about languages. Looking at C and Java, I'd like to make a quick point.

In gcc 4.1, a re-implementation of ProPolice is included to help squelch stack smashes. OpenBSD has a new secure heap manager that does a similar job in the heap. Then there's PaX with strict but light-weight memory protections; as well as GrSecurity, a project that aims to be a complete security solution built on top of PaX.

Just a few basic enhancements that bring a lot with them. On top of a typical system, ProPolice and the secure heap manager both not only stop security attacks, but report enough specific debugging information to almost trivialize finding and fixing the bugs. PaX stops remote code injection and ret2libc cold, knocking off the basic building blocks of these attacks. GrSecurity finishes up with a few interesting restrictions, including some extreme information separation in /proc and an enhancement to prevent /tmp races, as well as a full mandatory access control system like the more familiar SELinux.

The results are nice, to say the least. C and C++ programs are immunized against stack smashes and heap overflows, as well as code injection and out-of-order execution in general. Going by frequency, this alone cuts out over half of the security bugs found in programs written in these languages, according to some analysis of the first 60 security announcements from Ubuntu Linux. This includes stack and heap buffer overflows, integer overflows, and most other memory corruptions. And the cost of all this? An increase in CPU load of around a percent or two, no more.
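Integer overflows belong on that list because they usually turn into memory corruption through an undersized allocation: a size calculation wraps around, malloc() hands back a small buffer, and the code then writes as if it had the large one. The protections above catch the resulting heap overflow at run time; the source-level fix is a one-line check. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `count` elements of `size` bytes
 * each, refusing if count * size would wrap around SIZE_MAX. Returns NULL
 * on overflow or if malloc() fails. */
void *alloc_array(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;              /* the multiplication would overflow */
    return malloc(count * size);
}
```

The standard calloc() is expected to perform the same kind of check, at the price of also zeroing the memory.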

What of Java or Mono? Well to start with, these programs run on top of a JIT or JVM typically. This demands that the strict data-code separation in PaX is disabled, resulting in a slightly weakened security model. On top of it, Java platforms assume that Java arrays can't be overflowed or double-freed, because the language is bounds checked and garbage collected; however, there is always the slight possibility that in the many tens or hundreds of thousands of lines of added code mimicking the functions of the operating system, a slight mistake can lead to the possibility of Java byte code that forces a double-free or internal overflow, leading to code injection.

C also carries with it a platform, a very thin one though, the "C runtime" or "C standard library." Although small, it presents the same issues as the Java platform; it's just a fraction as worrisome because of its smaller size, and its inherent security issues are better understood at this point. C++, Objective-C, and other languages build on top of the C runtime and create the problem of an expanded runtime again, though not to the epic proportions of full platforms like Java or Mono. In addition, these runtimes can still be protected by the same enhancements that protect C, although some may have other unexpected attack vectors.

There's a lack of convenience in full platform systems that hasn't been discussed, and has little to do with security. Java and Mono both isolate the program from the underlying system; because of this, they need bindings or reimplementations of common libraries. For example, for Java applications to use Ogg Vorbis, there must be a class that either supplies a Java implementation of Ogg Vorbis or binds to the native libvorbisfile.so or libvorbisfile.dll. This is mildly aggravating; but it also grows the unprotected code base, and forces C libraries used via bindings to run without some security enhancements in JIT and JVM implementations.

There's one more effect that the security enhancements have on C and C++ programs. The increased restrictions tend to expose bugs rather spectacularly; once in a while a program will go from "acting weird sometimes" to simply hard-crashing when introduced into a secure environment. Not only does this force programmers to fix these bugs; but with the stack and heap protections, it even helps them along the way. In essence, the difficulty of debugging a C program can even be reduced by these; though not necessarily below the cost of debugging or writing up a Java or C# program, unless the needed libraries aren't available on those platforms.

So now, which language is more secure? I still say C, for no other reason than that it's easier to make the system protect itself from broken C programs than from broken JVMs or Java applets. Assuming the JVM itself is perfect, however, I'm willing to say that C and Java are about on equal ground.