Wednesday, May 29, 2013

Trouble in paradise

Ran into a major snag last night. The graphics engine I plan to use seems to have some bugs. I didn't mind bugs in the demos, and I really didn't mind that the C translator and gcc compiler are so slow. I could live with all of that. But I really didn't want to have to fix somebody else's graphics code. How do I know it's not me? Just a few little indicators.

The interpreted code works, but the code from the demo I copied over has a callback function in it. Not just that, but the function has a control handle passed into it. No problem there, but it doesn't use that control handle. Instead it picks up its own from the surrounding code. Apparently an attempt to bypass a scope problem rather than code it properly.

So when I compile and run the code I get a warning about the function being out of scope. Being only a warning suggests it never actually calls that function. However, the debugging tools are rather primitive, so today I have to figure out how to get to the call stack so I can find out where the problem originates. Assuming it really is in scope in the interpreter but out of scope after being compiled. Did I mention the code output is different after compile? In more than a decade of working with Visual Basic (as opposed to this new Euphoria) I never ran across that kind of problem. Anything the interpreter could handle would always produce the same results in VB after a compile.

Now my take on scope. Again it's overcomplicated by 'computer scientists.' You only need three levels. Beyond that you introduce complications that don't add any real 'functionality' to your ability to write good code. The opposite in fact.

First you've got global scope. This is the default scope of many bad programmers. It's the reason you have scoping rules in the first place. Variables with this scope should be rare, but global scope does have its uses: values don't have to be passed down through many levels of calls, which produces a different kind of problem.

Second is file level scope which actually has two variations, public and private. Private is in scope anywhere in the file. Public is visible as an external interface when the file is a class and is a global when it's not a class. This is not complicated although I may have needed to say it better. Declaration is simple too. A private routine, procedure or function (all the same thing in this imagined language) is a sub. If public, it's a pub.

Pub RoutineName() or Sub RoutineName(). Changing just one letter changes the scope, which hunt-and-peck programmers should approve of.

Third is routine level scope which can't be seen outside the routine.

It's not uncommon for an instance of a class to have global scope. For example a class representing an open database connection is often global and simplifies your code.
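As a rough illustration of that point, here is a minimal Python sketch of a single connection object held at global (module) level so that routines don't need it passed in. The table name and the helper names `add_note` and `count_notes` are made up for the example, and the in-memory SQLite database is only a stand-in for a real connection.

```python
import sqlite3

# One connection with global scope, opened once and shared by every routine.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE notes (body TEXT)")


def add_note(body):
    # No connection parameter needed: the routine uses the global instance.
    DB.execute("INSERT INTO notes (body) VALUES (?)", (body,))


def count_notes():
    # Reads through the same global connection.
    return DB.execute("SELECT COUNT(*) FROM notes").fetchone()[0]


add_note("first")
add_note("second")
```

Neither routine takes a connection argument, which is the simplification being described; the cost is that both depend on `DB` existing before they are called.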

Scope is for the purpose of reducing problems. More scoping rules than these introduce problems. Functions declared inside other functions ironically provide no new functionality. Nor do they simplify code, as some would assert. Simple code is very readable. Functions should be atomic. Functions within functions are not.

Update: Ok, I should say more about functions within functions. The argument is those functions are only used within the functions they are defined in. Which is another scope rule. But you can accomplish the same thing by putting all those functions along with their parent in a separate file without any new scope rule. All the functions then are private except the parent which is public.
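To make the comparison concrete, here is a small Python sketch of the same routine written both ways. The names `report_nested`, `report_flat`, `_mean` and `_spread` are invented for the example. Python has no Pub/Sub keywords, so the leading underscore (private by convention only) stands in for a file-private routine, and the one block should be read as the contents of a separate file.

```python
# Version 1: helpers nested inside the parent function.
def report_nested(values):
    def mean(xs):
        return sum(xs) / len(xs)

    def spread(xs):
        return max(xs) - min(xs)

    return (mean(values), spread(values))


# Version 2: the same helpers as file-level private routines.
# Imagine these three definitions living alone in their own file.
def _mean(xs):
    return sum(xs) / len(xs)


def _spread(xs):
    return max(xs) - min(xs)


def report_flat(values):
    # The only public routine in the file.
    return (_mean(values), _spread(values))
```

Both versions return the same result for the same input. The flat version gets the "only used here" guarantee from the file boundary rather than from an extra scope rule.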

Update: Euphoria has two variable types: Atom and Sequence. I don't know about version 4 as I haven't gotten very far into that yet. My language would only have one (no type declaration required). Assuming 32 bits, I would use one bit to identify the value as either a literal 31-bit signed integer you could pass by value or an address [offset or perhaps index], which would be a pass by reference. A call doesn't have to specify pass by reference or value since the one bit takes care of that. Every call would pass exactly 32 bits or nothing. Every return would be either 32 bits or nothing. Do you see how this reduces the load on the programmer? They have one less thing to think about, never having to ask themselves, "is this a function or procedure?" The function declarations would still have a list of parameters to pass, but this list is only for the [humble] programmer's use, and the values are always bundled up into a single sequence. The programmer doesn't even have to do that; the function call can bundle it up for them, but only one 32-bit value is ever passed. No stack pushing or popping ever (at least for parameter passing). Optional parameters require no special handling (but I can see the programmer abuse already) other than a default value in the declaration. Any pass by value should always include a debug statement in the function to verify that it isn't a reference variable. Sequence values could continue to use Euphoria type debugging.
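The one-bit tag can be sketched in a few lines of Python. This is only one possible layout, and it rests on an assumption the post doesn't make: the tag lives in the lowest bit (0 for an integer, 1 for a reference) and the other 31 bits carry the payload. The function names are invented for the sketch.

```python
MASK32 = 0xFFFFFFFF            # everything passed is one 32-bit word
INT_MIN = -(1 << 30)           # smallest 31-bit signed integer
INT_MAX = (1 << 30) - 1        # largest 31-bit signed integer


def box_int(n):
    """Pack a 31-bit signed integer into a word with tag bit 0."""
    if not INT_MIN <= n <= INT_MAX:
        raise OverflowError("value does not fit in 31 bits")
    return (n << 1) & MASK32


def box_ref(index):
    """Pack a non-negative index (a reference) into a word with tag bit 1."""
    if not 0 <= index < (1 << 31):
        raise OverflowError("index does not fit in 31 bits")
    return ((index << 1) | 1) & MASK32


def is_ref(word):
    return bool(word & 1)


def unbox_int(word):
    # The debug check mentioned above: a by-value read must not be a reference.
    assert not is_ref(word), "expected an integer, got a reference"
    payload = word >> 1
    if payload & (1 << 30):    # sign bit of the 31-bit payload is set
        payload -= 1 << 31
    return payload


def unbox_ref(word):
    assert is_ref(word), "expected a reference, got an integer"
    return word >> 1
```

For example, `unbox_int(box_int(-5))` gives back `-5`, and `is_ref(box_ref(7))` is true. The caller never says which kind it is passing; the word itself carries that information.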

Reference variables would have 7 bits for type and 56 bits for byte length. While perhaps difficult to implement, the resulting code would be simple to understand. It would have the same strength as Euphoria in that memory allocation and deallocation are automatic, with nothing required from the programmer. Integers, for example, could be any size up to +/-2^447 (which should be big enough for most requirements), all handled under the covers so the programmer doesn't have to give it a thought. Strings could be up to 2^56 bytes long (duh). Using just 24 bits for length instead would still give you strings up to 2^24 - 1 = 16,777,215 bytes in length and +/-2^191 for integers. Ok, we can just use 32 bits for type and length. 64 is overkill. It's good to have these real issues for an imaginary language resolved, isn't it?
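A sketch of the 32-bit version of that header, in Python. The split is an assumption: 24 bits for the byte length as mentioned above, which leaves 8 bits for the type, placed in the high end of the word. The names and the type number used below are hypothetical.

```python
TYPE_BITS = 8                            # assumed: whatever is left of 32 bits
LENGTH_BITS = 24                         # byte length, as suggested in the text
MAX_TYPE = (1 << TYPE_BITS) - 1          # 255
MAX_LENGTH = (1 << LENGTH_BITS) - 1      # 16,777,215


def pack_header(type_id, length):
    """Combine a type number and a byte length into one 32-bit header."""
    if not 0 <= type_id <= MAX_TYPE:
        raise ValueError("type does not fit in the type field")
    if not 0 <= length <= MAX_LENGTH:
        raise ValueError("length does not fit in the length field")
    return (type_id << LENGTH_BITS) | length


def unpack_header(header):
    """Split a 32-bit header back into (type number, byte length)."""
    return (header >> LENGTH_BITS, header & MAX_LENGTH)


# A 24-byte integer holds 192 bits, so its signed range tops out at 2**191 - 1.
BIGGEST_24_BYTE_INT = (1 << (24 * 8 - 1)) - 1
```

With this layout `unpack_header(pack_header(3, 1000))` returns `(3, 1000)`, and the largest header is exactly `0xFFFFFFFF`, so type and length together never need more than the one 32-bit word.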

Update: One of the issues I didn't understand years ago with Euphoria was how passed values were handled. I assumed a sequence could be updated in a function and, since it was passed by reference, the caller would automatically have the updated sequence. It doesn't work that way. You have to assign the new values to the caller's sequence to get the changes. This means passing a single 32-bit value into a function really does provide savings versus pushing and popping onto a stack, since all passed values have local scope in the function, meaning an assignment is being made regardless. You are never really dealing with the address the passed values are pointing at if you do anything more than a read. For readability you would probably always assign indexed values to separate variables in the function (guided by the function declaration, which is right there), especially if you use those values more than once in the function.
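The behavior described here can be imitated in Python, with one caveat: Python lists really are shared with the caller, so the explicit copy below is a stand-in for the local scope that passed values get in Euphoria. The function name `append_total` is made up for the example.

```python
def append_total(seq):
    # Work on a local copy, as if the passed value had local scope.
    local = list(seq)
    local.append(sum(local))
    # The only way the change leaves the function is through the return value.
    return local


values = [1, 2, 3]

append_total(values)            # result thrown away: the caller sees no change
unchanged = list(values)

values = append_total(values)   # the caller assigns the result to get the update
```

After the first call `values` is still `[1, 2, 3]`. Only the second call, where the caller assigns the returned sequence, leaves it as `[1, 2, 3, 6]`.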

Almost forgot; the passed variable doesn't have a name if there's more than one parameter, does it? We can fix that. Just use the declaration names as an enumeration for the index. Then just use arg(param1), arg(param2), etc. for the individual values (which themselves could either be a value or reference type.) Note that the enumerated values take up no address space in the compiled code since it's just a compiler issue.
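Here is roughly how that could look, sketched in Python. The routine and its parameter names (`WIDTH`, `HEIGHT`, `LABEL`) are hypothetical, and an `IntEnum` plays the part of the enumeration the compiler would generate from the declaration.

```python
from enum import IntEnum


# Stand-in for what the compiler would derive from a declaration such as
# Sub Describe(width, height, label): each name becomes an index.
class DescribeArgs(IntEnum):
    WIDTH = 0
    HEIGHT = 1
    LABEL = 2


def area(arg):
    # arg is the single bundled sequence; names pick out the individual values.
    return arg[DescribeArgs.WIDTH] * arg[DescribeArgs.HEIGHT]


def describe(arg):
    return f"{arg[DescribeArgs.LABEL]}: {area(arg)}"


# The call bundles everything into one sequence, so only one value is passed.
bundle = (3, 4, "box")
```

Here `describe(bundle)` produces `"box: 12"`. In Python the enumeration exists at run time; in the imagined language the names would vanish at compile time and leave only the index numbers.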

GTK Docs
