Thursday, April 14, 2011

CHM file from XSD files

Is it possible to create a CHM file or HTML help (akin to those created by Sandcastle) from an XSD file? Text in the xs:documentation nodes contains, well, the documentation.

An example snippet from one of the XSD files I have is

<xs:element name="Request" type="RequestType">
 <xs:annotation>
  <xs:documentation>
   <html:p>The Request message contains a number of <html:i>RequestType</html:i> elements for the server to process.</html:p>
   <html:p>A <html:i>Request</html:i> will always result in a <html:i>Response</html:i> message being returned by the server, and <html:b>must</html:b> contain an <html:b>xmlns=[<html:i>Default namespace</html:i>]</html:b> declaration.</html:p>
  </xs:documentation>
 </xs:annotation>
</xs:element>
<xs:element name="Response" type="ResponseType">
 <xs:annotation>
  <xs:documentation>The Response message contains the result of a previous <html:i>Request</html:i> message, with one <html:i>ResponseType</html:i> element for each <html:i>RequestType</html:i> sent to the server.</xs:documentation>
 </xs:annotation>
</xs:element>
From stackoverflow
  • There's an XSLT stylesheet called "xs3p" which can be downloaded from xml.fiforms.org/xs3p.

    You can use any XSLT processor to convert your XSD into HTML - I use "nxslt3" by Oleg Tkachenko.

    A second step would then be to combine the several HTML files into a CHM using an HTML Help builder (a small sketch of the first step follows after these answers).

    Marc

    PS: forgot to mention - both tools are free, of course :-)

  • XML Schema Documenter is a Sandcastle Help File Builder plug-in that allows you to integrate reference documentation for XML schemas in your help files.

    xsddoc.codeplex.com
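
A minimal sketch of Marc's first step, using .NET's XslCompiledTransform to run an XSLT such as xs3p over a schema (file names are placeholders, and the exact stylesheet parameters depend on the xs3p version you download):

    // C#: transform an XSD into an HTML page that an HTML Help builder can then compile into a CHM.
    using System.Xml;
    using System.Xml.Xsl;

    class SchemaToHtml
    {
        static void Main()
        {
            var xslt = new XslCompiledTransform();

            // TrustedXslt allows document() and embedded script, which some
            // documentation stylesheets rely on to pull in imported schemas.
            xslt.Load("xs3p.xsl", XsltSettings.TrustedXslt, new XmlUrlResolver());

            xslt.Transform("Request.xsd", "Request.html");
        }
    }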

Rails: Using form (collection select) to call show-action.

A model named 'book' with attributes 'name' and 'id' is given. How can I use a collection select to call the show action for a certain book? The code below returns the following error message:

Couldn't find Book with ID=book_id

<% form_tag(book_path(:book_id), :method => :get) do %>
  <p>
  <%= label(:book, :id, 'Show Book:') %>
    <%= @books = Book.find(:all, :order => :name)
      collection_select(:book, :id, @books, :id, :name) 
  %> 
  </p>
  <p>
  <%= submit_tag 'Go' %>
  </p>
<% end %>
From stackoverflow
  • book_path is generated once only, for the form tag itself. It won't be updated whenever your selection changes.

    When you submit that form, it's going to request the following URL:

    /books/book_id?book[id]=5
    

    Since your book_path thinks book_id is the ID number you wanted, it tries to look that up. You could do what you want by changing the code in your controller from:

    @book = Book.find(params[:id])
    

    to:

    @book = Book.find(params[:book][:id])
    

    But it kind of smells bad so be warned.

    Javier : Thanks a lot Luke. Any suggestion for a non-smelly solution? I just need a drop-down menu to select a certain book to show...
    Javier : Because the "smelly-solution" would break default-calls to the show action using book_path(id), wouldn't it?
    Luke : You could try to select the ID from either field, like so: @book = Book.find(params[:book].try(:[],:id) || params[:id]) That would work, but the smell still persists because you're not technically supposed to use RESTful routes like that. Balance 'getting it done' with 'following conventions'.
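
A less smelly alternative, sketched on the assumption of a Rails 2.x app with map.resources :books routes (names are illustrative): submit the selection to the index action with a plain select_tag and redirect to the chosen book there, leaving book_path(id) and the show action untouched.

    <% form_tag books_path, :method => :get do %>
      <%= select_tag :book_id,
            options_from_collection_for_select(@books, :id, :name) %>
      <%= submit_tag 'Go' %>
    <% end %>

    # BooksController
    def index
      # If a book was picked from the drop-down, jump to its show page.
      unless params[:book_id].blank?
        redirect_to book_path(params[:book_id]) and return
      end
      @books = Book.find(:all, :order => :name)
    end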

How to resolve: WPF access keys intercepted by IE

We have built an XBAP application and have assigned access keys (keyboard shortcuts) to labels/buttons. But IE seems to capture the access keys first, so the application does not get a chance to handle them.

Is there any workaround/solution so that the XBAP application overrides IE's access keys?

From stackoverflow
  • It seems that there are some access keys that IE will always catch first:

    From this link:

    "Unfortunately the reality is not all IE keyboard shortcuts can be intercepted, or even received by .xbaps. This is because IE gets first crack at input coming to its window, regardless of who has focus. If it chooses not to propagate a message, we can't do anything about it. "

Estimating relative CPU usage during compilation...

While compiling this morning I had a thought.

Given a dedicated Linux machine (running Fedora, for example), users remotely log in and compile (using gcc) their C++ software, which is stored on their own machines (on a small LAN) and linked to the Linux box with symbolic links.

Assume that each user is compiling exactly the same code for now... One user can compile and link his code in 10 minutes.

Will it take 2 users 20 minutes in total to compile at the same time? What about 3, or 10 users?

Is there an overhead involved that gives diminishing returns as users increase?

As a bonus question - What tips do you have for increasing compiling efficiency in this setup?

From stackoverflow
  • Depending on the size of the source for the projects, a saving might be to copy all files locally to the build machine before compiling. If the compiler has to pull all files over the network as it needs them, this will introduce some overhead, as network access is a lot slower than disk access.

    If you wrote a script or used a tool that would only copy modified files to the build machine, then the overhead would be reduced significantly. In this case, the build machine would basically keep a local mirror of the source files and each time you compile, it would update any modified files, then compile. Obviously, if you have lots of users, and/or large projects files, then you run into storage / space issues.

  • There is always an overhead involved due to:

    • scheduling needs
    • time conflicting I/O operations

    The last one will be the most important one for you, as network access is severely slower than for example disk access. Pre-caching (first fetch all files locally, then start compilation) may help here. Already started builds will then not be hindered by new concurrent users.

    Krakkos : So a build which was pre-cached then started will not be affected by a new build connecting, precaching then starting concurrently?
    ypnos : It will not be severely affected by the new build's precaching. It will, however, be affected by CPU usage of the new build. The idea is that CPU usage doesn't drop because of IO wait.
  • I suggest distcc.

    Lars Wirzenius : also ccache may be useful.
    MrEvil : I'm using distcc3 for compilation. Until my new dev workstation arrives I'm working out of a full screen VM on my windows box. In order to speed things up I installed distcc on the two P4 Dell Optiplexes under my desk that otherwise sit idle most of the time and I'm very happy with the results.
  • Compilation is mostly CPU limited, so assuming you have enough RAM, you can expect the total compilation time to be roughly (time per task) * (number of tasks) / (number of CPUs/cores in the system). (Curiously, I ran 'make -j' on a 3-core system on a project of mine and got greater than 3x speed-up, so there may be some kind of blocking issue that prevents a sequential make from running at full speed.)

    Why don't the users compile their programs on their own computers?

    Krakkos : we write multiplatform code, so we build locally using windows, and only compile on the target linux machine when we each have a build ready to deploy/test
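
As a rough worked example of that approximation (numbers purely illustrative): if one build takes 10 minutes of CPU time and four users compile simultaneously on a 2-core box with enough RAM, expect all four builds to finish after roughly 10 * 4 / 2 = 20 minutes of wall-clock time, i.e. about twice each user's solo time, until I/O contention or memory pressure starts to dominate.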

split string logic in J2ME

I am developing a J2ME application.

I want to split the following string at "<br>" & comma:

3,toothpaste,2<br>4,toothbrush,3

How can I do this?

From stackoverflow
  •   private String[] split(String original,String separator) {
        Vector nodes = new Vector();
        // Parse nodes into vector
        int index = original.indexOf(separator);
        while(index >= 0) {
            nodes.addElement( original.substring(0, index) );
            original = original.substring(index+separator.length());
            index = original.indexOf(separator);
        }
        // Get the last node
        nodes.addElement( original );
    
         // Create split string array
        String[] result = new String[ nodes.size() ];
        if( nodes.size() > 0 ) {
            for(int loop = 0; loop < nodes.size(); loop++)
            {
                result[loop] = (String)nodes.elementAt(loop);
                System.out.println(result[loop]);
            }
    
        }
       return result;
    }
    

    The above method will let you split a string around the passed separator, much like Java SE's String.split(). So first split the string on the line-break tag, and then split each element of the returned array on the "," comma, e.g.:

     String[] lines = this.split(myString,"<br>");
     for(int i = 0; i < lines.length; i++) 
     {
          String[] splitStr = this.split(lines[i],",");
          System.out.println(splitStr[0] + " " + splitStr[1] + " " + splitStr[2]);     
     }
    

Refactoring and Test Driven Development

I'm currently reading two excellent books, "Working Effectively with Legacy Code" and "Clean Code".

They are making me think about the way I write and work with code in completely new ways, but one theme common to both is test-driven development and the idea of smothering everything with tests, having tests in place before you make a change or implement a new piece of functionality.

This has led to two questions:

Question 1: If I am working with legacy code, then according to the books I should put tests in place to ensure I'm not breaking anything. Consider that I have a method 500 lines long. I would assume I'll have a set of equivalent testing methods to test that method. When I split this function up, do I create new tests for each new method/class that results?

According to "Clean Code", any test that takes longer than 1/10th of a second is a test that takes too long. Trying to test a 500-line legacy method that goes to databases and does god knows what else could well take longer than 1/10th of a second. While I understand you need to break dependencies, what I'm having trouble with is the initial test creation.

Question 2: What happens when the code is refactored so much that structurally it no longer resembles the original code (new parameters added/removed to methods etc.)? It would follow that the tests will need refactoring too. In that case you could potentially be altering the functionality of the system while still allowing the tests to keep passing. Is refactoring tests an appropriate thing to do in this circumstance?

While it's OK to plod on with assumptions, I was wondering whether there are any thoughts/suggestions on such matters from your collective experience.

From stackoverflow
  • The 0.1s unit test run time is fairly silly. There's no reason unit tests shouldn't use a network socket, read a large file or other hefty operations if they have to. Yes it's nice if the tests run quickly so you can get on with the main job of writing the application but it's much nicer to end up with the best result at the end and if that means running a unit test that takes 10s then that's what I'd do.

    If you're going to refactor the key is to spend as much time as you need to understand the code you are refactoring. One good way of doing that would be to write a few unit tests for it. As you grasp what certain blocks of code are doing you could refactor it and then it's good practice to write tests for each of your new methods as you go.

  • Yes, create new tests for new methods.

    I'd see the 1/10 of a second as a goal you should strive for. A slower test is still much better than no test.

    Try not to change the code and the test at the same time. Always take small steps.

    JeffH : A slower test may be fine unless it is so slow that people don't run it often enough.

  • 1. That's the deal with working with legacy code. And legacy here means a system with no tests and with tight coupling. When adding tests for that code, you are effectively adding integration tests. When you refactor and add the more specific test methods that avoid the network calls etc., those become your unit tests. You want to keep both, just have them separate; that way most of your unit tests will run that fast.
    2. You do that in really small steps. You actually switch continually between tests and code, and you are correct: if you change a signature (a small step), the related tests need to be updated.

    Also check my "update 2" on http://stackoverflow.com/questions/589603/how-can-i-improve-my-junit-tests/589620#589620. It isn't about legacy code and dealing with the coupling it already has, but about how you go about writing logic + tests where external systems are involved, i.e. databases, emails, etc.

  • Here's my take on it:

    1. No and yes. The first thing is to have a unit test that checks the output of that 500-line method; only then should you begin thinking of splitting it up. Ideally the process will go like this:

      • Write a test for the original legacy 500-line behemoth
      • Figure out, marking first with comments, what blocks of code you could extract from that method
      • Write a test for each block of code. All will fail.
      • Extract the blocks one by one. Concentrate on getting all the tests to go green, one at a time.
      • Rinse and repeat until you've finished the whole thing

      After this long process you will realize that it might make sense for some methods to be moved elsewhere, or that several are repetitive and can be reduced to a single function; this is how you know that you succeeded. Edit the tests accordingly.

    2. Go ahead and refactor, but as soon as you need to change signatures make the changes in your test first before you make the change in your actual code. That way you make sure that you're still making the correct assertions given the change in method signature.

  • When you've got a lengthy legacy method that does X (and maybe Y and Z because of its size), the real trick is not breaking the app by 'fixing' it. The tests on the legacy app have preconditions and postconditions and so you've got to really know those before you go breaking it up. The tests help to facilitate that. As soon as you break that method into two or more new methods, obviously you need to know the pre/post states for each of those and so tests for those 'keep you honest' and let you sleep better at night.

    I don't tend to worry too much about the 1/10th of a second assertion. Rather, the goal when I'm writing unit tests is to cover all my bases. Obviously, if a test takes a long time, it might be because what is being tested is simply way too much code doing way too much.

    The bottom line is that you definitely don't want to take what is presumably a working system and 'fix' it to the point that it works sometimes and fails under certain conditions. That's where the tests can help. Each of them expects the world to be in one state at the beginning of the test and a new state at the end. Only you can know if those two states are correct. All the tests can 'pass' and the app can still be wrong.

    Anytime the code gets changed, the tests will possibly change and new ones will likely need to be added to address changes made to the production code. Those tests work with the current code - doesn't matter if the parameters needed to change, there are still pre/post conditions that have to be met. It isn't enough, obviously, to just break up the code into smaller chunks. The 'analyst' in you has to be able to understand the system you are building - that's job one.

    Working with legacy code can be a real chore depending on the 'mess' you start with. I really find that knowing what you've got and what it is supposed to do (and whether it actually does it at step 0 before you start refactoring it) is key to a successful refactoring of the code. One goal, I think, is that I ought to be able to toss out the old stuff, stick my new code in its place and have it work as advertised (or better). Depending on the language it was written in, the assumptions made by the original author(s) and the ability to encapsulate functionality into containable chunks, it can be a real trick.

    Best of luck!

  • Question 1: "When I split this function up, do I create new tests for each new method/class that results?"

    As always, the real answer is "it depends". If it is appropriate, it may be simpler, when refactoring some gigantic monolithic method into smaller methods that handle different component parts, to make your new methods private/protected and leave your existing API intact, so that you can continue to use your existing unit tests. If you need to test your newly split-off methods, sometimes it is advantageous to just mark them as package-private so that your unit test classes can get at them but other classes cannot.

    Question 2: "What happens when the code is re-factored so much that structurally it no longer resembles the original code?"

    My first piece of advice here is that you need to get a good IDE and have a good knowledge of regular expressions - try to do as much of your refactoring using automated tools as possible. This can help save time if you are cautious enough not to introduce new problems. As you said, you have to change your unit tests - but if you used good OOP principles (you did, right?), then it shouldn't be so painful.

    Overall, it is important to ask yourself, with regard to the refactoring: do the benefits outweigh the costs? Am I just fiddling around with architectures and designs? Am I doing this refactoring in order to understand the code, and is it really needed? I would consult a coworker who is familiar with the code base for their opinion on the cost/benefits of your current task.

    Also remember that the theoretical ideal you read in books needs to be balanced with real world business needs and time schedules.
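
To make the characterization-test idea from the answers above concrete, here is a small, self-contained NUnit-style sketch in C# (the "legacy" class is a toy stand-in; all names and figures are invented for illustration):

    using NUnit.Framework;

    // Toy stand-in for a legacy class: one method doing several things at once.
    public class LegacyPriceCalculator
    {
        public decimal TotalWithTax(decimal[] prices)
        {
            decimal total = 0m;
            foreach (var p in prices) total += p;   // block 1: summing
            return total * 1.2m;                    // block 2: tax, magic number
        }

        // Extracted later; its test below is written first and fails until then.
        public decimal ApplyTax(decimal net) { return net * 1.2m; }
    }

    [TestFixture]
    public class LegacyPriceCalculatorTests
    {
        [Test]
        public void TotalWithTax_KeepsCurrentBehaviour()
        {
            // Characterization test: pin down whatever the method does today.
            var calc = new LegacyPriceCalculator();
            Assert.AreEqual(36m, calc.TotalWithTax(new[] { 10m, 20m }));
        }

        [Test]
        public void ApplyTax_ExtractedBlock_GivesSameFigure()
        {
            var calc = new LegacyPriceCalculator();
            Assert.AreEqual(24m, calc.ApplyTax(20m));
        }
    }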

Protect pictures, how to?

I am using ASP.NET and my users need to upload private pictures to my server. These pictures cannot be allowed to spread in any way, so I need to protect them somehow. What is the easiest way to protect them from public access so that only the authorized user can reach them?

Thanks!

From stackoverflow
  • Use an asp.net handler instead of serving the images directly. This way you can have a granular control over authorization when serving the image.

    Also, keep the images out of public folders so users can't download them directly, i.e. store them outside the web site's folder or in a database.

    Check this for a bit more info on handlers: http://www.wrox.com/WileyCDA/Section/id-291916.html. Both samples serve images, but they are focused on pretty specific scenarios. As you can see in those, you have complete control over the logic you implement in there, so you could check whether the user requesting the image is authorized to download the specified image (a minimal handler sketch follows after these answers).

  • Handler? I'm a newbie... What's that?

    f3lix : Look for "ASP.Net handler" or IHttpHandler on the net. That's what Freddy is referring to. http://www.agileprogrammer.com/dotnetguy/articles/IntroToHandlers.aspx
    Canavar : Welcome martin, please use comments to ask for details.
    Burkhard : For comments he needs at least 50 Rep.
    eglasius : @martin added a link with 2 image handler implementations - they handle specific scenarios, but you can understand the concept from there and roll your own.
    thomasrutter : @Burkhard I think anyone can leave comments to answers on their own question. At any rate, Martin, creating an 'answer' should not be used to ask for further info. Perhaps edit your original question?
    • Would you like the public to be able to view the images, but make it a tiny bit harder to download them?

      If so, you could look into the way Flickr does it, for anybody that opts out of allowing downloads. They lay a transparent GIF image over the top of the real image, to prevent downloading the image by right-clicking it.

      It is still pretty easy to download them, because as a rule of thumb anything the public can view, they can save to their hard disk. I therefore see attempts to prevent downloads of publicly viewable material as fairly futile; and mostly just a violation of usability. Perhaps you should think about legal avenues rather than obfuscation; state your copyright notice and any license you want clearly and be prepared to pursue anyone who steals them.

    • Would you like to allow people to view and download images from your site, but not to hotlink them from other sites?

      If so, the key is to detect the referer (sic) header sent, and deny the image if the referer is not a match. Note that if the referer is blank, you have to trust it by default, as a lot of people's browsers legitimately don't send a referer even when viewing on your own page.

      This is usually done in a server directive; if you were using Apache, you would do it in an .htaccess file using mod_rewrite directives. If you are on IIS, however, then I'm less clear, though these instructions may or may not help.

    • Or, do you want to prevent the public from being able to view them at all? If so, you would just need to use access control on whatever server you are using - here's access control instructions for IIS.

    eglasius : @thomasrutter I agree that you can't do much to prevent an authorized user from saving the image and spreading it outside your system. This doesn't mean you are free to let unauthorized users download the user's images directly from the system.
    thomasrutter : @Freddy Rios, if blocking unauthorized people is what is desired, then that's an access control issue - you need to make it so unauthorized people cannot access the pictures at all. But to allow people to view them, there's no physical way to prevent them saving or copying them, only legal ways.
    eglasius : @thomasrutter agreed, re-read the question :) --- I think it is exactly about that (authorization), note that you can perfectly authorize images on a per user/group basis
  • Hi again!

    Thanks for your answers. What I want to do is make sure that only the user who uploaded the photos will be able to reach them. The user who uploaded the photos should be able to download them or do whatever he wants. Can I just put the photos in a directory above the actual website, and will this be enough so that no one can browse the photos?

  • You might want to look at this question for some ideas: secure images against static requests
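
Building on Freddy's handler suggestion in the first answer, a minimal, hypothetical IHttpHandler sketch (folder, session key and MIME type are illustrative; real code would add better validation and error handling):

    using System.IO;
    using System.Web;
    using System.Web.SessionState;

    // Registered in web.config, e.g. against a path such as GetImage.ashx.
    public class ProtectedImageHandler : IHttpHandler, IRequiresSessionState
    {
        public void ProcessRequest(HttpContext context)
        {
            // Assumes the logged-in user's id is stored in session at login.
            var userId = context.Session["UserId"] as string;
            var requested = context.Request.QueryString["file"];

            if (userId == null || string.IsNullOrEmpty(requested))
            {
                context.Response.StatusCode = 403;
                return;
            }

            // Images live outside the web root, in a per-user folder;
            // GetFileName strips any directory part the caller tries to sneak in.
            var fileName = Path.GetFileName(requested);
            var fullPath = Path.Combine(Path.Combine(@"D:\PrivatePictures", userId), fileName);

            if (!File.Exists(fullPath))
            {
                context.Response.StatusCode = 404;
                return;
            }

            context.Response.ContentType = "image/jpeg";
            context.Response.WriteFile(fullPath);
        }

        public bool IsReusable { get { return true; } }
    }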

SharePoint Webpart deserialize error

I am getting a random web part error, it works one refresh and then not the next:

Web Part Error: One of the properties of the web part has an incorrect format. Windows SharePoint services cannot deserialize the Web Part. Check the format of the properties and try again.

The web parts have been on the site for a long time, and I have checked Microsoft Support, http://support.microsoft.com/kb/826786. It is not a permission error, because it has been this way for a long time. The only thing changed in regards to web parts was going into Site Settings > Web Parts > New and selecting some web parts that were not in the list; I think I also checked the ones that are having this random error and clicked "Populate Gallery". Anybody have a clue?

From stackoverflow
  • This could also be because of insufficient disk space on the server where SharePoint is hosted. http://www.sharepointblogs.com/vipul/archive/2006/09/25/webpart-error-due-to-space-crunch.aspx Check the available disk space.

    Hope this helps.

  • Hmm, I have 27 GB of disk space remaining, so I don't think that's the problem.

  • but try doing this:

    Go to Central Administration > Operations > Service Accounts:

    1. Select the "Web application pool" radio button.
    2. In the web service drop-down, select "Windows SharePoint Services Web Application".
    3. In the application pool drop-down, select the site that the problem occurs on, e.g. home_port_80.
    4. In the user name text field, type in the account info you used during the SharePoint post-installation configuration, and finally type in the password.

    hope it helps

PropertyChanged for indexer property

I have a class with an indexer property, with a string key:

public class IndexerProvider {
    public object this[string key] {
        get
        {
            return ...
        }
        set
        {
            ...
        }
    }

    ...
}

I bind to an instance of this class in WPF, using indexer notation:

<TextBox Text="{Binding [IndexerKeyThingy]}">

That works fine, but I want to raise a PropertyChanged event when one of the indexer values changes. I tried raising it with a property name of "[keyname]" (i.e. including [] around the name of the key), but that doesn't seem to work. I don't get binding errors in my output window whatsoever.

I can't use CollectionChangedEvent, because the index is not integer based. And technically, the object isn't a collection anyway.

Can I do this, and if so, how?

From stackoverflow
  • According to this blog entry, you have to use "Item[]", Item being the name of the property generated by the compiler when using an indexer.

    If you want to be explicit, you can decorate the indexer property with an IndexerName attribute.

    That would make the code look like:

    public class IndexerProvider : INotifyPropertyChanged {
    
        [IndexerName ("Item")]
        public object this [string key] {
            get {
                return ...;
            }
            set {
                ... = value;
                FirePropertyChanged ("Item[]");
            }
        }
    }
    

    At least it makes the intent more clear. I don't suggest you change the indexer name though, if your buddy found the string "Item[]" hard coded, it probably means that WPF would not be able to deal with a different indexer name.

    Inferis : That works great. Strange that I missed that blogpost in my Google searches.
  • All I can say is WOW this works like a champ. thanks so much!

    Dykam : Is this an answer? No. Comment comments, answer answers.
  • Actually, I believe setting the IndexerName attribute to "Item" is redundant. The IndexerName attribute is specifically designed to rename an indexer, if you want to give its collection item a different name. So your code could look something like this:

    public class IndexerProvider : INotifyPropertyChanged {
    
        [IndexerName("myIndexItem")]
        public object this [string key] {
            get {
                return ...;
            }
            set {
                ... = value;
                FirePropertyChanged ("myIndexItem[]");
            }
        }
    }
    

    Once you set the indexer name to whatever you want, you can then use it in the FirePropertyChanged event.
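
For completeness, a minimal sketch of how such a class might look end to end, including the FirePropertyChanged helper used above (the dictionary backing store is illustrative):

    using System.Collections.Generic;
    using System.ComponentModel;

    public class IndexerProvider : INotifyPropertyChanged
    {
        private readonly Dictionary<string, object> values =
            new Dictionary<string, object>();

        public event PropertyChangedEventHandler PropertyChanged;

        public object this[string key]
        {
            get
            {
                object value;
                values.TryGetValue(key, out value);
                return value;
            }
            set
            {
                values[key] = value;
                // "Item[]" tells WPF to refresh bindings that use indexer notation.
                FirePropertyChanged("Item[]");
            }
        }

        private void FirePropertyChanged(string propertyName)
        {
            var handler = PropertyChanged;
            if (handler != null)
                handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }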

Converting a PDF to a series of images with Python

I'm attempting to use Python to convert a multi-page PDF into a series of JPEGs. I can split the PDF up into individual pages easily enough with available tools, but I haven't been able to find anything that can convert PDFs to images.

PIL does not work, as it can't read PDFs. The two options I've found are using either GhostScript or ImageMagick through the shell. This is not a viable option for me, since this program needs to be cross-platform, and I can't be sure either of those programs will be available on the machines it will be installed and used on.

Are there any Python libraries out there that can do this?

From stackoverflow
  • ImageMagick has Python bindings.

    Coderer : I'd like to kick in that yeah, you can just include the ImageMagick library with your project -- just make sure you review the license terms so you can put the appropriate stuff in your readme
  • You can't avoid the Ghostscript dependency. Even Imagemagick relies on Ghostscript for its PDF reading functions. The reason for this is the complexity of the PDF format: a PDF doesn't just contain bitmap information, but mostly vector shapes, transparencies etc. Furthermore it is quite complex to figure out which of these objects appear on which page.

    So the correct rendering of a PDF Page is clearly out of scope for a pure Python library.

    The good news is that Ghostscript is pre-installed on many windows and Linux systems, because it is also needed by all those PDF Printers (except Adobe Acrobat).

  • If you're using Linux, some distributions come with a command-line utility called 'pdftoppm' out of the box. Check out netpbm.
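
If you do end up accepting the Ghostscript dependency, a minimal sketch of driving it from Python via subprocess (the executable is typically gs on Linux and gswin32c on Windows; paths and settings are placeholders):

    import subprocess

    def pdf_to_jpegs(pdf_path, output_pattern="page-%03d.jpg", dpi=150):
        """Render each page of a PDF to a JPEG using Ghostscript."""
        subprocess.check_call([
            "gs",                      # or "gswin32c" on Windows installs
            "-dBATCH", "-dNOPAUSE", "-dSAFER",
            "-sDEVICE=jpeg",
            "-r%d" % dpi,
            "-sOutputFile=" + output_pattern,
            pdf_path,
        ])

    pdf_to_jpegs("input.pdf")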

How to prevent threads from dying or just quitting in C# .NET?

I am starting at most 5 threads in a program in C# .NET. But sometimes some of those threads just quit or die inexplicably before the function assigned to them has finished executing.

It happens randomly. If I try to debug the code by putting in breakpoints, it works fine.

And sometimes all the threads execute the assigned functions perfectly. They do not share any resources among themselves.

            Thread[] td = new Thread[5];
            for (count = 4; count >= 0; --count)
            {
                ds[count] = dba.getData(ru[count]);
                td[count] = new Thread(delegate() {  runRule[count].performTask(ru[count], ds[count], count); });
                td[count].Name = "Thread " + count.ToString();
                td[count].Start();
                Thread.Sleep(50);
            }

If I remove the last line, "Thread.Sleep(50)", only the first thread started runs and the rest of them just die.

Can someone explain why the threads are dying?

From stackoverflow
  • If a thread reaches the end of its execution path it will automatically end. In your example, once the thread has finished whatever the performTask method does, it will exit.

    If you want a thread to hang around forever (for debugging purposes only as you don't want unused threads hanging around in a real application) you could add something like the below to the end of the method that the thread is running.

    ManualResetEvent stayAlive = new ManualResetEvent(false);
    stayAlive.WaitOne();
    
    LoyalBanana : Thanks for your reply. I don't want the threads to hang around forever; the problem here is that they are not completing their full execution path. They are just dying or quitting before executing all the code.
  • I suspect they're not becoming dead - I suspect the problem is that you're not actually running the rules you think you are. When you use a method's local variables within an anonymous method, the variables themselves are captured. In this case, you're capturing the count local variable - and then changing it (as the loop counter decreases). By the time a thread created when count=4 starts running, count may be 3 - so it will be calling runRule[3].performTask(ru[3], ds[3], 3). In fact, count could change while the expressions are being evaluated which could cause a lot of fun.

    The way to get round this is to have a different "local" variable for each iteration of the loop. It's easy to do:

    Thread[] td = new Thread[5];
    for (count = 4; count >= 0; --count)
    {
        int copy = count;
        ds[count] = dba.getData(ru[count]);
        td[count] = new Thread(delegate() {  
            runRule[copy].performTask(ru[copy], ds[copy], copy); 
        });
        td[count].Name = "Thread " + count.ToString();
        td[count].Start();
        Thread.Sleep(50);
    }
    

    Now the only variables being captured in the delegate are copy and runRule/ru/ds - I'm assuming that the latter three don't change. A new "instance" of the copy variable is created each time you go round the loop, so changes won't interfere with each other.

    See whether this helps - it's at least a potential cause for massive confusion, and may well be the problem.

    LoyalBanana : Thank you for your reply. The problem you mentioned is avoided using Thread.Sleep(50), because when the main thread waits for 50 milliseconds, each thread is created before the next iteration. That's not the main problem here; I have 5 progress bars to show how much progress is made, and the threads just quit after showing some progress.

What is the best method/options for expiring records within a database?

In a lot of the databases I seem to be working on these days I can't just delete a record, for any number of reasons: for example, so it can still be displayed later (say a product that no longer exists), or just to keep a history of what was.

So my question is how best to expire the record.

I have often added a date_expired column, which is a datetime field. Generally I query with either date_expired = 0, or date_expired = 0 OR date_expired > NOW(), depending on whether the data may be set to expire at a future date. Similarly, I have also added a field called expired_flag. When this is set to true/1, the record is considered expired. This is probably the easiest method, although you need to remember to include the expiry clause any time you only want the current items.

Another method I have seen is moving the record to an archive table, but this can get quite messy when there are a large number of tables that require history tables. It also makes the retrieval of the value (say country) more difficult as you have to first do a left join (for example) and then do a second query to find the actual value (or redo the query with a modified left join).

Another option, which I haven't seen done nor have I fully attempted myself is to have a table that contains either all of the data from all of the expired records or some form of it--some kind of history table. In this case, retrieval would be even more difficult as you would need to search possibly a massive table and then parse the data.

Are there other solutions or modifications of these that are better?

I am using MySQL (with PHP), so I don't know if other databases have better methods to deal with this issue.

From stackoverflow
  • I think adding the date_expired column is the easiest and least invasive method. As long as your INSERTS and SELECTS use explicit column lists (they should be if they're not) then there is no impact to your existing CRUD operations. Add an index on the date_expired column and developers can add it as a property to any classes or logic that depend on the data in the existing table. All in all the best value for the effort. I agree that the other methods (i.e. archive tables) are troublesome at best, by comparison.

  • I prefer the date expired field method. However, sometimes it is useful to have two dates, both initial date, and date expired. Because if data can expire, it is often useful to know when it was active, and that means also knowing when it started existing.

    Darryl Hein : Yes, quite useful in a case such as a product table or taxes.
  • I usually don't like database triggers, since they can lead to strange "behind the scenes" behavior, but putting a trigger on delete to insert the about-to-be-deleted data into a history table might be an option.

    In my experience, we usually just use an "Active" bit, or a "DateExpired" datetime like you mentioned. That works pretty well, and is really easy to deal with and query.

    There's a related post here that offers a few other options. Maybe the CDC option?

    http://stackoverflow.com/questions/349524/sql-server-history-table-populate-through-sp-or-trigger

  • A very nice approach by Oracle to this problem is partitions. I don't think MySQL has something similar though.

  • May I also suggest adding a "Status" column that matches an enumerated type in the code you're using. Add an index on the column and you'll be able to very easily and efficiently narrow down your returned data via your where clauses.

    Some possible enumerated values to use, depending on your needs:

    1. Active
    2. Deleted
    3. Suspended
    4. InUse (Sort of a pseudo-locking mechanism)

    Set the column up as a tinyint (that's SQL Server... not sure of the MySQL equivalent). You can also set up a matching lookup table with the key/value pairs and a foreign key constraint between the tables if you wish.

  • I like the expired_flag option over the date_expired option, if query speed is important to you.

  • There are some fields that my tables usually have: creation_date, last_modification, last_modifier (fk to user), is_active (boolean or number, depending on the database).

    Darryl Hein : I used to do this, but got tired of it and instead use a separate table where I insert every query (other than selects), which gives me a complete history, whereas last-modified and who-modified-it columns can be pretty much useless in most cases.
    Sam : Great idea, I have to say. Another option would be to use the auditing options of databases instead of manually keeping track of changes, but yours is good: simple and effective.
  • I've always used the ValidFrom, ValidTo approach where each table has these two additional fields. If ValidTo Is Null or > Now() then you know you have a valid record. In this way you can also add data to the table before it's live.

  • Look at the "Slowly Changing Dimension" SCD algorithms. There are several choices from the Data Warehousing world that apply here.

    None is "best" -- each responds to different requirements.

    Here's a tidy summary.

    Type 1: The new record replaces the original record. No trace of the old record exists.

    • Type 4 is a variation on this that moves the history to another table.

    Type 2: A new record is added into the customer dimension table. To distinguish, a "valid date range" pair of columns is required. It helps to have a "this record is current" flag.

    Type 3: The original record is modified to reflect the change.

    • In this case, there are columns for one or more previous values of the columns likely to change. This has an obvious limitation because it's bound to a specific number of columns. However, it is often used in conjunction with other types.

    You can read more about this if you search for "Slowly Changing Dimension".

    http://en.wikipedia.org/wiki/Slowly_Changing_Dimension
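
As a concrete illustration of the date_expired approach from the question and the first answers, a minimal MySQL sketch (table and column names are illustrative; here NULL means "never expires", though a zero date works the same way):

    ALTER TABLE products ADD COLUMN date_expired DATETIME NULL;

    -- Expire a record instead of deleting it
    UPDATE products SET date_expired = NOW() WHERE id = 42;

    -- Current records only: never expired, or expiring in the future
    SELECT * FROM products
    WHERE date_expired IS NULL OR date_expired > NOW();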

How to make an element in an XML schema optional

I have one element.

I need to make this one optional. Is there any other way than setting minOccurs="0"?

Thanks in advance,

From stackoverflow
  • No, that is the proper way to do it. Is there some reason why you wish to avoid minOccurs="0"?

    Uday : No. I thought there might be any other possibility.
  • minOccurs="0" is the way to specify optional

Best way to rearrange an ArrayList in Java

What is the best way to rearrange elements in a list? I need the ability to move elements in the list one step back or forward in the index. I was thinking of getting the index of the item, adding it at index -1 / +2 and removing the old reference.

Is there a faster way to handle rearranging without creating duplicates in the list in the process?

From stackoverflow
  • Use the JDK's swap method

    The JDK's Collections class contains a method just for this purpose called Collections.swap. According to the API documentation this method allows you to "swap the elements at the specified positions in the specified list."

    I suggest this solution so that you don't have to remove elements from the List and so that you don't have to roll your own swap method. Also, it looks like this method has been around since the 1.4 release of Java so it should work for most of the modern JDKs.

    Jon Skeet : Doh. Didn't even see that. Deleting my answer...
    Elijah : I only remembered it because I was looking at it today for my project. It is in one of the dark corners of the JDK.
    Jeff Olson : Very cool, I don't think I've ever used that method either, but it definitely could come in handy.
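
A small sketch of that suggestion, assuming "one step back or forward" means swapping an element with its neighbour:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class MoveExample {
        /** Moves the element at index one step towards the head, if possible. */
        static <T> void moveUp(List<T> list, int index) {
            if (index > 0) {
                Collections.swap(list, index, index - 1);
            }
        }

        public static void main(String[] args) {
            List<String> items = new ArrayList<String>(Arrays.asList("a", "b", "c"));
            moveUp(items, 2);               // swaps "c" with "b"
            System.out.println(items);      // [a, c, b]
        }
    }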

NHibernate O/R Question

I have written a lot of EJB, OLE DB and ADO code over the years. My experience with O/R mappers is that at best they are pigs for speed, and at worst a bug-filled nightmare.

Is NHibernate or Spring .NET worth the trouble and why?

From stackoverflow
  • They are worth the "trouble" because they let you clearly define your data in a model that is separated from behavior and other code. You don't need to write a custom DAL. They help with separation of concerns.

  • It depends on your focus, I guess.

    In my shop, some people love 'em (the real Java/OO propeller heads) and some people (those with more SQL skills) despise them.

    I'm in the 2nd group but I'm a developer DBA.

    They're not evil, per se, but simply a useful tool if used correctly without any religious baggage.

  • If you don't mind a tight coupling to SQL Server and a relatively tight coupling to the schema, you might consider looking into LINQ to SQL. It's part of the framework, pretty straightforward, and amazingly easy to program against. You can be up and running and coding against it in minutes using the designer that comes with VS 2008.

    That said, LINQ to SQL isn't the be-all-end-all; NHibernate is certainly more robust in many ways, and there's also the whole Entity Framework migration going on.

    But LINQ to SQL is not a bear for speed. And I can tell you from experience that although the designer isn't exactly bug-free it is pretty robust and the underlying fundamentals are not buggy in the least.

    LINQ to SQL is actually quite a powerful and mature ORM.

    If you need a working example of the power of LINQ to SQL, take a look at this very website. It uses LINQ to SQL for the back-end, and I don't think it's slow at all.

  • I have not used the Spring.NET Data module, so I won't comment on that. I have used Spring.NET for its IoC. In general, I think you can do better for that one.

    NHibernate is very good. It will be slower than straight-up ADO.NET, but in general not enough for you to worry about. The key is that NHibernate allows you to get the database code up and running quickly, so you can worry about the actual application code. Your application is more than the database.

    Then when you find a query that is taking too long, and affecting application performance, rewrite THAT method using the traditional approach.

    There are other alternatives as well, Entity Framework, SubSonic, LLBLGen, etc.

    bernhardrusch : "You application is more than the database." Good quote - this is the point many DBA type people are missing IMO.
    gbn : Maybe... but it's not an excuse to misuse or ignore them...
    Chris Brandsma : There is no excuse for not knowing good relational design, and you should have some idea of how to index. After that, I don't care. The database is a tool.
  • As Matt mentioned ORMs are valuable for the separation of concerns they allow in your application design. The big tradeoff of course is raw performance. The philosophy I suppose is that you can always scale your hardware resources, but you can't easily scale and extend tightly coupled code.

    If you're looking in the .net space, give some consideration to Lightspeed. It's a commercial ORM, but significantly faster than NHibernate, and generally more intuitive to work with.

  • NHibernate is definitely worth the effort. As others have said, it is slower than directly using ADO.NET etc., but the increase in productivity is worth it I think. On the performance front, there are quite a few tweaks you can do in NHibernate to speed things up:

    1. Lazy Loading - only load child objects/collections from the DB when they are actually requested
    2. Caching - NHibernate has 1st and 2nd level caching. The 1st level cache is used by default and means NHibernate won't run identical queries multiple times within the same session. The 2nd level cache is down to the user: you can choose which operations are cacheable (by a 3rd party caching engine), and NHibernate will handle it for you. E.g. if you load an object many times over (across all sessions) and the object changes rarely, making it cacheable speeds things up.

How do you decide what assemblies to split your project into?

What's the most important consideration when deciding how to partition your application into assemblies? Do some people just create one assembly per unique namespace (or perhaps per root namespace) for convenience? Or one per app-layer (like Presentation, Business/Services, Data)? Or more subtle perhaps, put the entire Model in one assembly? Does any of this really matter?

Does having too many assemblies slow things down, is there a critical mass or a "good" number of assemblies that an application should have? Likewise, is there a tipping point for when a single assembly is too big, and does a large assembly also affect performance?

I know it depends on the specific app of course - so I'm mainly interested in general guidelines, and what criteria to use when deciding.

Thanks!


(Although in my particular case, if anyone wants to comment on it, I'm building a WCF service with a business and DAL layer underneath, and a web site to consume the service. I've traditionally created lots of smaller assemblies, but now I'm thinking the simplicity of "Web", "Service", "Model" and maybe "Data" (for repositories etc) looks pretty appealing. Web references Service only, Service references Model only, and Model references Data only. Not sure how much it matters.)

From stackoverflow
  • To me, the main reasons why I would isolate some code into a dll are :

    • Functionality sharing between several projects
    • Easy delivery of new versions of this functionality without updating others
    • Interface sharing between a client and a server for example

    If you need one of those things, you should consider making a dll

    On the downside, you get some tricky cyclical dependencies to solve (usually done using callbacks)

  • One per layer, one for any "top to bottom" app-specific things (like the model, probably) and one for "common utilities" seems reasonable to me. One benefit of splitting between layers is that then internal types and methods etc will only be visible within their layer, which can be a nice aid to encapsulation.

    I've seen various applications with far too many projects - it becomes unwieldy very quickly, particularly when you then double up the projects with unit tests.

  • I tend to go for one per layer, which is a vertical split; once the code base starts to grow I then look at splitting out horizontally, normally oriented around specific chunks of business functionality. For instance, if my persistence layer is talking to more than 20 or so tables, I'd be looking to split that down. Knowing where to split is a bit of a black art; it's highly domain and project specific.

    The rule of thumb I use is: what reasonable set of classes would I be likely to re-use elsewhere? Having just been through this process on my own code base, it's interesting how much other stuff then becomes a clear candidate for refactoring into base, abstract, or interface classes, which in turn are pushed into a lower-level generic assembly.

Is Dataset an ORM?

I am a little bit confused about Dataset compared to ORM (NHibernate or Spring.Net). From my understanding the ORM sits between the application layer and the database layer. It will generate the SQL commands for the application layer. Is this the same as what Dataset does? What is the difference between the Dataset and ORM? What are the advantages and disadvantages for these two methods? Hope the experts in here can explain something.

Thanks, Fakhrul

From stackoverflow
  • ORM is based on mapping between objects and tables. That is not the case for the DataSet: a DataSet maps, in a way, directly onto the tables. An ORM requires a minimum of SQL, but to use a DataSet you still write the SQL clauses yourself. So the DataSet is not an ORM.

    Look at dataset and ORM.

  • The DataSet class is definitely not an ORM; an ORM maps relational data to an object-oriented representation.

    It can be regarded as some kind of 'unit of work' though, since it keeps track of the rows that have to be deleted/updated/inserted.

  • No, DataSets are not ORMs. They may look like ORMs because DataSets map tables to objects just like ORMs do; the main difference lies in what objects they map to.

    Datasets have their own table and row object types that closely resemble the structure of the database. You're rebuilding part of the database's relational model in objects. Restricting these objects into something resembling a relational database gets around some of the problems inherent in mapping a database to an object model.

    An ORM maps the tables and rows from the database into your own object model. The structure of your object model can be optimized for your application instead of resembling a relational database. The ORM takes care of the difficulties in transforming a relational model into an object model.

  • There is a BIG difference between them, first of all about the programming model they represent:

    1. The Dataset is based on a Table Model
    2. An ORM (without specifying a particular product or framework) is based on, and tends towards, a Domain Model.
    3. There is another kind of tool which can be used in data scenarios: a Data Mapper (e.g. iBatis.NET).

    As other answers before me have said, I think it's important to read what Microsoft says about the DataSet and, better still, what Wikipedia says about ORM, but I think (this was the case for me at the beginning) it's more important to understand the difference between them in terms of the model. Understanding that will not only clarify the choices behind them but, better, will make it easier to approach and understand each tool itself.

    As a brief explanation, it's possible to say:

    Table Model

    is a model which represents tabular data in an in-memory structure as closely as possible (and as needed). So it's easy to find implementations which implement concepts such as Table, Columns and Relations; the model concentrates on the table structure, so object orientation is based on that and not on the data itself. This model can have its own advantages, but in some cases it can be heavy to manage and it can be difficult to apply concepts to the contained data. As previous answers say, implementations like the DataSet let you, or rather force you, to prepare (even if with a tool) the SQL instructions needed to perform actions over the data.

    ORM

    is a model (as mendelt says before me) where objects are mapped directly to database objects, principally tables and views (though it's possible to map functions and procedures too). This is generally done in one of two ways: with a mapping file which describes the mapping, or (in the case of .NET or Java) with code attributes. This model is based on objects which represent the data, so object orientation can be applied to them as in normal programs, admittedly with more attention and caution in certain cases, but generally, once you are confident with an ORM, it can be a really powerful tool! An ORM can also be heavy to manage if it's not designed well, or not well understood, so it's important to understand the techniques; but I can say from my experience that an ORM is a really powerful tool. With an ORM, the tool is principally responsible for generating the SQL instructions needed as operations are performed in code, and in many cases ORMs have an intermediate language (like HQL) to perform operations on objects.

    MAPPER

    A mapper is a tool which doesn't do things the way an ORM does, but instead maps hand-written SQL instructions to an object model. This kind of tool can be a better solution when you need to write the SQL by hand but still want to design an application object model to represent the data. In this "model", objects are mapped to instructions and described in a mapping file (generally an XML file, as iBatis.NET or iBATIS (Java) use). A mapper lets you define granular rules in the SQL instructions. In this scenario it's also easy to find some ORM concepts, for example session management.

    ORMs and mappers let you apply some very interesting design patterns which are not so easy to apply in the same way to a table model, and in this case to a DataSet.

    Excuse me for this long answer, but an explanation like this is what helped me, in the past, to understand the difference between these models and then between the implementations.

  • DataSet is a DTO, a data transfer object. The DataSet itself can't do anything. You can use a DataAdapter (from the provider used) to produce SQL or call predefined queries, though the DataSet itself still isn't doing anything.
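
To make the table-model versus domain-model distinction concrete, a small contrast in C# (table, column and class names are invented; the ORM half assumes a mapped Order class and an open NHibernate ISession):

    using System;
    using System.Data;

    class TableModelVsDomainModel
    {
        static void Main()
        {
            // Table model: the DataSet mirrors tables, rows and columns.
            var ds = new DataSet();
            var orders = ds.Tables.Add("Orders");
            orders.Columns.Add("Id", typeof(int));
            orders.Columns.Add("Total", typeof(decimal));
            orders.Rows.Add(42, 100m);

            DataRow row = ds.Tables["Orders"].Rows[0];
            decimal withTax = (decimal)row["Total"] * 1.1m;   // access by table/column name
            Console.WriteLine(withTax);

            // Domain model (ORM): you work with mapped objects instead, e.g.
            //   Order order = session.Get<Order>(42);
            //   decimal withTax2 = order.Total * 1.1m;        // access through the object
            //   order.Customer.Name = "ACME";                 // navigation instead of joins
        }
    }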

Why em instead of px?

I heard you should define sizes and distances in your stylesheet with em instead of in pixels. So the question is why should I use em instead of px when defining styles in css? Is there a good example that illustrates this?

From stackoverflow
  • It's of use for everything which has to scale according to the font size.

    It's especially useful on browsers which implement zooming by scaling the font size. So if you size all your elements using em they scale accordingly.

  • Use px for precise placement of graphical elements. Use em for measurements having to do with positioning and spacing around text elements, like line-height etc. px is pixel-accurate; em can change dynamically with the font in use.

  • Because the em (http://en.wikipedia.org/wiki/Em_(typography)) is directly proportional to the font size currently in use. If the font size is, say, 16 points, one em is 16 points. If your font size is 16 pixels (note: not the same as points), one em is 16 pixels.

    This leads to two (related) things:

    1. it's easy to keep proportions, if you choose to edit your font sizes in your CSS later on.
    2. Many browsers support custom font sizes, overriding your CSS. If you design everything in pixels, your layout might break in these cases. But if you use ems, these user overrides shouldn't cause such problems.
  • I have a small laptop with a high resolution and have to run Firefox in 120% text zoom to be able to read without squinting.

    Many sites have problems with this. The layout becomes all garbled, text in buttons is cut in half or disappears entirely. Even stackoverflow.com suffers from it:

    Screenshot of Firefox at 120% text zoom

    Note how the top buttons and the page tabs overlap. If they had used em units instead of px, there would not have been a problem.

    thomasrutter : You can also zoom text in Firefox without zooming images in View -> Zoom -> Zoom Text Only, then zooming as normal.
    strager : I have absolutely no issues zooming in SO with Opera, except image scaing being terrible. Can you provide a screenshot or something?
    flodin : @strager: good idea
    flodin : Err, why was I downvoted?
    Spoike : +1 good point. although regarding large resolutions, you could set your screen setting to show text at 120 dpi instead of the standard 96 resolution
    flodin : @Spoike: I would, but Firefox does not honor the system DPI setting. See https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/19524
    thomasrutter : I've seen worse - at least the text stayed with the buttons/tabs. I've seen sites where the buttons go one way and the text over them goes a different way. Still, they haven't allowed enough space for that resolution and font size, particularly around the top tabs.
  • It is wrong to say that one is a better choice than the other (or both wouldn't have been given their own purpose in the spec). It may even be worth noting that StackOverflow makes extensive use of px units. It is not the poor choice Spoike was told it was.

    px relates to pixel size on-screen. If, for example, you are writing a print style sheet the argument could be made that you shouldn't use px units.

    em should be used when you want to define something relative to the size of characters in the current font. Unless you have overridden font style (using px units for example), this will be affected by the choice of fonts in the user's browser or OS if they have made one, so it does not make sense to use em as a general unit of length unless you specifically want it to scale as the font size scales, which may often be the case.

    px should be used when you want to define something relative to the size of a pixel on-screen. This can be useful when you want something to be on the same scale as a 1:1 image in the page, for example. If you have a 16x16px icon, you'll probably want to state its size in pixels (unless, I guess, in a print style sheet) and position its background or label using pixels so that they do not distort relative to the image when the font size or browser zoom changes (if the browser zoom scales images, it will scale px units equally).

    % values can be useful when you want a length relative to a dimension of the parent element. They are a good alternative to px units for things like the total width of a design, if your design does not rely on specific pixel sizes to set its size.

    Prashant : What to use for "width" property in CSS em or px? and why?
    Prashant : In case I want to convert a px stylesheet to em-based, then 1px = how many ems? I tried out this tool: http://riddle.pl/emcalc/ but when I am converting 990px to em, it returns 61.88em, but when I add this to the stylesheet my content area becomes smaller than 990px.
    thomasrutter : There is no way to convert between ems and pixels, unless you know what the size of an 'em' is in pixels, in that context. That can depend on the inherited font size of that element, which can in turn depend on the font size of the document as a whole, which can depend on the font size settings in the user's browser and/or operating system. If you are aiming for a particular pixel size, then specify it in pixels. Ie, if you want 990px, then put 990px. If pixels is what you want, why not use them?
  • A very practical reason is that IE 6 doesn't let you resize the font if it's specified using px, whereas it does if you use a relative unit such as em or percentages. Not allowing the user to resize the font is very bad for accessibility. Although it's in decline, there are still a lot of IE 6 users out there.

    xk0der : My web-developer friends wish IE6 never existed! :)
    John Topley : A lot of us feel like that, but to be fair it was a very good browser when it first came out. It's just a shame Microsoft forgot about it for all those years!
    mercator : Note that IE7 and IE8 *still* can't resize text with a font size specified in pixels (through `Page > Text Size`). However, IE7 added page zoom (`Page > Zoom`) which zooms the entire page, including the text, obviously.
    bitcrazed : Better than Page > Zoom, try hitting [CTRL] + [-] or [+] keys on your keyboard in IE7+, Chrome and Firefox - full page zooming without most of the issues that one experiences trying to change font sizes.
  • The main reason for using em or percentages is to allow the user to change the text size without breaking the design. If you design with fonts specified in px, they do not change size (in IE 6 and others) if the user chooses text size - larger. This is very bad for users with visual handicaps.

    For several examples of and articles on designs like this (there are a myriad to choose from), see the latest issue of A List Apart: Fluid Grids, the older article How to Size Text in CSS or Dan Cederholm's Bulletproof Web Design.

    Your images should still be displayed with px sizes, but, in general, it is not considered good form to size your text with px.

    As much as I personally despise IE6, it is currently the only browser approved for the bulk of the users in our Fortune 200 company.

    Spoike : +1 fluid grids was actually what I was looking for when I asked the question. (also, shouldn't your company be upgrading to IE7?)
    Traingamer : If you asked me, they should use Firefox, but they don't actually ask me. :-) (IE7 would be better than 6, but I'm not privy to the decision-making)
  • You probably want to use em for font sizes until IE6 is gone (from your site). Px will be alright when page zooming (as opposed to text zooming) becomes the standard behaviour.

    Traingamer already provided the necessary links.

  • The general consensus is to use percentages for font sizing, because it's more consistent across browsers/platforms.

    It's funny though, I always used to use pt for font sizing and I assumed all sites used that. You don't normally use px sizes in other apps (eg Word). I guess it's because they're for printing - but the size is the same in a web browser as in Word...

  • The reason I asked this question was that I had forgotten how to use ems, as it had been a while since I was hacking happily in CSS. People didn't notice that I kept the question general, as I wasn't talking about sizing fonts per se. I was more interested in how to define styles on any given block element on the page.

    As Henrik Paul and others pointed out, em is proportional to the font-size used in the element. It's common practice to define sizes on block elements in px; however, sizing up fonts in browsers usually breaks such designs. Resizing fonts is commonly done with the shortcut keys Ctrl++ or Ctrl+-, so a good practice is to use ems instead.

    Using px to define width

    Here is an illustrating example. Say we have a div-tag that we want to turn into a stylish date box, we may have html-code that looks like this:

    <div class="date-box">
        <p class="month">July</p>
        <p class="day">4</p>
    </div>
    

    A simple implementation would be to define the width of the date-box class in px:

    * { margin: 0; padding: 0; }
    
    p.month { font-size: 10pt; }
    
    p.day { font-size: 24pt; font-weight: bold; }
    
    div.date-box {
        background-color: #DD2222;
        font-family: Arial, sans-serif;
        color: white;
        width: 50px;
    }
    

    The problem

    However, if we want to size the text up in our browser, the design will break. The text will also bleed outside the box, which is almost the same as what happens with SO's design, as flodin points out. This is because the box will remain the same width, as it is locked to 50px.

    Using em instead

    A smarter way is to define the width in ems instead:

    div.date-box {
        background-color: #DD2222;
        font-family: Arial, sans-serif;
        color: white;
        width: 2.5em;
    }
    
    * { margin: 0; padding: 0; font-size: 10pt; }
    
    /* Initial width of date-box = 10pt x 2.5em = 25pt */
    /* Will also work if you use px instead of pt */
    

    That way you have a fluid design on the date-box, i.e. the box will size up together with the text in proportion to the font-size defined for the date-box. In this example the font-size is defined in * as 10pt and will size up 2.5 times to that font size. So when you're sizing the fonts in the browser, the box will have 2.5 times the size of that font-size.