Friday, April 15, 2011

algorithm to find best combination

Assume that I have a list of 100 products, each of which has a price. Each one also has a energy (kJ) measurement.

Would it be possible to find the best combination of 15 products for under $10 of which the sum of the energy (kJ) was greatest, using programming?

I know C#, but any language is fine. Cheers.

Update: Having a bit of troble finding some sample source code for the knapsack issue. Does anyone have any or know where to find some. Been googling for a few hours and need to get this sorted by tomorrow if possible. Ta.

From stackoverflow
  • http://en.wikipedia.org/wiki/Knapsack_problem

  • That sounds a lot like the knapsack problem. There are various approaches (order descending by energy density, for example).

  • This reminds me of the famous knapsack algorithm

    http://en.wikipedia.org/wiki/Knapsack_problem

  • This sounds more like a linear programming problem.

    Informally, linear programming determines the way to achieve the best outcome (such as maximum profit or lowest cost) in a given mathematical model and given some list of requirements represented as linear equations.

    Check out the Simplex Method.

    Lieven : +1 Spot on. Check out lp_Solve. It is opensource, does what you need and has examples for a myriad of languages, including C#.
    Mitch Wheat : @Lieven: Thx, and here's the link: http://lpsolve.sourceforge.net/5.5/
    Schotime : Won't this just give me the average price that the 15 products should be, not all the different prices which add to $10.
  • This in Integer Linear Programming, optimizing a linear equation subject to linear constraints, where all the variables and coefficients are integers.

    You want variables includeItem1, ..., includeItemN with constraints 0 ≤ includeItem*i* ≤ 1 for all values of i, and includeItem1 + ... + includeItemN ≤ 15, and includeItem1*priceItem1 + ... ≤ 10, maximizing includeItem1*kilojouleItem1 + ....

    Stick that in your favorite integer linear program solver and get the solution :)

    See also http://en.wikipedia.org/wiki/Linear_programming

    It doesn't make sense to say that your particular problem is NP-complete, but it's an instance of an NP-complete (kind of) problem, so there might not be a theoretically fast way of doing it. Depending on how close to optimality you want to get and how fast ILP solvers work, it might be feasible in practice.

    I don't think you problem is a special case of ILP that makes it particularly easy to solve. Viewing it as a knapsack-like problem, you could restrict yourself to looking at all the subsets of 1..100 which have at most (or exactly) 15 elements, which is polynomial in n---it's n-choose-15, which is less than (n^15)/(15!), but that's not terribly useful when n = 100.

    If you want recommendations for solver programs, I have tried glpk and found it pleasant to use. In case you want something that costs money, my lecturer always talked about CPLEX as an example.

    Schotime : Thanks Jonas. Could you go through the setup a bit more with the variables and constraints. I've never done this before so any extra help would be greatly appreciated
    Jonas Kölker : You have three variables per item: one is {0, 1} and indicates whether the variable is included in your set; the second is the price of the item, the third is the energy. You then want to maximize the linear combination of "included" times "energy", subject to upper bounds on two other linr combns
    Jonas Kölker : s/variable/item/ at "is included in your set"
  • This is the knapsack problem if you can either pick a product or not. If you can pick fractional values of products then you could solve this with the simplex method, but the fractional knapsack problem has a simple solution.

    Order the items by energy/price ratio, picking 100% of the highest ones until you run out of money, then pick a fractional value of the highest remaining one.

    For example if prices are 4,3,5,4 and energies are 3,5,2,7 the ordering is

    7/4, 5/3, 3/4, 2/5

    So you would pick items 4 and 2 which would cost 7$, with the 3$ remaining you would buy 75% of the first item for a price of $3 and an energy of 3*.75 = 2.25

    This would give a total energy of 14.25

    Note that allowing fractional values will give you a higher objective value than only allowing 0% or 100%, so no integer solution will do any better than 14.25 (or 14 for that matter since the objective value has to be an integer).

    To solve the original knapsack problem, you can use a branch-and-bound which should work just fine in practice. Assume that you have a current candidate solution with an objective value of z*

    1. Solve the relaxed problem, where you allow the fractional weights. If the value is less than z*, discard this branch.
    2. Compute a new value z which is the solution found without the last fractional weight, if this value is greater than z* , replace z* with your new value
    3. Pick one item (say the first in the list, the most profitable one) and form two subproblems, one where you must include it in the solution and one where you cannot include it in the solution (this is the branching step).
    4. As long as you have subproblems left to solve, pick one and go back to step 1.

    Note that when you create the subproblem where you must pick an item, just subtract its price from the budget and add the value to the profit, now you have a smaller problem to solve.

    For a more detailed description look at Branch and Bound on Wikipedia.

    Schotime : Thanks Pall but you can't add a fractional product. You either add it or you don't.
    Pall Melsted : It's still useful to "pretend" you could do it, since it gives you an upper bound on how well you could do. The branch and bound always keeps track of the best integer solution, it uses the fractional solution to reject branches that can't give you better than what you already have.
  • The Stony Brook Algorithm Repository lists implementations for the knapsack problem.

    Their book The Algorithm Design Manual has this kind of information for a vast array of problems.

  • It should be possible to solve the problem with Cream for Java. There's also a version for C# available CSharpCream.

  • Yes, as everyone pointed out this is a complex knapsack problem. Something this simple might be good enough though...

    SELECT TOP 15 *
    FROM Product
    WHERE Price < 10
    ORDER BY Energy DESC
    

Regex not returning 2 groups

Hello everyone,

I'm having a bit of trouble with my regex and was wondering if anyone could please shed some light on what to do.

Basically, I have this Regex:

\[(link='\d+') (type='\w+')](.*|)\[/link]

For example, when I pass it the string:

[link='8' type='gig']Blur[/link] are playing [link='19' type='venue']Hyde Park[/link]" 

It only returns a single match from the opening [link] tag to the last [/link] tag.

I'm just wondering if anyone could please help me with what to put in my (.*|) section to only select one [link][/link] section at a time.

Thanks!

From stackoverflow
  • Regular Expressions Info a is a fantastic site. This page gives an example of dealing with html tags. There's also an Eclipse plugin that lets you develop expressions and see the matching in realtime.

  • You need to make the wildcard selection ungreedy with the "?" operator. I make it:

    /\[(link='\d+')\s+(type='\w+')\](.*?)\[\/link\]/
    

    of course this all falls down for any kind of nesting, in which case the language is no longer regular and regexs aren't suitable - find a parser

    annakata : I had to change some other aspects of the regex for it to make sense to my ecmascript brain...
    fishkopter : Thanks alot! works perfectly!
    Tomalak : @annakata: I think this question would have been a reasonable candidate for the "regexhtmlparserquestions" tag you once put up. ;-)
    annakata : sigh, I do miss that tag :)
    Tomalak : There is still one question that has it. You can still go for the Taxonomist badge. :-)
  • You need to make the .* in the middle of your regex non-greedy. Look up the syntax and/or flag for non-greedy mode in your flavor of regular expressions.

Reporting Services 2005: ReportExecution2005.asmx returns with 401 Access Denied when called from a RenderingExtension

Hi,

I've got a rendering extension for reporting services which uses the ReportExecution2005.asmx service to execute a number of "subreports" and then puts the results in a powerpoint presentation.

A typical usage scenario would be to go to the Report Manager, select my "Powerpoint" report, which is used only as a placeholder for parameters to be passed to the "subreports". I then select my extension from the list of export formats and click Export, which runs the extension and it gives me back my pptx file.

This works fine on both our live and test servers. But I've run into a very weird problem trying to set up another test server.

Any call made by the extension to the webservice returns with "401 Access Denied" (no further substatus information available).

Things I've tried without success: - allow physical access to the folder structure of the ReportServer virtual directory to everyone. And I mean literally everyone - ASPNET, NETWORK SERVICE, Everyone, the account I usually use to log in, which is an admin and owner of the folder - use reporting manager to set up the security on the reporting services side. Again, every conceivable user/group account which could be involved was given every conceivable role. - allow anonymous access to the ReportServer web app. - enabled impersonation on the ReportServer web app. - hardcoded the user credentials to use when calling the webservice (by default it just uses System.Net.CredentialCache.DefaultCredentials)

I also made a little test website which consumes the ReportExecution service in exactly the same way as the RenderingExtension, and from the website, the web server authenticates fine and allows me to call the web service, so obviously the problem doesn't lie with the web service security setup itself.

Starting to pull my hair out. As a last resort, I'm about to reinstal Reporting Services, but in the mean time (or if that doesn't help) - if anyone has encountered something similar, I'd greatly appreciate any input.

From stackoverflow
  • Hi,

    I have seen similar behavior when migrating an ASP.NET project which called the ReportExecution2005 web service. When running on my local box, everything was Ok. When I migrated it to our development server, I started getting 401 errors. In our case, Kerberos needed to be configured on the development server to delegate credentials to the SSRS box.

    You might want to verify that the new server will delegate it's credentials to the SSRS machine and see if that makes any difference.

    Hope this helps,

    Bill

Constructor chaining with "this"

Why does the first constructor in ClassA cause the compiler error 'cannot use "this" in member intializer'?

... or how can i get this to work?

Thanks

public sealed class ClassA : IMethodA
{
    private readonly IMethodA _methodA;

    public ClassA():this(this)
    {}

    public ClassA(IMethodA methodA)
    {
        _methodA = methodA;
    }

    public void Run(int i)
    {
        _methodA.MethodA(i);
    }

    public void MethodA(int i)
    {
        Console.WriteLine(i.ToString());
    }
}

public interface IMethodA
{
    void MethodA(int i);
}
From stackoverflow
  • You are allowed to use the this(...) syntax to invoke another constructor at the same level - however, you cannot use this (the current instance) in this context.

    The easiest option here is to duplicate the assignment code (_methodA = methodA).

    Another option might be null-coalescing:

    public ClassA():this(null)
    {}
    
    public ClassA(IMethodA methodA) 
    { // defaults to "this" if null
        _methodA = methodA ?? this;
    }
    
    Bobby Cannon : @Marc Gravell: beat me to it...
  • This is called out in section 10.11.1 of the C# spec

    An instance constructor initializer cannot access the instance being created. Therefore it is a compile-time error to reference this in an argument expression of the constructor initializer, as is it a compile-time error for an argument expression to reference any instance member through a simple-name.

    There is no way to get this to work with an instance constructor because this cannot be accessed. What you could do is make the constructor private, create an initialization method and a static constructor.

    public sealed class ClassA : IMethodA {    
      private ClassA() { }
      private void Initialize(IMethodA param) { ... }
      public static ClassA Create() {
        var v1 = new ClassA();
        v1.Initialize(v1);
        return v1;
      }
      public static ClassA Create(IMethodA param) {
        var v1 = new ClassA();
        v1.Initialize(param);
        return v1;
      }
    }
    
  • You're trying to pass the object before it is constructed. Although the compiler could do something sensible in this case, in general that won't work.

    Your actual example works if you just do this:

       public ClassA()
      {
        _methodA = this; 
      }
    

    But you probably want to share more logic, so just use a function.

      public ClassA()
      {
        SetStuff(); 
        _methodA = this; 
      }
    
      public ClassA(IMethodA methodA)
      {
        SetStuff(); 
        _methodA = methodA;
      }
    
    Marc Gravell : Note that you can still use SetStuff with readonly fields... *if* you use a `ref` / `out` argument. Whether it is worth it depends on the scenario.
  • You can't use the this keyword when chaining constructors essentially because this refers to an object that hasn't been instantiated yet (creation of the object doesn't begin until some (the top-level or base) constructor block has been entered). Moreover, why exactly would you want to do this? It seems rather pointless when you have access to the this keyword everywhere.

    I recommend simply using independent constructors as such:

    public sealed class ClassA : IMethodA
    {
        private readonly IMethodA _methodA;
    
        public ClassA()
        {
            _methodA = this;
        }
    
        public ClassA(IMethodA methodA)
        {
            _methodA = methodA;
        }
    }
    

    Perhaps I misunderstand what you're trying to do, but hopefully that will solve the issue for you.

Regular expression match only if subpattern doesn't match

Hello

I'm trying to match C style comments form a file, but only if the comment don't start with a certain labels introduced by @

For example from

/* some comment to match */
/* another comment.
this should match also */
/*@special shouldn't match*/

Is this possible using regular expressions only?

I'm trying this using JavaScript implementation of regular expressions.

From stackoverflow
  • You could start with something like this:

    /\*[^@]
    

    But in general, you don't watch to match C-style comments with regular expressions, because of nasty corner-cases. Consider:

    "foo\" /* " " */ "
    

    There's no comment in that code (it's a compile-time concatenation of two string literals), but you're not going to have much luck parsing it without a real parser. (Technically, you could use a regular expression, because you only need a simple finite state machine. But it's a very disgusting regular expression.)

    Tomalak : +1 for pointing out the risky part. I don't think you could use a regular expression to successfully parse a C like language, though. Not even with an extremely ugly one.
    emk : Even though you can't parse arbitrary C code with a regex, you can actually strip comments. I've actually written a state machine before to do this before, and any such state machine can be translated into a regex. But I don't think I could construct it by hand without a lot skull sweat.
  • /\*\s*(?!@)(?:(?!\*/).)*\*/
    

    Breaks down as:

    /\*               // "/*"
    \s*               // optional space
    (?!@)             // not followed by "@"
    (?:               // don't capture...
       (?!\*/).       // ...anything that is not "*/"
    )*                // but match it as often as possible
    \*/               // "*/"
    

    Use in "global" and "dotall" mode (e.g. the dot should match new lines as well)

    The usual word of warning: As with all parsing jobs that are executed with regular expressions, this will fail on nested patterns and broken input.

    emk points out a nice example of (otherwise valid) input that will cause this expression to break. This can't be helped, regex is not for parsing. If you are positive that things like this can never occur in your input, a regex might still work for you.

    Ant : Just to be pedantic, \s*(?!@).? doesn't mean what you think it means, but is rather a 0 width negative lookahead. It means that once you have matched as much whitespace as possible (\s*) continue with the match ONLY IF the next character is NOT an @. The .? is unnecessary.
    Tomalak : Just to be pedantic, how do you suppose I could have written a negative look-ahead without knowing what it is? ;-) You are right about the ".?" being unnecessary, though. I removed it.
  • use negative lookahead

SQL backlog calculation (MS Access)

Hi, I need to calculate the backlog from a table: components(ProductId, ComponentId, Status, StatusDate) where ComponentId, Status and StatusDate are the primary key. ProductId is a foreign key. Example:

prod1, comp1, 01, 05/01/2009
prod1, comp1, 02, 05/01/2009
prod1, comp1, 03, 06/01/2009
prod1, comp1, 01, 07/01/2009
prod1, comp1, 02, 20/01/2009
prod2, comp2, 01, 22/01/2009
prod1, comp1, 02, 23/01/2009
prod1, comp1, 03, 31/01/2009

Basically what I am trying to calculate is the number of Components per week in status lower than 03. End user will introduce an interval date so I need to show all the weeks in the interval even if there is not backlog for a week. Expected result when end user introduces 01/01/2009-22/01/2009:

Week, Backlog
1,NULL/0
2,1
3,1
4,2

Explanation for Week 2: comp1 reach status 03 in the week but then goes back to status 01
Any help is more than welcome, thanks

From stackoverflow
  • This is a partial answer in that I do not see where week 3 (11-18 Jan 2009) is coming from in your example. It illustrates the use of a counter table to get a line for missing values.

    SELECT Counter,WeekNo, CountofStatus FROM Counter LEFT JOIN
        (SELECT Format([StatusDate],"ww") AS WeekNo, COUNT(c.Status) AS CountOfStatus
        FROM components c
        WHERE c.StatusDate BETWEEN #1/1/2009# AND #1/22/2009#
        AND c.Status<3
        GROUP BY Format([StatusDate],"ww")) Comp
    ON Val(Comp.Weekno)=Counter.Counter   
    WHERE Counter.Counter>=Val(Format(#1/1/2009#,"ww"))
    AND Counter.Counter<=Val(Format( #1/22/2009#,"ww"))
    
    Remou : So the status is continuous and only marked at date changed? If so, my example does not suit.
    Remou : Your example shows two components, do you have a set number of components and is it large?
  • I'm guessing a bit as to what you're trying to do, but here's my best guess:

    First, you should have a calendar table in your database:

    CREATE TABLE Calendar (
         calendar_date DATETIME NOT NULL,
         week_number INT NOT NULL,
         CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (calendar_date)
    )
    GO
    
    INSERT INTO Calendar (calendar_date, week_number) VALUES ('1/1/2009', 1)
    INSERT INTO Calendar (calendar_date, week_number) VALUES ('2/1/2009', 2)
    etc.
    

    You can add additional columns to the table based on your business needs. For example, an "is_holiday" bit column to track whether or not your office is closed that day. This table makes so many different queries trivial.

    Now for your problem:

    SELECT
         CAL.week_number,
         COUNT(DISTINCT C.component_id)
    FROM
         Calendar CAL
    LEFT OUTER JOIN Components C ON
         C.status_date = CAL.calendar_date AND
         C.status IN ('01', '02')
    WHERE
         CAL.calendar_date BETWEEN @start_date AND @end_date
    GROUP BY
         CAL.week_number
    

    I used the IN for the status since you're using strings, so "< '03'" might not always give you what you want. Is '1' less than '03' in your mind?

    Also, if there is a time component on any of your dates the equality and BETWEEN checks might need to be tweaked.

    EDIT: I just saw the comments on the other answer. If you are dealing with just status changes, then the following query should work, although there may be a more performant method:

    SELECT
         CAL.week_number,
         COUNT(DISTINCT C.component_id)
    FROM
         Calendar CAL
    LEFT OUTER JOIN Components C ON
         C.status_date <= CAL.calendar_date AND
         C.status IN ('01', '02')
    LEFT OUTER JOIN Components C2 ON
         C2.component_id = C.component_id AND
         C2.status_date > C.status_date AND
         C2.status_date <= CAL.calendar_date
    WHERE
         CAL.calendar_date BETWEEN @start_date AND @end_date AND
         C2.component_id IS NULL
    GROUP BY
         CAL.week_number
    

    I'm not sure where the product fits in with all of this though.

What's the correct terminology for this design pattern?

I am writing a section of code that allows "soft" forms, such as a configurable questionnaire or checklist. The header table/class just groups together a bunch of questions, where each question has a "Text" property for the question itself, an "AnswerType" enumeration (string/boolean/StronglyAgree-StronglyDisagree) etc., an order property and whatever other little bells and whistles; you get the picture. And actual instances of the questions being answered can likewise be saved in relation to the question set as opposed to having hard-coded columns.

The details are actually unimportant. My question is: what is this design pattern called? What would be an appropriate name for the tables that are storing the soft-coded questions?

I thought of "SoftForm", which probably comes closest to describing the situation, but I've never heard of such a term before, and I'm sure there's a standard term for this design pattern. What is it?

From stackoverflow
  • It seems like a flavor of Model-View-Controller, as applied to forms. Though the view and controller are probably fused together in your particular example, which is why it is hard to see.

    • Model is the data for the form.
    • View is responsible for visualizing the form data to the user.
    • Controller is responsible for handling the users' answers.
  • ISPF called them panels, HTML and XAML calls them forms, Win32 Dialog resources. They all are interpreted, declarative UIs. "declarative UI" is probably the term you want, though what name to use for the DB table is up to you, since you're restricting yourself to a domain specific vocabulary (questionnaires) rather than being general purpose examples.

    If you're instead talking about implementing it, then there are several patterns. Some systems effectively interpret the declarative language by parsing in into an object graph (builder pattern). Other systems take the declarative specification, and generate code from that, and run it, either at run-time or when the configuration changes (some JSPs, a couple of systems I've built). You can use standard UI patterns such as MVC within the implementation of either the generative or the object graph approach.

  • Sounds like a Builder pattern to me.

    Shaul : Nope. Builder pattern has a strongly defined interface. This is completely soft-coded, down to the questions and answers.
    chaos : The form isn't the Builder; the code that generates the form representation from the configuration data is. It has a 'strongly defined interface'. The forms aren't an 'interesting pattern' in themselves; the interesting thing about them is that they're dynamically generated to spec.

How can I find out why my app is slow?

I have a simple Rails app deployed on a 500 MB Slicehost VPN. I'm the only one who uses the app. When I run it on my laptop, it's fast enough. But the deployed version is insanely slow. It take 6 to 10 seconds to load the login screen.

I would like to find out why it's so slow. Is it my code? (Don't think so because it's much faster locally, but maybe.) Is it Slicehost's server being overloaded? Is it the Internet?

Can someone suggest a technique or set of steps I can take to help narrow down the cause of this problem?


Update:

Sorry forgot to mention. I'm running it under CentOS 5 using Phusion Passenger (AKA mod_rails or mod_rack).

From stackoverflow
  • Your CPU time is guaranteed by Slicehost using the Xen virtualization system, so it's not that. Don't have the other answers for you, sorry! Might try 'top' on a console while you're trying to access the page.

  • What is Slicehost using to serve it? Fast options are things like: Mongrel, or apache's mod_rails (also called passenger phusion or something like that) These are dedicated servers (or plugins to servers) which run an instance of your rails app.

    If your host isn't using that, then it's probably defaulting to CGI. Rails comes with a simple CGI script that will serve the page, but it reloads the app for every page.

    (edit: I suspect that this is the most likely case, that your app is running off of the CGI in /webapp_directory/public/dispatch.cgi, which would explain the slowness. This tends to be a default deployment on many hosts, since it doesn't require extra configuration on their part, but it doesn't give good performance)

    If your host supports "Fast CGI", rails supports that too. Fast CGI will open a CGI session, and keep it open for multiple pages, so you get much better performance, but it's not nearly as good as Mongrel or mod_rails.

    Secondly, is it in 'production' or 'development' mode? The easy way to tell is to go to a page in your app that gives an error. If it shows you a stack trace, it's in development mode, which is slower than production mode. Mongrel and mod_rails have startup options to determine whether to run the app in production or development mode.

    Finally, if your database is slow for whatever reason, that will be a big bottleneck as well. If you do have a good deployment (Mongrel/mod_rails/etc.) in production mode, try looking into that.

    Alex JL : Just a note, Slicehost gives you your own dedicated server equivalent, so it's up to you to install and configure the operating system . It's not managed or shared hosting.
  • First, find out if there's a particularly slow response from the server. Use Firefox and the Firebug plugin to see how long each component (including JavaScript and graphics) takes to download. Assuming the main page itself is what is taking all the time, you can start profiling the application. You'll need to find a good profiler, and as I don't actually work in Ruby on Rails, I can't suggest any: google "profile ruby on rails" for some options.

    As YenTheFirst points out, the server software and config you're using may contribute to a slowdown, but A) slicehost doesn't choose that, you do, as Slicehost just provides very raw server "slices" that you can treat as dedicated machines. B) you're unlikely to see a script that runs instantly suddenly take 6 seconds just because it's running as CGI. Something else must be going on. Check how much RAM you're using: have you gone into swap? Is the login slow only the first time it's hit indicating some startup issue, or is it always that slow? Is static content served slow? That'd tend to mean some network issue (either on the Slicehost side, or your local network) is slowing things down, assuming you're not in swap.

    When you say "fast enough" you're being vague: does the laptop version take 1 second to the Slicehost 6? That wouldn't be entirely surprising, if the laptop is decent: after all, the reason slices are cheap is because they're a fraction of a full server. You're using probably 1/32 of an 8 core machine at Slicehost, as opposed to both cores of a modern laptop. The Slicehost cores are quick, but your laptop could be a screamer compared to 1/4 of core. :)

    Alex JL : You'd only have that particular fraction of a core at Slicehost if everyone else was maxing out their CPU time. Slicehost guarantees a certain amount, but will give you more if available. I run a moderately busy server (60-70k hits a day) and my CPU load is usually .7%.
    Paul Kroll : The Duck is correct. Maybe that means he should bill me. (Yes, I know, I'll pay for that comment the rest of my life.) :)
  • If it is just slow on the first time you load it is probably because of passenger killing the process due to inactivity. I don't remember all the details but I do recall reading people who used cron jobs to keep at least one process alive to avoid this lag that can occur with passenger needed to reload the environment.

    Edit: more details here

    Specifically - pool idle time defaults to 2 minutes which means after two minutes of idling passenger would have to reload the environment to serve the next request.

  • Do you have a lot of data in your DB? I would double check that you have indexed all the appropriate columns- because this can make a huge difference. On your local dev system, you probably have a lot more memory than on your 500 mb slice, which would result in the DB running a lot slower if you have big, un indexed tables. You can also run the slow queries logger in MySql to pinpoint columns without indexes.

    Other than that, yes- passenger will need to spool up a process for you if you have not been using the site recently. If this is the case, you should see a significant speed increase on second, and especially third and later page loads.

  • You might want to run a local virtual machine with 500 MB. Are you doing a lot of client-server interaction? Delays over the WAN are significant

  • Try to pint point where the slowness lies

    1/ application is slow, or infrastructure (network + web server)

    • put a static file on your web server, and access it through your browser

    2/ If it is fast, it is probable a problem with application + server configuration.

    • database access is slow
    • try a page with a simpel loop: is it slow?

    3/ If it slow, it is probably your infrastructure. You can check:

    • bad network connection: do a packet capture (with Wireshark for example) and look for retransmissions, duplicate packets, etc.
    • DNS resolution is slow?
    • server is misconfigured?
    • etc.
  • You might want to check out RPM (there's a free "lite" version too) and/or New Relic's Tune Up.

  • If you are using FireFox and doing localhost testing (or maybe even on LAN) you may want to try editing the network.dns.disableIPv6 setting.

    Type about:config in the address bar and filter for network.dns.disableIPv6 and double-click to set to true.

    This bug has been reported mainly from Vista OS's, but some others as well.

  • You could try running 'top' when you SSH in to see which process is heavy. If you also have problems logging you, perhaps you may try getting Statistics in the Slicehost manager.

    If you discover it is MySQL's fault, consider decreasing the number of servers it can spawn.

    512 seems decent for Rails application, you might have to check if you misconfigured too.

xajax and select

I have a bit of simple created with XAJAX, which replaces the innner HTML of a select control with some options created by a php script.

This is fine and dandy in Firefox, but it does not work in IE7.

Looking on the XAJAX forums i found this which basically says " doesnt really work in IE, use a div and replace the inner HTML of that with the full select statement"

Did this, and it's fine, except that i had a jQuery selector working on the select control, which now no longer works.

Anyone got any ideas, or can anyone point me to a good jQuery example of how to do the ajax bit using jQuery, so I can ditch the XAJAX altogether?


EDIT:

<div id=imgselect>
  <select id="images">
   <option value="">Then select an image</option>
  </select>
</div>


Picture Preview:<br><br>

 <div class="img-preview" id='preview'></div>
 <script type='text/javascript'>
   $('#images').change(function() 
   {
     var image = $(this).val();
     var img = $('<img/>').attr('src', image);
    $('#preview').html(img);

    document.getElementById('picsmall').value = image;
    });
</script>

The problem comes when the contents of the imgselect div is replaced by the AJAX call

From stackoverflow

publish a website using msbuild

How can I publish a web app using msbuild through the command line or insdie a batchfile??

From stackoverflow
  • What you're talking about is called a Web Deployment and you can find information about how to do that here:

    http://weblogs.asp.net/scottgu/archive/2008/01/28/vs-2008-web-deployment-project-support-released.aspx

  • I wonder what exactly you're trying to do. Is it just xcopying your files over to the deployment server or does it also include setting up IIS?

    MSBuild has built in tasks for Copy and Building, but does not include IIS tasks. For that I would recommend finding a tasks library like the MSBuild extension pack or the MSBuild community tasks.

    http://www.codeplex.com/MSBuildExtensionPack/

    http://msbuildtasks.tigris.org/

    Gavin Miller : +1 - msbuildtasks is a great library and easy to use
  • If this is a web application project all you need to to is build the sln/project and then copy the published web site files to wherever

    If it is a web site then you can use the ASPNetCompiler task

    <AspNetCompiler
          Clean="true"
          Debug="true"
          VirtualPath="/"
          PhysicalPath="$(BuildDir)\"
          TargetPath="$(BuildDir)\Publish\"
          Updateable="true"
          FixedNames="true"
          Force="true" />
    
    salvationishere : Dean, how do I add a web app project to a website where web app is on same server as website and I'm not trying to do an automated deployment? Just a simple web hosting using IIS6 on a Windows XP system?

Which naming conventions do you use for SVN Branches and Tags?

Our company is creating a naming convention for SVN branches and tags, and I'm not confortable with the idea of using only date or build number on branches/tag names.

I think we need names that brings a greater definition about what this path represents, what effort is being done, etc.

What do you think / use?

From stackoverflow
  • All of our developer tasks go into a bug tracking system. This bug tracking system has IDs associated with each task.

    So for the branch name of any task, we use:

    ticketId_TicketSubject

    When a branch contains multiple ticketIds we just combine them into the branch name:

    ticketId1_ticketId2_Description

    That way if you're in a ticket and you want to know which branch to build, you can easily look it up. Likewise if you want to find the ticket with your branch build, you can easily find it too.

    For tags, we tag it by the version number itself.

    As for the location of each branch. We have a top level hierarchy like this:

    /branches

    /tags

    /trunk

    Then all of our products/projects go under each of those inside their own subfolders.

    /trunk/project1/

    /branches/project1/TicketId_Description

  • For a feature branch, name it after what is being done. For example, I moved our ORM from LINQ to SQL to NHibernate and created a branch called "NHibernate". Once you have completed the branch and merged it back into the trunk you can delete the branch to save naming conflicts in the future. If you really need to retrieve the branch you can, you just have to delve back into the history and restore it.

    If you have story/quote/job numbers that are relevant to a branch, I would append it to the name of the branch eg. "NHibernate_429" so you can easily reference it against your tracking system. However, I would always go with the English first as that's what people are more realistically going to refer to it by when it is in development.

    For things like tags it's hard to say what you want to do as it depends on what it is you are tagging. If you are tagging releases then I would use "Release X.X.X.X" or something simple like that. You really aren't going to care what the date or build number was when you are looking back for a specific release as an example.

  • What we use (mostly following the accepted convention):

    projectName
     |
     --trunk
     |
     --tags
     |
     --branches
    

    Under trunk we have the main trunk.

    Under tags we tag every release (both internal, testing releases and customer releases). There we just use the version number as the tag name.

    Under branches we have one branch for every major version we released (in our case the result of one XP iteration). Those are named like the major version ("v5.03", "v6.04"). Additionally, we have internal branches for major changes or special versions. There naming is free-form, and the name is supposed to tell people what the branch represents. Expamples would be "workaround_customerA", "module_x_reorg" etc.

  • We give our branches a ".X" version, where the tags have a number.

    For example, the branch would be Foo-1.2.3.X, and the tags would be Foo-1.2.3.1, Foo-1.2.3.2, etc.

    We also have a special tag, Foo-1.2.3.0, which is made as soon as the branch is created from trunk, before any changes. That's so we can diff branches and tags to the initial state at any time (since in a couple days, trunk is likely to be different). This practice has made merges a little easier, and it makes figuring out what code changed in the branch a whole lot easier.

  • I always prefix the tags (and usually branches too) with the date in YYYYMMDD format, followed by a description of the purpose of the tag or branch.

    e.g., 20090326_Release_v6.5 or 20090326_Post_Production_Update

    This is under the standard trunk/tags/branches hierarchy of course.

    The date prefix ensures that they all the tags or branches are displayed in creation order, which is much more useful then just being sorted by description if your scanning through a big folder of tags. You see the timeline of when and why they were created (like mini log messages).

  • Well, branching is pretty much open, since there's several different kinds of branches, the naming of them can be very different.

    It's worth remembering what source control gives you. Tag names are not just "v1.4", it's "/CashCowProject/tags/v1.4". Naming a tag "/CashCowProject/tags/CashCowProject-v1.4" is a little redundant, what else would it be?

    Revision control also gives you full access to dates and times that tags were created. Revision control also gives you commit messages which you should be making use of, particularly the first line.

    Given all this information, it's not difficult to pull together a simple view giving you all the information you need, coming from consistent and appropriate sources, such as:

    CashCowProject
        v1.4 - 26 March 2009     :  With Added whizzbang (more)
        v1.3 - 13 February 2009  :  Best graphics!       (more)
        v1.2 - 01 January 2009   :  Upgraded security    (more)
    

    The only thing that the tag name is really useful for here is the version number. If you're trying to put all the information into a tag name, it's a little wordy and I bet it won't look as good.

    Jim T : That said, I still name my tags "Project-4.8"
    Victor Rodrigues : I think this thought works well for tags, but not necessarily for branches. I can open an experimental branch, and a number will tell me nothing about what this branch is about.
    Jim T : Indeed, I'm completely leaving branch naming out of this answer, and a number would not be a good branch name.

What's the difference between dict() and {}?

So let's say I wanna make a dictionary. We'll call it d. But there are multiple ways to initialize a dictionary in Python! For example, I could do this:

d = {'hash': 'bang', 'slash': 'dot'}

Or I could do this:

d = dict(hash='bang', slash='dot')

Or this, curiously:

d = dict({'hash': 'bang', 'slash': 'dot'})

Or this:

d = dict([['hash', 'bang'], ['slash', 'dot']])

And a whole other multitude of ways with the dict() function. So obviously one of the things dict() provides is flexibility in syntax and initialization. But that's not what I'm asking about.

Say I were to make d just an empty dictionary. What goes on behind the scenes of the Python interpreter when I do d = {} versus d = dict()? Is it simply two ways to do the same thing? Does using {} have the additional call of dict()? Does one have (even negligible) more overhead than the other? While the question is really completely unimportant, it's a curiosity I would love to have answered.

From stackoverflow
  • >>> def f():
    ...     return {'a' : 1, 'b' : 2}
    ... 
    >>> def g():
    ...     return dict(a=1, b=2)
    ... 
    >>> g()
    {'a': 1, 'b': 2}
    >>> f()
    {'a': 1, 'b': 2}
    >>> import dis
    >>> dis.dis(f)
      2           0 BUILD_MAP                0
                  3 DUP_TOP             
                  4 LOAD_CONST               1 ('a')
                  7 LOAD_CONST               2 (1)
                 10 ROT_THREE           
                 11 STORE_SUBSCR        
                 12 DUP_TOP             
                 13 LOAD_CONST               3 ('b')
                 16 LOAD_CONST               4 (2)
                 19 ROT_THREE           
                 20 STORE_SUBSCR        
                 21 RETURN_VALUE        
    >>> dis.dis(g)
      2           0 LOAD_GLOBAL              0 (dict)
                  3 LOAD_CONST               1 ('a')
                  6 LOAD_CONST               2 (1)
                  9 LOAD_CONST               3 ('b')
                 12 LOAD_CONST               4 (2)
                 15 CALL_FUNCTION          512
                 18 RETURN_VALUE
    

    dict() is apparently some C built-in. A really smart or dedicated person (not me) could look at the interpreter source and tell you more. I just wanted to show off dis.dis. :)

    Emil H : Whoa. That's really cool. Thanks for sharing. :)
    verix : Thanks for showing off dis! That's a fun-looking module. I'll have to look into it more in-depth later on.
    David : +1 for showing off dis.dis :)
    Steven Huwig : cmon, one more vote for dis.dis :)
  • As far as performance goes:

    >>> from timeit import timeit
    >>> timeit("a = {'a': 1, 'b': 2}")
    0.424...
    >>> timeit("a = dict(a = 1, b = 2)")
    0.889...
    
    John Fouhy : I get similar result for empty dictionary: 0.1usec for 'x = {}' vs 0.3usec for 'x = dict()'.
    DNS : Yeah; function calls are expensive. But that form is more versatile, i.e. dict(zip(...
    vartec : exactly. function calls are some 80-100 times more expensive in Python, than in C.
    DrFalk3n : taking exact use from the doc:http://docs.python.org/library/timeit.html import timeit t = timeit.Timer("a = {'a': 1, 'b': 2}") t.timeit() t = timeit.Timer("a = dict(a = 1, b = 2)") t.timeit() the second is four times more time consuming
  • Basically, {} is syntax and is handled on a language and bytecode level. dict() is just another builtin with a more flexible initialization syntax. Note that dict() was only added in the middle of 2.x series.

  • Update: thanks for the responses. Removed speculation about copy-on-write.

    One other difference between {} and dict is that dict always allocates a new dictionary (even if the contents are static) whereas {} doesn't always do so (see mgood's answer for when and why):

    def dict1():
        return {'a':'b'}
    
    def dict2():
        return dict(a='b')
    
    print id(dict1()), id(dict1())
    print id(dict2()), id(dict2())
    

    produces:

    $ ./mumble.py
    11642752 11642752
    11867168 11867456
    

    I'm not suggesting you try to take advantage of this or not, it depends on the particular situation, just pointing it out. (It's also probably evident from the disassembly if you understand the opcodes).

    Brian : That's not what is happening here. {} is still allocating a new dict (otherwise it would be very bad), but the fact that you don't keep it alive means the id can be reused after it dies (as soon as the first dict is printed).
    Brian : I suspect the function call is acting differently because it is churning the id() space a bit more so a different id is obtained on the second call.
    Jacob Gabrielson : I think mgood's answer clarified things; I've updated my entry.
  • @Jacob: There is a difference in how the objects are allocated, but they are not copy-on-write. Python allocates a fixed-size "free list" where it can quickly allocate dictionary objects (until it fills). Dictionaries allocated via the {} syntax (or a C call to PyDict_New) can come from this free list. When the dictionary is no longer referenced it gets returned to the free list and that memory block can be reused (though the fields are reset first).

    This first dictionary gets immediately returned to the free list, and the next will reuse its memory space:

    >>> id({})
    340160
    >>> id({1: 2})
    340160
    

    If you keep a reference, the next dictionary will come from the next free slot:

    >>> x = {}
    >>> id(x)
    340160
    >>> id({})
    340016
    

    But we can delete the reference to that dictionary and free its slot again:

    >>> del x
    >>> id({})
    340160
    

    Since the {} syntax is handled in byte-code it can use this optimization mentioned above. On the other hand dict() is handled like a regular class constructor and Python uses the generic memory allocator, which does not follow an easily predictable pattern like the free list above.

    Also, looking at compile.c from Python 2.6, with the {} syntax it seems to pre-size the hashtable based on the number of items it's storing which is known at parse time.

Web crawlers and Google App Engine Hosted applications

Is it impossible to run a web crawler on GAE along side with my app considering the I am running the free startup version?

From stackoverflow
  • I suppose you can (i.e., it's not impossible to) run it, but it will be slow and you'll run into limits quite quickly. As CPU quotas are going to be decreased at the end of May even further, I'd recommend against it.

  • App Engine code only runs in response to HTTP requests, so you can't run a persistent crawler in the background. With the upcoming release of scheduled tasks, you could write a crawler that uses that functionality, but it would be less than ideal.

  • It's possible. But that's not really an application for appengine just as Arachnid wrote. If you manage to get it working I'll doubt you'll stay in the qotas for free accounts.

  • While Google hadn't exposed scheduling, queue and background tasks API, you can do any processing only as an answer to external HTTP request. You'd need some heartbeat service that will process one item from crawler's queue at a time (not to hit GAE limits).

    To do crawling from GAE, you have to split your application into queue (that stores queue data in Datastore), queue processor that will react to external HTTP heartbeat and your actual crawling logic.

    You'd manually have to watch your quota usage and start heartbeat when you have spare quota, and stop if it is used up.

    When Google introduces the APIs I've told in the beginning you'd have to rewrite parts that are implemented more effectively via Google API.

    UPDATE: Google introduced Task Queue API some time ago. See task queue docs for python and java.

Class Diagram for classless PHP application

Hello all,

I have PHP application that is completely procedural (No PHP Classes). In my case, is there any formal diagrams I can draw to show how my web application that is equivalent to a UML class diagram for OOP based applications.

Thanks all

From stackoverflow
  • You could make some Hatley-Pirbhai models:

    http://en.wikipedia.org/wiki/Hatley-Pirbhai_modeling

  • doxygen can generate call- and caller graphs automatically - if that suits your needs.

  • It's not because the implementation (or even the design) is procedural, that the business model is.

    Though you may not find explicit reference to classes in your code, they may still be there, hidden, scattered through global variables etc, functions manipulating some globals, ...

    So in order to describe your app, you may still be able to use a (conceptual) class diagram.

Practical implications of OSI vs TCP/IP networking.

Hi.

I'm supposed to be setting up a 'geolocation based', ipv6, wireless mesh network to run on google android.

I found what seems to be a good app to support the meshing: http://www.open-mesh.net/wiki/batman-adv

"Batman-advanced is a new approach to wireless networking which does no longer operate on the IP basis. Unlike B.A.T.M.A.N, which exchanges information using UDP packets and sets routing tables, batman-advanced operates on ISO/OSI Layer 2 only and uses and routes (or better: bridges) Ethernet Frames. It emulates a virtual network switch of all nodes participating. Therefore all nodes appear to be link local, thus all higher operating protocols won't be affected by any changes within the network. You can run almost any protocol above B.A.T.M.A.N. Advanced, prominent examples are: IPv4, IPv6, DHCP, IPX."

But other members in my team has said it's a no-go because it operates on OSI, rather than TCP/IP. This was the first I'd heard of OSI, and I'm wondering how much of a problem this is? What are the implications for mesh network apps that can be developed on top of it? Considering the android is relatively new, we don't need to worry too much about compatibility with existing apps, so does it matter?

I haven't spent a lot of time working with networks, so please put in noobmans terms.

From stackoverflow
  • "ISO/OSI Layer 2" does not mean the OSI protocols. It refers to the "Seven Layer" model of network stacks. It means the Data Link layer.

    The layers are: Physical, Data Link, Network, Transport, Session, Presentation, Application.

  • OSI is a model not a protocol like IP and TCP. What your team seem to be saying is that the mesh won't be using IP. I suspect they are wrong as the text you have quoted states the BATMAN protocol is capable of supporting IP & IPv6 and if that is the case you'd need a very strong reason to use anything else.

  • They're probably referring to the OSI model, which is a commonly-used way of distinguishing between network layers. I'm not sure it's a useful way of looking at things, but it's taught in every networking course on the planet.

    OSI level 2 is the data link layer, which operates immediately above the actual physical level. Basically, it's in charge of flow control, error detection, and possibly error correction. The data link layer is strictly "single hop". It's only concerned about about point-to-point data transfers, not about multi-hop transfers or routing.

    If they're actually referring the OSI networking protocal itself, run screaming as fast as you can. OSI was notoriously hard to implement, and I've never heard of an actual working installation. See the Wikipedia article for the gory details.

  • "You can run almost any protocol above B.A.T.M.A.N. Advanced, prominent examples are: IPv4, IPv6, DHCP, IPX."

    "But other members in my team has said it's a no-go because it operates on OSI, rather than TCP/IP. "

    The other members in your team are confused by the buzzword-fest in BATMAN.

    The "IP" of TCP/IP is IPv4 (or IPv6). So BATMAN supports TCP/IP directly and completely.

    There's no conflict of any kind. Just confusion.

  • The OSI model and the OSI protocols are different.

    The OSI model is a way of breaking things down: physical, link, network, transport, session, presentation, application. OSI protocols are protocol implementations that map directly to those layers in the model.

    The model is a way of looking at things. It mostly makes sense, but it breaks down at the higher levels. For example: what does a presentation layer really do?

    During the '90s, OSI was (in some circles) thought to be the future, but was actually the downfall of some companies, and wasted the resources of many others. For example, DECnet Phase V was Digital's insanely complex implementation of an OSI stack that met government OSI requirements, but was run over by the TCP/IP steamroller.

    The test is: What are the bytes on the wire? In this case it is UDP over IP, not the OSI equivalent, which was CLNP.

    Having said all that, if it is a layer two protocol, it will probably have scalability problems because it is a layer two protocol. Fine for a small number of nodes, but if you're trying to get scale, you need a better solution.

msvcp80d.dll not found while using TBB

I am using Intel TBB C++ for multithreading an application on visual studio 2008. When I run the executable I get a dialog saying "MSVCP80D.dll" was not found. There is so much on the net about this that it confuses me.

Please help.

EDIT: Based on answers, finally I was able to fix the "dll missing" problem. I had given a path to TBB lib of vc8 leading to dependency on vc8 dlls, which are used with visual studio 2005, not with 2008. (Using depends (http://www.dependencywalker.com/ ) it is easy to determine the run-time dependencies of an executable.) I changed by project to depend on vc9 dlls, not vc8 and then it worked fine.

Another thing to note is use of manifest files on windows. Manifest files describe dependencies. The manifest files must be generated while writing an application as it is necessary.

From stackoverflow
  • Your app is compiled with debug version. Debug version of VC runtime is not in path. Try to generate release version.

    Amit Kumar : I am running the app from dev environment. I want to run the debug version.
  • Are you running the program on your development machine? If you are not, you might get this error. The "D" at the end of the filename means that the DLL is a debug DLL, and often not on computers without Visual Studio installed. You're not supposed to redistribute it (copy it around), either. You should compile a "release" version of your application and run that. If you really can't do that for some reason, and it's only one or two computers, then try installing the express version of visual studio on that computer.

    If you are having this problem on your development machine, it can apparently be caused by a compiler/linker problem. Try doing a clean build ("clean", then "build" in Visual Studio).

    Amit Kumar : I am running from dev environment. I tried clean build too. Still the problem.
  • Ok, after a lot of search, and by chance, I landed on this forum http://www.codeguru.com/forum/showthread.php?t=446789 which says something I interpret as "the version of TBB I am using does not support VS 2008".

    But this still uncertain.

  • MSVC80D is VS 2005. As part of VS2008 you would have MSVC90D instead.

  • You can find them online at various places. Just scan it for a virus and put it in your program's path and everything should work fine. You may need more than one of the debug dlls, you can use depends32.exe to see what you are missing.

    Amit Kumar : Thanks, depends22.exe http://www.dependencywalker.com/ was quite useful.
    MSalters : The debug DLLs may not be distributed legally.
    Steve : That is true, but he didn't ask about legality, just how to get it working :D